Python Interface

From Phaserwiki
Revision as of 13:42, 4 May 2012 by Airlie (Functions relevant to the Run-Jobs)

As an alternative to keyword input, Phaser can be called directly from a python script. This is the way Phaser is called in Phenix and we encourage developers of other automation pipelines to use the python scripting too. In order to call Phaser in python you will need to have Phaser installed from source.

Input-Objects, Run-Jobs, and Results-Objects

Using Phaser through the python interface is similar to using Phaser through the keyword interface. Each mode of operation of Phaser described above is controlled by an "input-object" (similar to the command script), has a Phaser "run-job" which runs the Phaser executable for the corresponding mode, and produces a "result-object" (which includes the logfile text). The user input is passed to the "input-object" with calls to set- or add- functions. Phaser is then run with a call to the "run-job" function, which takes the "input-object" for control. Results are retrieved from the "result-object" with get-functions.
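The three-step pattern described above can be sketched as a small helper. This is an illustration only, not part of Phaser: run_job, stub_input and stub_run are hypothetical names, and the stand-in objects (built with Python 3's types.SimpleNamespace) exist only so that the sketch runs without Phaser installed. With Phaser available, an input-object such as InputANO() and a run-job such as runANO would take their place.

```python
from types import SimpleNamespace

def run_job(input_obj, run_fn, settings):
    """Apply set-/add-calls to an input-object, then run the job.

    settings is a list of (function_name, args) pairs,
    e.g. ("setMUTE", (True,)). Returns the result-object from run_fn.
    """
    for function_name, args in settings:
        getattr(input_obj, function_name)(*args)
    return run_fn(input_obj)

# Stand-ins (NOT Phaser classes) that record the calls made on them.
recorded = {}
stub_input = SimpleNamespace(
    setTITL=lambda title: recorded.update(TITL=title),
    setMUTE=lambda flag: recorded.update(MUTE=flag),
)

def stub_run(input_obj):
    # A real run-job would execute Phaser here and return a result-object.
    return SimpleNamespace(Success=lambda: True,
                           summary=lambda: "title: %s" % recorded["TITL"])

r = run_job(stub_input, stub_run,
            [("setTITL", ("demo",)), ("setMUTE", (True,))])
if r.Success():
    print(r.summary())
```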

Functionality Input-Object Run-Job Results-Object
Anisotropy Correction i = InputANO() r = runANO(i) ResultANO()
Cell Content Analysis i = InputCCA() r = runCCA(i) ResultCCA()
Normal Mode Analysis i = InputNMA() r = runNMA(i) ResultNMA()
Automated MR i = InputMR_AUTO() r = runMR_AUTO(i) ResultMR()
Fast Rotation Function i = InputMR_FRF() r = runMR_FRF(i) ResultMR_RF()
Brute Rotation Function i = InputMR_BRF() r = runMR_BRF(i) ResultMR_RF()
Fast Translation Function i = InputMR_FTF() r = runMR_FTF(i) ResultMR_TF()
Brute Translation Function i = InputMR_BTF() r = runMR_BTF(i) ResultMR_TF()
Refinement and Phasing i = InputMR_RNP() r = runMR_RNP(i) ResultMR()
Log-Likelihood Gain i = InputMR_LLG() r = runMR_LLG(i) ResultMR()
Packing i = InputMR_PAK() r = runMR_PAK(i) ResultMR()
Automated Experimental Phasing i = InputEP_AUTO() r = runEP_AUTO(i) ResultEP()
SAD Experimental Phasing i = InputEP_SAD() r = runEP_SAD(i) ResultEP()

The major difference between running Phaser through the keyword interface and running Phaser through the python scripting is that the data reading and Phaser functionality are separated. For the Phaser "run-job" functions, the reflection data (Miller indices, Fobs and SigmaFobs) are simply arrays, the space group is given as a Hall string, and the unit cell is given as an array of 6 numbers. This is an important feature of the Phaser python scripting: the Phaser "run-job" functions are not tied to mtz file input, so the data can be read in python from any file format and then passed to Phaser.
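As an illustration of reading data from a format other than mtz, the sketch below parses a hypothetical whitespace-separated text format ("h k l Fobs SigmaFobs" per line) into the three parallel arrays described above. The file format, the function name read_reflections and the example values are all assumptions made for this example. Plain python lists are used here; for Phaser they would still need converting to the scitbx array types.

```python
def read_reflections(lines):
    """Parse 'h k l Fobs SigmaFobs' records into three parallel lists."""
    miller, fobs, sigma = [], [], []
    for line in lines:
        if line.lstrip().startswith("#"):
            continue  # comment line
        fields = line.split()
        if len(fields) != 5:
            continue  # blank or malformed record
        miller.append(tuple(int(x) for x in fields[:3]))
        fobs.append(float(fields[3]))
        sigma.append(float(fields[4]))
    return miller, fobs, sigma

# Hypothetical data standing in for the contents of a reflection file.
example = ["# h k l Fobs SigmaFobs",
           "1 0 0 120.5 3.2",
           "0 2 1 87.0 2.9"]
miller, fobs, sigma = read_reflections(example)

# Space group as a Hall string and unit cell as an array of 6 numbers.
hall = "P 2ac 2ab"  # Hall symbol for P 21 21 21
cell = [50.0, 60.0, 70.0, 90.0, 90.0, 90.0]
```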

For the convenience of developers and users, the python scripting comes with data-reading jiffies to read data from mtz files. (These are the same mtz reading jiffies that are used internally by Phaser when calling Phaser from keyword input.)

Functionality Input-Object Run-Job Result-Object
Read Data for MR i = InputMR_DAT() r = runMR_DAT(i) ResultMR_DAT()
Read Data for EP i = InputEP_DAT() r = runEP_DAT(i) ResultEP_DAT()

Input-Object set- and add-Functions

The syntax of the set- and add- functions on the "input-objects" mirrors the keyword input. Each "input-object" only has set- or add- functions corresponding to the keywords that are relevant for that mode. Attempting to set a value on an "input-object" that is irrelevant for that mode will result in an error. This differs from the keyword input, where the parser simply ignores any keywords that are not relevant to the current mode. Some functions are common to all input-objects (described in the table below).

Note that setting the space group by name or number does not specify the setting. It is best to set the space group via the Hall symbol, which is unique to the full definition of the space group.

Input Objects Python Set Function
ROOT filename i.setROOT(filename)
MUTE [ON|OFF] i.setMUTE(True|False)
TITLe title i.setTITL(title)
VERBose [ON|OFF] i.setVERB(True|False)
VERBose [ON|OFF] EXTRA i.setVERB_EXTRA(True|False)
SPACegroup name i.setSPAC_NAME(name)
SPACegroup number i.setSPAC_NUM(number)
SPACegroup Hall i.setSPAC_HALL(hall)
CELL a b c alpha beta gamma i.setCELL(a,b,c,alpha,beta,gamma)
Cell set from array of 6 numbers i.setCELL([a,b,c,alpha,beta,gamma])
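A minimal sketch of applying the common set-functions from the table to any input-object follows. The helper name apply_common_settings and the example values are hypothetical, and the recording stand-in (Python 3) is not a Phaser class. It only shows which calls would be made, so that the sketch runs without Phaser installed.

```python
from types import SimpleNamespace

def apply_common_settings(i, root, title, hall, cell, mute=False):
    """Call the set-functions shared by all input-objects."""
    if len(cell) != 6:
        raise ValueError("unit cell needs a, b, c, alpha, beta, gamma")
    i.setROOT(root)
    i.setTITL(title)
    i.setMUTE(mute)
    i.setSPAC_HALL(hall)  # the Hall symbol fixes the setting
    i.setCELL(cell)       # array-of-6 form listed in the table
    return i

# Stand-in input-object that records each call instead of configuring Phaser.
calls = []

def recorder(name):
    return lambda *args: calls.append((name, args))

stub = SimpleNamespace(**{name: recorder(name) for name in
                          ("setROOT", "setTITL", "setMUTE",
                           "setSPAC_HALL", "setCELL")})

apply_common_settings(stub, "job1", "Test job", "P 2ac 2ab",
                      [50.0, 60.0, 70.0, 90.0, 90.0, 90.0])
for name, args in calls:
    print(name, args)
```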

Results-Object get-Functions

Data are extracted from the "result-objects" with get-functions. The get-functions are mostly specific to the type of "result-object" (described in sections below), but some are common to all "result-objects" (described in table below).

Ralf Grosse-Kunstleve's scitbx::af::shared<double> array type is heavily used for passing arrays into the Phaser "input-objects" and extracting arrays from the Phaser "result-objects". This is a reference-counted array type that can be used directly in python and in C++. It is part of the Phaser installation when Phaser is installed from source. The scitbx (SCIentific ToolBoX) is part of the cctbx (Computational Crystallography ToolBoX), which is hosted on SourceForge.

Results Objects Python Get Function
Exit status "success" r.Success()
Exit status "failure" r.Failure()
Type of Error (see error table). SYNTAX errors are not thrown in python as they are generated by keyword input r.ErrorName()
Message associated with error r.ErrorMessage()
Text of Summary r.summary()
Text of Logfile r.logfile()
Text of Verbose Logfile r.verbose()
SpaceGroup Hall Symbol r.getSpaceGroupHall()
SpaceGroup Name (Hermann Mauguin, edited for CCP4 compatibility in R3 H3 R32 H32)
SpaceGroup Number r.getSpaceGroupNumber()
Number of symmetry operators r.getSpaceGroupNSYMM()
Number of primitive symmetry operators r.getSpaceGroupNSYMP()
Symmetry operator #s, Rotation matrix element i,j (range 0-2) r.getSpaceGroupR(s,i,j)
Symmetry operator #s, Translation vector element i (range 0-2) r.getSpaceGroupT(s,i)
Unit Cell (array of 6 numbers) r.getUnitCell()

Error Handling

Exit status is indicated by Success() and Failure() functions of the "result-objects". Success indicates successful execution of Phaser, not that it has solved the structure! For molecular replacement jobs, the foundSolutions() function indicates that Phaser has found one or more potential solutions, the numSolutions() function returns how many solutions were found and the uniqueSolution() function returns True if only one solution was found. More detailed error information in the case of Failure is given by ErrorName() and ErrorMessage().
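The checks described above might be combined as in the following sketch. The function name describe_mr_outcome and the stand-in result-objects (Python 3, with made-up return values) are assumptions for illustration. Only the result-object functions named in the text are called.

```python
from types import SimpleNamespace

def describe_mr_outcome(r):
    """Turn the status functions of an MR result-object into one message."""
    if r.Failure():
        return "failed: %s: %s" % (r.ErrorName(), r.ErrorMessage())
    # Success() only means that Phaser ran, so solutions are checked next.
    if not r.foundSolutions():
        return "ran successfully, no solutions found"
    if r.uniqueSolution():
        return "ran successfully, unique solution"
    return "ran successfully, %d solutions" % r.numSolutions()

# Stand-in result-objects with made-up values (NOT Phaser classes).
ok = SimpleNamespace(Failure=lambda: False, foundSolutions=lambda: True,
                     uniqueSolution=lambda: False, numSolutions=lambda: 3)
bad = SimpleNamespace(Failure=lambda: True, ErrorName=lambda: "INPUT",
                      ErrorMessage=lambda: "example message")

print(describe_mr_outcome(ok))
print(describe_mr_outcome(bad))
```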

Advanced Information: All errors are thrown and caught internally by the "run-jobs", and so do not generate "Runtime Errors" in the python script. In particular "INPUT" errors are not thrown by the set- or add-functions of the "input-objects", but are stored in the "input-object" and passed to the "result-object" once the "run-job" is called. Results objects are derived from std::exception, and so can be thrown. Function what() returns ErrorName() (not the ErrorMessage()).

Logfile Handling

Writing of the logfile to standard output can be silenced with the i.setMUTE(True) function. The logfile or summary text can then be printed to standard output with print r.logfile() or print r.summary().
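A short sketch of this pattern: the job is muted, and the logfile text is written to a stream of the caller's choice once the job has finished. The helper run_muted and the stand-in objects (Python 3) are hypothetical. Note that this captures the log only after the run, not while it is in progress.

```python
import io
from types import SimpleNamespace

def run_muted(i, run_fn, log_stream):
    """Silence live logging, run the job, then write the logfile text."""
    i.setMUTE(True)
    r = run_fn(i)
    log_stream.write(r.logfile())
    return r

# Stand-ins (NOT Phaser classes) so the sketch runs without Phaser.
state = {}
stub_input = SimpleNamespace(setMUTE=lambda flag: state.update(mute=flag))

def stub_run(input_obj):
    return SimpleNamespace(logfile=lambda: "log text\n",
                           summary=lambda: "summary text\n")

buffer = io.StringIO()  # could equally be an open file
r = run_muted(stub_input, stub_run, buffer)
print(r.summary())
```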

Advanced Information: Setting i.setMUTE(True) prevents real-time viewing of the progress of a Phaser job, which may be inconvenient for users. If you want to view the logfile information but not have it go to standard output, the logfile text can be redirected to a python string using an alternative call to the "run-job" function. This call passes an "output-object" (which controls the Phaser logging methods) on which the output stream has been set to a python string. This feature of Phaser was developed thanks to Ralf Grosse-Kunstleve.

Automated Molecular Replacement

ResultMR Python Get Function
Solutions were found (boolean) r.foundSolutions()
Number of Solutions that were found (int) r.numSolutions()
Only one solution found (boolean) r.uniqueSolution()
LLG values for all solutions in decreasing order r.getValues()
Script output file r.getSolFile()
Xml output file r.getXmlFile()
PDB files corresponding to solutions in decreasing LLG order r.getPdbFiles()
MTZ files corresponding to solutions in decreasing LLG order r.getMtzFiles()
PDB file name of top solution r.getTopPdbFile()
MTZ file name of top solution r.getTopMtzFile()
PDB file name number i r.getPdbFile(i)
MTZ file name number i r.getMtzFile(i)
All file names output r.getFilenames()
Templates matching solution i (returns integer array) r.getTemplatesForSolution(i)
Solutions matching template i (returns integer array) r.getSolutionsForTemplate(i)
Number of PDB files r.getNumPdbFiles()
Number of MTZ files r.getNumMtzFiles()
List of details of solutions (rotation, translation) in decreasing LLG order (returns mr_solution type)
Top solution set (returns mr_set type) r.getTopSet()
Solution set number i (returns mr_set object) r.getSet(i)
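As a usage sketch for the get-functions in this table, the helper below pairs each LLG value with its PDB and MTZ file names. The name rank_solutions and the stand-in values are hypothetical. The sketch assumes the three arrays are aligned, as the "decreasing LLG order" descriptions suggest. If fewer files than solutions were written, zip stops at the shortest array.

```python
from types import SimpleNamespace

def rank_solutions(r):
    """Pair each LLG with its PDB and MTZ file names, best solution first."""
    if not r.foundSolutions():
        return []
    return list(zip(r.getValues(), r.getPdbFiles(), r.getMtzFiles()))

# Stand-in ResultMR with made-up values (NOT a Phaser class).
stub = SimpleNamespace(
    foundSolutions=lambda: True,
    getValues=lambda: [250.0, 110.5],
    getPdbFiles=lambda: ["job.1.pdb", "job.2.pdb"],
    getMtzFiles=lambda: ["job.1.mtz", "job.2.mtz"],
)

for llg, pdb, mtz in rank_solutions(stub):
    print("%8.1f  %s  %s" % (llg, pdb, mtz))
```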

Reading MTZ Files for Experimental Phasing

ResultEP_DAT Python Get Function
Miller Indices (array) r.getMiller()
Non-anomalous F values for crystal "xtal" and dataset "wave" (array) r.getF(xtal,wave)
Non-anomalous SIGF values for crystal "xtal" and dataset "wave" (array) r.getSIGF(xtal,wave)
Boolean flags for F (and SIGF) present for crystal "xtal" and dataset "wave" (array) r.getP(xtal,wave)
Anomalous F+ values for crystal "xtal" and dataset "wave" (array) r.getFpos(xtal,wave)
Anomalous SIGF+ values for crystal "xtal" and dataset "wave" (array) r.getSIGFpos(xtal,wave)
Boolean flags for F+ (and SIGF+) present for crystal "xtal" and dataset "wave" (array) r.getPpos(xtal,wave)
Anomalous F- values for crystal "xtal" and dataset "wave" (array) r.getFneg(xtal,wave)
Anomalous SIGF- values for crystal "xtal" and dataset "wave" (array) r.getSIGFneg(xtal,wave)
Boolean flags for F- (and SIGF-) present for crystal "xtal" and dataset "wave" (array) r.getPneg(xtal,wave)
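The presence flags matter when combining F+ and F-. The sketch below computes anomalous differences (F+ minus F-) only for reflections where both are flagged as present. The function name, the crystal and dataset names, and the stand-in data are hypothetical. It assumes all arrays are aligned with the Miller index array.

```python
from types import SimpleNamespace

def anomalous_differences(r, xtal, wave):
    """Return (hkl, F+ - F-) for reflections with both F+ and F- present."""
    miller = r.getMiller()
    fpos, fneg = r.getFpos(xtal, wave), r.getFneg(xtal, wave)
    ppos, pneg = r.getPpos(xtal, wave), r.getPneg(xtal, wave)
    return [(hkl, fp - fn)
            for hkl, fp, fn, pp, pn in zip(miller, fpos, fneg, ppos, pneg)
            if pp and pn]

# Stand-in ResultEP_DAT with made-up values (NOT a Phaser class).
stub = SimpleNamespace(
    getMiller=lambda: [(1, 0, 0), (0, 2, 1), (1, 1, 3)],
    getFpos=lambda xtal, wave: [100.0, 50.0, 75.0],
    getFneg=lambda xtal, wave: [96.0, 0.0, 77.5],
    getPpos=lambda xtal, wave: [True, True, True],
    getPneg=lambda xtal, wave: [True, False, True],  # F- missing for (0,2,1)
)

diffs = anomalous_differences(stub, "xtal1", "peak")
```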

Automated Experimental Phasing

ResultEP Python Get Function
Log-likelihood of refined solution r.getLogLikelihood()
Miller Indices (array) r.getMiller()
Boolean array flagging reflections included in electron density r.getSelected()
Figures of merit for phased dataset (array) r.getFOM()
Amplitudes for weighted electron density of phased dataset (array) r.getFWT()
Phases for weighted electron density of phased dataset (array) r.getPHWT()
Phases for electron density of phased dataset (array) r.getPHIB()
Amplitudes for log-likelihood gradient map r.getFLLG()
Phases for log-likelihood gradient map r.getPHLLG()
Atoms included in final solution for crystal "xtalid" r.getAtoms(xtalid)
Atoms rejected from final solution for crystal "xtalid" r.getRejectedAtoms(xtalid)
f' for atomtype "type" in crystal "xtalid" dataset "wave" r.getFp(xtalid,wave,type)
f" for atomtype "type" in crystal "xtalid" dataset "wave" r.getFdp(xtalid,wave,type)
Name of output MTZ file r.getMtzFile()
Name of output PDB file r.getPdbFile()
Name of output SOL file r.getSolFile()
Name of output XML file r.getXmlFile()
Overall low resolution limit r.stats_lores()
Overall high resolution limit r.stats_hires()
Overall figure of merit for all reflections r.stats_fom()
Overall figure of merit for acentrics r.stats_acentric_fom()
Overall figure of merit for centrics r.stats_centric_fom()
Overall figure of merit for singleton r.stats_singleton_fom()
Overall number of reflections r.stats_num()
Overall number of acentric reflections r.stats_acentric_num()
Overall number of centric reflections r.stats_centric_num()
Overall number of singleton reflections r.stats_singleton_num()
Number of resolution bins for statistics r.stats_numbins()
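The overall statistics functions in this table can be gathered into a single structure for reporting, as in the sketch below. The helper name ep_statistics, the dictionary layout and the stand-in values are all hypothetical. Only the stats_ functions listed in the table are called.

```python
from types import SimpleNamespace

def ep_statistics(r):
    """Gather the overall phasing statistics into one dictionary."""
    return {
        "resolution": (r.stats_lores(), r.stats_hires()),
        "fom": {"all": r.stats_fom(),
                "acentric": r.stats_acentric_fom(),
                "centric": r.stats_centric_fom(),
                "singleton": r.stats_singleton_fom()},
        "num": {"all": r.stats_num(),
                "acentric": r.stats_acentric_num(),
                "centric": r.stats_centric_num(),
                "singleton": r.stats_singleton_num()},
        "bins": r.stats_numbins(),
    }

# Stand-in ResultEP with made-up values (NOT a Phaser class).
stub = SimpleNamespace(
    stats_lores=lambda: 40.0, stats_hires=lambda: 2.5,
    stats_fom=lambda: 0.45, stats_acentric_fom=lambda: 0.48,
    stats_centric_fom=lambda: 0.30, stats_singleton_fom=lambda: 0.10,
    stats_num=lambda: 1000, stats_acentric_num=lambda: 800,
    stats_centric_num=lambda: 150, stats_singleton_num=lambda: 50,
    stats_numbins=lambda: 10,
)

stats = ep_statistics(stub)
print("FOM (all reflections): %.2f" % stats["fom"]["all"])
```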

Anisotropy Correction

If the data LABIN are set (e.g. as Fobs and Sigma), the column labels in the output MTZ file are Fobs_ISO and Sigma_ISO. If LABIN is not set on the input-object, the default output labels are F_ISO and SIGF_ISO.
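The labelling rule above can be written out as a small function. This is only a restatement of the rule in python. The function name aniso_output_labels is hypothetical and it does not call Phaser. After a real run, the labels actually used can be read back with r.getLaboutF() and r.getLaboutSIGF().

```python
def aniso_output_labels(labin_f=None, labin_sigf=None):
    """Return the expected (F, SIGF) output labels for given LABIN labels.

    With no LABIN set, the defaults F and SIGF are used as the base names.
    """
    base_f = labin_f if labin_f else "F"
    base_sigf = labin_sigf if labin_sigf else "SIGF"
    return base_f + "_ISO", base_sigf + "_ISO"

print(aniso_output_labels("Fobs", "Sigma"))
print(aniso_output_labels())
```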

ResultANO Python Get Function
Miller Indices (array) r.getMiller()
F values (array) r.getF()
SIGF values (array) r.getSIGF()
Corrected F (array) r.getCorrectedF()
Corrected SIGF (array) r.getCorrectedSIGF()
Correction Factor r.getCorrection()
Apply scale and correction factors to array new_array = r.getScaledCorrected(array)
Factor to put data on absolute scale r.WilsonK()
Wilson B factor r.WilsonB()
Measure of anisotropy r.getAnisoDeltaB()
Eigenvalues of anisotropy r.getEigenBs()
Eigenvectors and Eigenvalues of anisotropy r.getEigenSystem()
Name of output MTZ file r.getMtzFile()
Output MTZ file corrected F label r.getLaboutF()
Output MTZ file corrected SIGF label r.getLaboutSIGF()
Name of output XML file r.getXmlFile()

Cell Content Analysis

ResultCCA Python Get Function
Molecular weight of the assembly used for VM calculations r.getAssemblyVM()
Number of multiples of the assembly within allowed VM range r.getNum()
Array of the multiples (Z) of the assembly within allowed VM range r.getZ()
Array of the values of VM corresponding to the multiples (Z) of the assembly r.getVM()
Array of the probabilities of VM corresponding to the multiples (Z) of the assembly r.getProb()
Most probable multiple (Z) of the assembly r.getBestZ()
VM of the most probable multiple (Z) of the assembly r.getBestVM()
Probability of the most probable multiple (Z) of the assembly r.getBestProb()
XML file name r.getXmlFile()
Optimal VM for space group, unit cell and resolution r.getOptimalVM()
Optimal MW for space group, unit cell and resolution r.getOptimalMW()
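The three arrays returned by getZ(), getVM() and getProb() describe the same set of multiples, so they can be combined into a table. The sketch below sorts that table by probability. The name cca_table and the stand-in values are hypothetical, and the arrays are assumed to be aligned element by element.

```python
from types import SimpleNamespace

def cca_table(r):
    """Rows of (Z, VM, probability), most probable multiple first."""
    rows = list(zip(r.getZ(), r.getVM(), r.getProb()))
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Stand-in ResultCCA with made-up values (NOT a Phaser class).
stub = SimpleNamespace(getZ=lambda: [1, 2, 3],
                       getVM=lambda: [4.6, 2.3, 1.53],
                       getProb=lambda: [0.08, 0.90, 0.02])

for z, vm, prob in cca_table(stub):
    print("Z=%d  VM=%.2f  P=%.2f" % (z, vm, prob))
```

The first row of the sorted table should correspond to the values returned by r.getBestZ(), r.getBestVM() and r.getBestProb().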

Normal Mode Analysis

ResultNMA Python Get Function
Number of total perturbations along combinations of normal modes r.getNum()
Array of all pdb files r.getPdbFiles()
Name of pdb file for perturbation #i r.getPdbFile(i)
Array of normal modes contributing to perturbation #i r.getModes(i)
Array of displacements along modes contributing to perturbation #i r.getDisplacements(i)
Script output file r.getSolFile()
Xml output file r.getXmlFile()
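A sketch of looping over the perturbations with the indexed get-functions follows. The helper name list_perturbations and the stand-in values are hypothetical. The sketch assumes that the index i is zero-based and runs up to getNum(), which the table does not state explicitly.

```python
from types import SimpleNamespace

def list_perturbations(r):
    """Collect file name, modes and displacements for every perturbation."""
    perturbations = []
    for i in range(r.getNum()):  # assumes zero-based indexing
        perturbations.append({
            "pdb": r.getPdbFile(i),
            "modes": list(r.getModes(i)),
            "displacements": list(r.getDisplacements(i)),
        })
    return perturbations

# Stand-in ResultNMA with made-up values (NOT a Phaser class).
pdbs = ["nma.1.pdb", "nma.2.pdb"]
modes = [[7], [7, 8]]
displacements = [[0.5], [0.5, -0.25]]
stub = SimpleNamespace(getNum=lambda: 2,
                       getPdbFile=lambda i: pdbs[i],
                       getModes=lambda i: modes[i],
                       getDisplacements=lambda i: displacements[i])

for p in list_perturbations(stub):
    print(p["pdb"], p["modes"], p["displacements"])
```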