Latest revision as of 17:45, 8 September 2016
phaser.famos (phaser.find_alt_orig_sym_mate) is a script for determining the best common origin for different molecular replacement solutions, in real space. The coordinates do not need to be identical. The common origin is found via secondary structure matching.
Author
Robert D. Oeffner
Purpose
phaser.famos attempts to find the best superposition of two different molecular replacement solutions of the same dataset, termed moving.pdb and fixed.pdb, with respect to all symmetry operations and alternate origin shifts permitted by the space group of the crystal. If either PDB file contains more than one chain, each chain is tested for its best match against the chains in the other PDB file. The program does so by calculating a score, MLAD (see Algorithm), for all possible symmetry operations and alternate origin shifts of each chain in moving.pdb compared with the chains in fixed.pdb. The transformation with the smallest MLAD is retained for that particular pair of chains.
The script returns the best matches between pairs of chains in the two files. This includes the MLAD values, chain IDs, symmetry transformations and alternate origins for the two structures. If the same origin shift is not applied to all pairs of chains a warning will be printed. If the space group of the crystal has a floating origin this generally results in an origin offset between chains that is not a rational number.
The superposed structure and the fixed structure can be visually inspected in a molecular viewer such as Coot or Pymol. Matches with low MLAD scores will have correspondingly good superpositions.
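The search described above can be illustrated with a minimal Python sketch. It is not the program's implementation: it assumes space group P222 (four symmetry operations, eight alternate origins), a hypothetical orthorhombic cell, made-up fractional coordinates, and a plain mean distance as a stand-in for MLAD. All names in it are illustrative.

```python
import itertools
import math

CELL = (40.0, 50.0, 60.0)  # hypothetical orthorhombic cell edges in angstrom

# Symmetry operations of P222 written as sign patterns on (x, y, z).
SYMOPS = [(1, 1, 1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1)]
# Alternate origins of P222: every combination of 0 and 1/2 along each axis.
ORIGIN_SHIFTS = list(itertools.product((0.0, 0.5), repeat=3))


def transform(op, shift, xyz):
    """Apply a sign-pattern symmetry operation, then an origin shift."""
    return tuple(s * x + t for s, x, t in zip(op, xyz, shift))


def distance(p, q):
    """Cartesian distance between fractional points, modulo lattice translations."""
    total = 0.0
    for edge, a, b in zip(CELL, p, q):
        delta = (a - b) - round(a - b)  # wrap the difference into [-0.5, 0.5]
        total += (edge * delta) ** 2
    return math.sqrt(total)


def mean_distance(moving, fixed):
    """Stand-in score (assumption); the program itself scores with MLAD."""
    return sum(distance(p, q) for p, q in zip(moving, fixed)) / len(fixed)


def best_transformation(moving, fixed, score=mean_distance):
    """Try every (symmetry operation, origin shift) pair and keep the lowest score."""
    best = None
    for op, shift in itertools.product(SYMOPS, ORIGIN_SHIFTS):
        moved = [transform(op, shift, xyz) for xyz in moving]
        value = score(moved, fixed)
        if best is None or value < best[0]:
            best = (value, op, shift)
    return best


# Made-up fractional coordinates for three paired C-alpha atoms.
fixed_chain = [(0.10, 0.20, 0.30), (0.15, 0.22, 0.35), (0.21, 0.27, 0.33)]
# Build a "moving" chain that is the fixed chain placed on another
# symmetry mate and origin, so the search has something to recover.
true_op, true_shift = (-1, 1, -1), (0.5, 0.0, 0.5)
moving_chain = [tuple(s * (x - t) for s, x, t in zip(true_op, xyz, true_shift))
                for xyz in fixed_chain]

score_value, op, shift = best_transformation(moving_chain, fixed_chain)
print(op, shift, score_value)
```

The sketch pairs atoms by list position, whereas the program derives the pairing from a structural alignment (see Algorithm).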
Usage from the command line
phaser.famos uses Phenix PHIL input in either a text file or as keywords like:
phaser.famos moving.pdb=pdbfile1 fixed.pdb=pdbfile2
or:
phaser.famos my_phil_input.txt
The PHIL input scopes, moving and fixed, specify the MR solutions. Both scopes are programmatically equivalent and must be non-empty. This means that each scope should specify either the parameter xyzfname, the sub-scope mrsolution or the sub-scope pickle.solution.
Examples:
- Use the pdb file from one of the MR solutions in a scope. This is useful for simple cases, such as when the solution comes from the PHENIX MRage GUI or when it is obtained from an MR program other than Phaser.
- Specify a Phaser MR solution by assigning the .sol file name of the solution to the mrsolution.solfname parameter of the sub-scope as well as assigning the IDs of the ensembles and the corresponding MR models to the parameters mrsolution.ensembles.name and mrsolution.ensembles.xyzfname. Multiple search components are specified as multiple ensembles. This way of specifying solutions is useful for solution files produced when Phaser is run from the command line and it produces several solutions of which one is the correct solution.
- Use a solution from the Phaser-MR GUI in PHENIX. In that case the parameter pickle.solution.pklfname in the sub-scope should be assigned to the solution file that is produced by PHENIX after an MR calculation. The pickle.solution.philfname should then be assigned to the input file for the MR calculation. This way of specifying solutions is useful when MR solutions are available from having run Phaser through the PHENIX interface.
- If the space group and the unit cell dimensions are not available from any of the input files, they need to be specified by assigning the parameter spacegroupfname either to a PDB file with a CRYST1 record or to an MTZ file with that information. This should be the data file used for the molecular replacement calculation.
Examples of PHIL input
Unless the input consists of just two PDB files, a PHIL file is the easiest way to enter input. A few examples of PHIL input for phaser.famos are given below.
Testing a command-line Phaser MR solution file against a solution specified as a PDB file:
 AltOrigSymMates.fixed.mrsolution
 {
   solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"
   ensembles
   {
     name = "MR_2P82_A0"
     xyzfname = "testdata/sculpt_2P82_A0.pdb"
   }
   ensembles
   {
     name = "MR_3ECI_A0"
     xyzfname = "testdata/sculpt_3ECI_A0.pdb"
   }
 }
 
 AltOrigSymMates.moving_pdb="testdata/2z0d.pdb"
Testing two sets of solution files from the PHENIX Phaser-MR GUI against one another:
 AltOrigSymMates.fixed.pickle_solution
 {
   philfname = "testdata/phaser_mr_13.eff"
   pklfname = "testdata/phaser_mr_13.pkl"
 }
 AltOrigSymMates.moving.pickle_solution
 {
   philfname = "testdata/phaser_mr_11.eff"
   pklfname = "testdata/phaser_mr_11.pkl"
 }
Testing two solution files from the command-line version of Phaser against one another:
 AltOrigSymMates.fixed.mrsolution
 {
   solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"
   ensembles
   {
     name = "MR_2P82_A0"
     xyzfname = "testdata/sculpt_2P82_A0.pdb"
   }
   ensembles
   {
     name = "MR_3ECI_A0"
     xyzfname = "testdata/sculpt_3ECI_A0.pdb"
   }
 }
 AltOrigSymMates.moving.mrsolution
 {
   solfname = "testdata/MR_2ZPN_A0_2P82_A0.sol"
   ensembles
   {
     name = "MR_2P82_A0"
     xyzfname = "testdata/sculpt_2P82_A0.pdb"
   }
   ensembles
   {
     name = "MR_2ZPN_A0"
     xyzfname = "testdata/sculpt_2ZPN_A0.pdb"
   }
 }
 AltOrigSymMates.spacegroupfname = "testdata/2Z0D.mtz"
For more information on the PHIL input see the bottom of this page.
Also move HETATM (hetero atoms)
Invoking this flag will move hetero atoms (ligands, waters, metals, etc.) in conjunction with their associated peptide chain. The program first invokes phenix.sort_hetatms to associate hetero atoms sensibly with adjacent peptide chains.
After having identified the transformation for a chain yielding the smallest MLAD score it then subjects the associated hetero atoms to the same transformation.
Debug mode
Invoking the debug flag produces individual pdb files of the C-alpha atoms used for each SSM alignment of the moving scope for each permitted symmetry operation and alternative origin. A gold atom is placed at the centroid of the C-alpha atoms. Similar files are produced for the fixed scope. These files are stored in a subfolder named "AltOrigSymMatesFiles".
If a floating origin is present in the space group a table of MLAD values is produced by sliding a copy of the chains in the moving scope along the polar axis in the fractional interval [-0.5, 0.5] for each permitted symmetry operation and alternative origin.
List of all available keywords
See Phenix documentation for phenix.find_alt_orig_sym_mate
Output
The closest match between chains in the moving scope and chains in the fixed scope is saved under the name of the pdb file in the fixed scope, concatenated with the name of the pdb file in the moving scope and prefixed with "MinMLAD_". A log file containing the standard output is written under the name of the pdb file in the fixed scope, concatenated with the name of the pdb file in the moving scope and prefixed with "AltOrigSymMLAD_". All files are saved in the current working directory. If the MR solution files specified in the PHIL input contain multiple solutions, phaser.famos will output multiple log files, one for each MR solution.
Do the MR solutions match?
A good match between two chains usually has an MLAD value below 1.5, whereas a bad match usually has a value above 2.0. This is a rule of thumb and exceptions do occur. It is advisable to check visually, in a molecular graphics program, that the structures superpose on one another.
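As a small illustration of the rule of thumb, the thresholds can be written as a helper like the one below. The function name and the wording of the labels are hypothetical. Values between the two thresholds are reported as ambiguous, because the text gives no guidance for that range.

```python
def interpret_mlad(value, good_below=1.5, bad_above=2.0):
    """Label an MLAD value using the rule-of-thumb thresholds from the text."""
    if value < good_below:
        return "likely match"
    if value > bad_above:
        return "likely mismatch"
    return "ambiguous, inspect visually"


for example in (0.4, 1.8, 3.2):
    print(example, interpret_mlad(example))
```

Whatever the label, visual inspection remains the final check.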
Algorithm
phaser.famos computes configurations by looping over all symmetry operations and alternative origin shifts. An alignment between C-alpha atoms from moving.pdb and fixed.pdb is computed using secondary structure matching (SSM). If that fails, or if the MLAD score achieved with SSM is larger than 2.0, an alignment is computed using the MMTBX alignment functions, which are part of the CCTBX. To estimate the best match, a distance measure between the aligned C-alpha atoms is computed for each configuration. The mean log absolute deviation (MLAD) is defined as:
MLAD(dR) = Σ( log(dr·dr/(|dr| + 0.9) + max(0.9, min(dr·dr,1))) - log(0.9)),
where dr is the difference vector between a pair of aligned C-alpha atoms and the sum is taken over all atom pairs in the alignment. The term log(0.9) is subtracted to ensure that MLAD(0) = 0.0, i.e. that two identical structures produce the value zero.
MLAD can loosely be interpreted as a distance measure between structures, but it is not a metric in the strict mathematical sense since the triangle inequality is not fulfilled. Unlike a plain root mean square deviation (RMSD), the logarithm in the MLAD formula downplays the contributions of atom pairs whose atoms are spatially distant. If an RMSD were employed, such atom pairs would contribute on an equal footing with those atom pairs that can be superposed perfectly. This in turn may lead to non-optimal superpositions when the structures tested against one another consist of multiple domains of which one has undergone a domain motion, i.e. where a subset of atoms in one chain is bound to be spatially distant from the atoms in the other chain they have been paired with.
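The following Python sketch evaluates the MLAD expression exactly as written above, taking the lengths |dr| of the difference vectors as input, and contrasts it with an RMSD. The function names and the distance values are made up for illustration. The expression is implemented as the sum given in the formula; whether the program also normalises by the number of atom pairs is not stated here.

```python
import math


def mlad(distances):
    """Sum the per-pair log term of the MLAD formula over all aligned pairs.

    Each entry of `distances` is |dr| for one aligned C-alpha pair,
    so dr.dr in the formula equals d * d.
    """
    total = 0.0
    for d in distances:
        d2 = d * d
        total += math.log(d2 / (d + 0.9) + max(0.9, min(d2, 1.0))) - math.log(0.9)
    return total


def rmsd(distances):
    """Plain root mean square deviation, for comparison."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical chain: 20 well-superposed pairs plus 5 pairs from a moved domain.
core = [0.3] * 20
moved_domain_near = [10.0] * 5
moved_domain_far = [20.0] * 5

for label, outliers in (("10 A", moved_domain_near), ("20 A", moved_domain_far)):
    dists = core + outliers
    print(label, "MLAD =", mlad(dists), "RMSD =", rmsd(dists))
```

Doubling the displacement of the moved domain roughly doubles the RMSD, while each displaced pair adds less than one unit to the MLAD sum, so the well-superposed core keeps its influence on the score.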
For each chain in the fixed scope, phaser.famos finds the smallest MLAD with a copy of each chain in the moving scope for a given symmetry operation and alternative origin. When all chains in the fixed scope have been tested, these copies are saved to a file. For space groups with a floating origin, the minimum MLAD is found by a golden-section minimization along the polar axis for each copy of the chains in moving.pdb.
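A golden-section minimization along one axis can be sketched in Python as follows. This is a generic textbook version, not the program's code. It assumes that the score is unimodal on the search interval, and the quadratic `toy_score` with its minimum at a fractional offset of 0.137 is a made-up stand-in for the MLAD as a function of the shift along the polar axis.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618


def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Return x in [lo, hi] close to the minimum of f, assuming f is unimodal there."""
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # The minimum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = f(c)
        else:
            # The minimum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0


def toy_score(shift):
    """Hypothetical score along the polar axis with its minimum at 0.137."""
    return (shift - 0.137) ** 2


best_shift = golden_section_minimize(toy_score, -0.5, 0.5)
print(best_shift)
```

Each iteration reuses one of the two previous function evaluations, so only one new score has to be computed per step. That matters when every evaluation involves scoring a full set of aligned atoms.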
Caveats
The program tests all solutions present in solution files entered as fixed scope against all solutions present in solution files entered as moving scope. Consequently the execution time is proportional to the number of solutions in fixed scope times the number of solutions in moving scope.
The execution time is proportional to the number of SSM alignments being tested; if SSM identifies 12 alignments the program will take 12 times as long.
Changes
The command-line syntax mentioned in the Computational Crystallography Newsletter 2012 January has been replaced by PHIL syntax.
Literature
Algorithms for deriving crystallographic space-group information R.W. Große-Kunstleve Acta Cryst. A55, 383-395 (1999)
phenix.find_alt_orig_sym_mate Robert D. Oeffner, Gábor Bunkóczi and Randy J. Read Computational Crystallography Newsletter 2012 January, 5-10 (2012)
Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. E. Krissinel and K. Henrick Acta Cryst. D60, 2256-2268 (2004)
