Difference between revisions of "Famos"

From Phaserwiki

Revision as of 16:12, 8 September 2016

phaser.famos (phaser.find_alt_orig_sym_mate) is a script for determining the best common origin for different molecular replacement solutions, in real space. The coordinates do not need to be identical. The common origin is found via secondary structure matching.

Author

Robert D. Oeffner

Purpose

phaser.famos attempts to find the best superposition of two different molecular replacement solutions of the same dataset, termed moving.pdb and fixed.pdb, with respect to all symmetry operations and alternate origin shifts permitted by the space group of the crystal. If either PDB file contains more than one chain, each chain is tested for its best match against the chains in the other PDB file. The program does so by calculating a score, MLAD (see Algorithm), for all possible symmetry operations and alternate origin shifts of each chain in moving.pdb compared with the chains in fixed.pdb. The transformation with the smallest MLAD is retained for that particular pair of chains.
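The search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the phaser.famos code: it hard-codes the symmetry operations and alternative origins of space group P2(1)2(1)2(1), uses a hypothetical orthogonal cell, assumes the atoms are already paired by index (no SSM alignment), and scores with a plain mean distance as a stand-in for MLAD. All names (SYMOPS, ORIGINS, CELL, transform, score, best_match) are hypothetical.

```python
import itertools

# Symmetry operations of P2(1)2(1)2(1) acting on fractional coordinates,
# stored per axis as (signs, translation): x' = sign * x + t.
SYMOPS = [
    ((1, 1, 1), (0.0, 0.0, 0.0)),
    ((-1, -1, 1), (0.5, 0.0, 0.5)),
    ((-1, 1, -1), (0.0, 0.5, 0.5)),
    ((1, -1, -1), (0.5, 0.5, 0.0)),
]
# Alternative origins for this space group: 0 or 1/2 along each axis.
ORIGINS = list(itertools.product((0.0, 0.5), repeat=3))
CELL = (40.0, 50.0, 60.0)  # hypothetical orthogonal cell edges in angstrom


def transform(atoms, symop, origin):
    """Apply one symmetry operation followed by one origin shift."""
    signs, trans = symop
    return [tuple(s * x + t + o for s, x, t, o in zip(signs, atom, trans, origin))
            for atom in atoms]


def centroid(atoms):
    return tuple(sum(a[i] for a in atoms) / len(atoms) for i in range(3))


def score(moving, fixed):
    """Mean distance in angstrom (stand-in for MLAD) after moving the chain
    by the whole-cell translation that brings the centroids closest."""
    shift = tuple(round(f - m) for m, f in zip(centroid(moving), centroid(fixed)))
    total = 0.0
    for a, b in zip(moving, fixed):
        total += sum(((x + s - y) * c) ** 2
                     for x, s, y, c in zip(a, shift, b, CELL)) ** 0.5
    return total / len(fixed)


def best_match(moving, fixed):
    """Try every (symmetry operation, origin) pair and keep the best score."""
    return min((score(transform(moving, op, origin), fixed), index, origin)
               for index, op in enumerate(SYMOPS)
               for origin in ORIGINS)


# Toy data: "fixed" is "moving" after operation 2, an origin shift of
# (1/2, 0, 1/2) and an arbitrary lattice translation.
moving = [(0.10, 0.20, 0.30), (0.15, 0.22, 0.28), (0.12, 0.27, 0.35)]
fixed = [tuple(c + n for c, n in zip(atom, (1, 0, -1)))
         for atom in transform(moving, SYMOPS[2], (0.5, 0.0, 0.5))]
best_score, best_op, best_origin = best_match(moving, fixed)
print(best_score, best_op, best_origin)
```

In this toy case the loop recovers the operation and origin shift used to generate the fixed coordinates. For a real crystal the operations and origins would be derived from the space group rather than typed in by hand.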

The script returns the best matches between pairs of chains in the two files. This includes the MLAD values, chain IDs, symmetry transformations and alternate origins for the two structures. If the same origin shift is not applied to all pairs of chains, a warning is printed. If the space group of the crystal has a floating origin, this generally results in an origin offset between chains that is not a rational number.

The superposed structure and the fixed structure can be visually inspected in a molecular viewer such as Coot or PyMOL. Matches with low MLAD scores will have correspondingly good superpositions.

Usage

Command line

phaser.famos uses Phenix PHIL input in either a text file or as keywords like:

 phaser.famos moving.pdb=pdbfile1 fixed.pdb=pdbfile2

or:

 phaser.famos my_phil_input.txt

The PHIL input specifies the MR solution files either as "moving.pdb" and "fixed.pdb" or as the scopes "moving" and "fixed". Both scopes must hold valid content. The two scopes hold the parameter "xyzfname" and the sub-scopes "mrsolution" and "pickle_solution". For a scope to be valid, one and only one of the methods (1), (2) and (3) must be followed:

  (1) Assign the "moving.pdb" or "fixed.pdb" parameter to the PDB file from one of the MR solutions in question. This is useful for simple cases, such as when the solution comes from the PHENIX MRage GUI or from an MR program other than Phaser.
  (2) Specify a Phaser MR solution by assigning the file name of the solution file to the "moving.mrsolution.solfname" or "fixed.mrsolution.solfname" parameter, and by assigning the IDs of the ensembles and the corresponding MR models to the parameters "moving.mrsolution.ensembles.name" or "fixed.mrsolution.ensembles.name" and "moving.mrsolution.ensembles.xyzfname" or "fixed.mrsolution.ensembles.xyzfname" respectively. Multiple components are specified as multiple ensembles. This way of specifying solutions is useful for solution files produced when Phaser is run from the command line.
  (3) Use a solution from the Phaser-MR GUI in PHENIX. In that case the parameter "moving.pickle_solution.pklfname" or "fixed.pickle_solution.pklfname" is assigned to the solution file that PHENIX produces after an MR calculation. The parameter "moving.pickle_solution.philfname" or "fixed.pickle_solution.philfname" is then assigned to the input file for the MR calculation. This way of specifying solutions is useful when MR solutions are available from having run Phaser through the PHENIX interface.

The method used for the fixed scope does not restrict the method used for the moving scope, and vice versa. If the space group and unit cell dimensions are not available, for instance when (2) is used for both the moving and the fixed scopes, they must be specified by assigning the parameter "spacegroupfname" to a PDB file with a CRYST1 record or to an MTZ file containing that information. Typically this would be the data file used for the molecular replacement calculation.

Examples of PHIL input

Unless the input consists of just two PDB files, a PHIL file is the easiest way to enter input. A few examples of PHIL input for phaser.famos are given below.

Testing a command-line Phaser MR solution file against a solution specified as a PDB file:

 AltOrigSymMates.fixed.mrsolution
 {
   solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"
   ensembles
   {
     name = "MR_2P82_A0"
     xyzfname = "testdata/sculpt_2P82_A0.pdb"
   }
   ensembles
   {
     name = "MR_3ECI_A0"
     xyzfname = "testdata/sculpt_3ECI_A0.pdb"
   }
 }
 
 AltOrigSymMates.moving_pdb="testdata/2z0d.pdb"

Testing two sets of solution files from the PHENIX Phaser-MR GUI against one another:

 AltOrigSymMates.fixed.pickle_solution
 {
   philfname = "testdata/phaser_mr_13.eff"
   pklfname = "testdata/phaser_mr_13.pkl"
 }
 AltOrigSymMates.moving.pickle_solution
 {
   philfname = "testdata/phaser_mr_11.eff"
   pklfname = "testdata/phaser_mr_11.pkl"
 }

Testing two solution files from the command-line version of Phaser against one another:

 AltOrigSymMates.fixed.mrsolution
 {
   solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"
   ensembles
   {
     name = "MR_2P82_A0"
     xyzfname = "testdata/sculpt_2P82_A0.pdb"
   }
   ensembles
   {
     name = "MR_3ECI_A0"
     xyzfname = "testdata/sculpt_3ECI_A0.pdb"
   }
 }
 AltOrigSymMates.moving.mrsolution
 {
   solfname = "testdata/MR_2ZPN_A0_2P82_A0.sol"
   ensembles
   {
     name = "MR_2P82_A0"
     xyzfname = "testdata/sculpt_2P82_A0.pdb"
   }
   ensembles
   {
     name = "MR_2ZPN_A0"
     xyzfname = "testdata/sculpt_2ZPN_A0.pdb"
   }
 }
 AltOrigSymMates.spacegroupfname = "testdata/2Z0D.mtz"

For more information on the PHIL input see the bottom of this page.

Also move HETATM (hetero atoms)

Invoking this flag moves hetero atoms (ligands, waters, metals, etc.) together with their associated peptide chain. The program first invokes phenix.sort_hetatms to associate hetero atoms sensibly with adjacent peptide chains.

After having identified the transformation for a chain yielding the smallest MLAD score it then subjects the associated hetero atoms to the same transformation.

Debug mode

Invoking the debug flag produces individual pdb files of the C-alpha atoms used for each SSM alignment of the moving scope for each permitted symmetry operation and alternative origin. A gold atom is placed at the centroid of the C-alpha atoms. Similar files are produced for the fixed scope. These files are stored in a subfolder named "AltOrigSymMatesFiles".

If a floating origin is present in the space group a table of MLAD values is produced by sliding a copy of the chains in the moving scope along the polar axis in the fractional interval [-0.5, 0.5] for each permitted symmetry operation and alternative origin.

List of all available keywords

See Phenix documentation for phenix.find_alt_orig_sym_mate

Output

The closest match between chains in the moving scope and chains in the fixed scope is saved to a file whose name is formed by concatenating the name of the PDB file in the fixed scope with the name of the PDB file in the moving scope and prepending "MinMLAD_". A log file containing the standard output is written with a name formed in the same way but prepended with "AltOrigSymMLAD_". All files are saved in the current working directory. If the MR solution files specified in the PHIL input contain multiple solutions, phaser.famos outputs multiple log files, one for each MR solution.

Do the MR solutions match?

A good match between two chains usually has an MLAD value below 1.5, whereas a bad match usually has a value above 2.0. This is a rule of thumb and exceptions do occur. It is advisable to check visually in a molecular graphics program that the structures superpose on one another.
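The rule of thumb can be written down as a small helper, for instance when post-processing many MLAD values from the log files. This is a minimal sketch: the function name classify_mlad and the three labels are hypothetical, and the thresholds are simply the 1.5 and 2.0 quoted above. Values between the two thresholds are flagged for visual inspection rather than decided either way.

```python
def classify_mlad(mlad, good_below=1.5, bad_above=2.0):
    """Apply the rule of thumb: below 1.5 is usually a good match,
    above 2.0 is usually a bad match, anything else is undecided."""
    if mlad < good_below:
        return "likely match"
    if mlad > bad_above:
        return "likely mismatch"
    return "inspect visually"


for value in (0.4, 1.8, 3.2):
    print(value, classify_mlad(value))
```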

Algorithm

phaser.famos computes configurations by looping over all symmetry operations and alternative origin shifts. An alignment between C-alpha atoms from moving_pdb and fixed_pdb is computed using secondary structure matching (SSM). If that fails, or if the MLAD score achieved with SSM is larger than 2.0, an alignment is computed using the MMTBX alignment functions, which are part of the CCTBX. To estimate the best match, a distance measure between the aligned C-alpha atoms is computed for each configuration. The mean log absolute deviation (MLAD) is defined as:

MLAD(dR) = Σ( log(dr·dr/(|dr| + 0.9) + max(0.9, min(dr·dr,1))) - log(0.9)),

where dr is the difference vector between a pair of aligned C-alpha atoms and the sum is taken over all atom pairs in the alignment. The term log(0.9) is subtracted to ensure that MLAD(0) = 0.0, i.e. that two identical structures produce the value zero.

MLAD can loosely be interpreted as a distance measure between structures, but it is not a metric in the strict mathematical sense since the triangle inequality is not fulfilled. Unlike a plain root mean square deviation (RMSD), the logarithm in the MLAD formula downplays the contributions of atom pairs whose atoms are spatially distant. If an RMSD were employed, such pairs would be weighted the same as pairs that can be superposed perfectly. This in turn may lead to incorrect superpositions when the chains tested against one another consist of multiple domains and one domain has undergone a domain motion, i.e. where a subset of atoms in one chain is bound to be spatially distant from the atoms in the other chain they have been paired with.
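The formula and the contrast with RMSD can be illustrated with a short Python sketch. It evaluates the expression exactly as written above, i.e. as a sum over atom pairs, and assumes that log denotes the natural logarithm. The function names mlad and rmsd and the two toy data sets are hypothetical: "hinge" mimics a chain where most atoms superpose perfectly and a small domain is displaced by 20 Å, while "uniform" mimics a chain where every atom is off by 2 Å.

```python
import math


def mlad(pairs):
    """Sum of the MLAD term over aligned atom pairs, as in the formula above
    (natural logarithm assumed)."""
    total = 0.0
    for a, b in pairs:
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))  # dr . dr
        d = math.sqrt(d2)                             # |dr|
        total += math.log(d2 / (d + 0.9) + max(0.9, min(d2, 1.0))) - math.log(0.9)
    return total


def rmsd(pairs):
    """Plain root mean square deviation over the same pairs, for comparison."""
    return math.sqrt(sum(sum((x - y) ** 2 for x, y in zip(a, b))
                         for a, b in pairs) / len(pairs))


origin = (0.0, 0.0, 0.0)
identical = [(origin, origin)] * 10
# Eight atoms superpose perfectly, two belong to a domain displaced by 20 A.
hinge = [(origin, origin)] * 8 + [(origin, (20.0, 0.0, 0.0))] * 2
# Every atom is displaced by 2 A.
uniform = [(origin, (2.0, 0.0, 0.0))] * 10

for name, pairs in (("identical", identical), ("hinge", hinge), ("uniform", uniform)):
    print(name, round(mlad(pairs), 3), round(rmsd(pairs), 3))
```

With these toy numbers the RMSD ranks the hinge case as the worse of the two, whereas MLAD ranks it as the better one, which is the behaviour the paragraph above describes.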

For each chain in the fixed scope, phaser.famos finds the smallest MLAD with a copy of each chain in the moving scope for a given symmetry operation and alternative origin. When all chains in the fixed scope have been tested, these copies are saved to a file. For space groups with a floating origin, the minimum MLAD is found by a golden-section minimization along the polar axis for each copy of the chains in moving_pdb.
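A golden-section search along a polar axis can be sketched as follows. This is a generic textbook version, not the routine used in phaser.famos. It assumes the score is unimodal over the search interval, and the objective toy_score, with its minimum placed at a fractional shift of 0.137, is a hypothetical stand-in for the MLAD evaluated as a function of the shift along the polar axis.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618


def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Locate the minimum of a unimodal function f on the interval [lo, hi]."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def toy_score(shift):
    """Hypothetical score versus fractional shift along the polar axis."""
    return (shift - 0.137) ** 2


best_shift = golden_section_minimize(toy_score, -0.5, 0.5)
print(best_shift)
```

Each iteration reuses one of the two previous function evaluations, so only one new score has to be computed per step. That matters when every evaluation involves scoring a full set of aligned atoms.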


Caveats

The program tests all solutions present in solution files entered as fixed scope against all solutions present in solution files entered as moving scope. Consequently the execution time is proportional to the number of solutions in fixed scope times the number of solutions in moving scope.

The execution time is proportional to the number of SSM alignments being tested; if SSM identifies 12 alignments the program will take 12 times as long.

Changes

The command-line syntax mentioned in the Computational Crystallography Newsletter 2012 January has been replaced by PHIL syntax.

Literature

Algorithms for deriving crystallographic space-group information R.W. Große-Kunstleve Acta Cryst. A55, 383-395 (1999)

phenix.find_alt_orig_sym_mate Robert D. Oeffner, Gábor Bunkóczi and Randy J. Read Computational Crystallography Newsletter 2012 January, 5-10 (2012)

Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. E. Krissinel and K. Henrick Acta Cryst. D60, 2256-2268 (2004)