Difference between revisions of "Famos"

From Phaserwiki
Latest revision as of 16:45, 8 September 2016

phaser.famos (phaser.find_alt_orig_sym_mate) is a script for determining the best common origin for different molecular replacement solutions, in real space. The coordinates do not need to be identical. The common origin is found via secondary structure matching.

Author

Robert D. Oeffner

Purpose

phaser.famos attempts to find the best superposition of two different molecular replacement solutions of the same dataset, termed moving.pdb and fixed.pdb, with respect to all symmetry operations and alternate origin shifts permitted by the space group of the crystal. If either PDB file contains more than one chain, each chain is tested for its best match against the chains in the other PDB file. The program does so by calculating a score value, MLAD (see Algorithm), for all possible symmetry operations and alternate origin shifts of each chain in moving.pdb compared with the chains in fixed.pdb. The transformation with the smallest MLAD is retained for that particular pair of chains.
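The search described above can be illustrated with a minimal sketch. It is not the program's code: the function names, the toy score (squared fractional differences instead of MLAD), the per-atom wrapping to the nearest lattice image, and the example operations for a P2-like cell are all assumptions made for illustration.

```python
import itertools

def apply_operation(points, rotation, translation, origin_shift):
    """Apply x' = R x + t + s to a list of fractional coordinates."""
    moved = []
    for p in points:
        moved.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3))
            + translation[i] + origin_shift[i]
            for i in range(3)
        ))
    return moved

def wrapped_score(moving, fixed):
    """Toy score (an assumption, not MLAD): squared fractional differences,
    each wrapped independently into [-0.5, 0.5)."""
    total = 0.0
    for m, f in zip(moving, fixed):
        for mi, fi in zip(m, f):
            delta = (mi - fi + 0.5) % 1.0 - 0.5
            total += delta * delta
    return total

def best_transformation(moving, fixed, symmetry_ops, origin_shifts):
    """Try every symmetry operation with every origin shift; keep the lowest score."""
    best = None
    for (rotation, translation), shift in itertools.product(symmetry_ops, origin_shifts):
        candidate = apply_operation(moving, rotation, translation, shift)
        score = wrapped_score(candidate, fixed)
        if best is None or score < best[0]:
            best = (score, rotation, translation, shift)
    return best

# Hypothetical example: identity and a twofold axis along b, with four
# candidate origin shifts. The values are illustrative only.
identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
twofold = ((-1, 0, 0), (0, 1, 0), (0, 0, -1))
symmetry_ops = [(identity, (0.0, 0.0, 0.0)), (twofold, (0.0, 0.0, 0.0))]
origin_shifts = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.0, 0.5), (0.5, 0.0, 0.5)]

fixed = [(0.1, 0.2, 0.3), (0.4, 0.25, 0.1)]
moving = [(0.4, 0.2, -0.3), (0.1, 0.25, -0.1)]  # fixed seen from another origin

best = best_transformation(moving, fixed, symmetry_ops, origin_shifts)
```

Here the twofold operation combined with the shift (0.5, 0, 0) maps the moving coordinates onto the fixed ones, so that combination is the one retained.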

The script returns the best matches between pairs of chains in the two files. This includes the MLAD values, chain IDs, symmetry transformations and alternate origins for the two structures. If the same origin shift is not applied to all pairs of chains a warning will be printed. If the space group of the crystal has a floating origin this generally results in an origin offset between chains that is not a rational number.

The superposed structure and the fixed structure can be visually inspected in a molecular viewer such as Coot or PyMOL. Matches with low MLAD scores will have correspondingly good superpositions.

Usage from the command line

phaser.famos uses Phenix PHIL input in either a text file or as keywords like:

 phaser.famos moving.pdb=pdbfile1 fixed.pdb=pdbfile2

or:

 phaser.famos my_phil_input.txt

The PHIL input scopes, moving and fixed, specify the MR solutions. The two scopes are programmatically equivalent and both must be non-empty. This means that each scope should specify either the parameter xyzfname, the sub-scope mrsolution, or the sub-scope pickle_solution.

Examples:

  • Use the pdb file from one of the MR solutions in a scope. This is useful for simple cases, such as when the solution comes from the PHENIX MRage GUI or is obtained from an MR program other than Phaser.
  • Specify a Phaser MR solution by assigning the name of the .sol file to the mrsolution.solfname parameter of the sub-scope, and by assigning the IDs of the ensembles and the corresponding MR models to the parameters mrsolution.ensembles.name and mrsolution.ensembles.xyzfname. Multiple search components are specified as multiple ensembles. This way of specifying solutions is useful for solution files produced when Phaser is run from the command line and yields several solutions, of which one is correct.
  • Use a solution from the Phaser-MR GUI in PHENIX. In that case the parameter pickle_solution.pklfname in the sub-scope should be assigned to the solution file that PHENIX produces after an MR calculation, and pickle_solution.philfname should be assigned to the input file for the MR calculation. This way of specifying solutions is useful when MR solutions are available from having run Phaser through the PHENIX interface.
  • If the space group and the unit cell dimensions are not available from any of the input files, they need to be specified by assigning the parameter spacegroupfname either to a PDB file with a CRYST1 record or to an MTZ file with that information. This should be the data file used for the molecular replacement calculation.


Examples of PHIL input

Unless the input consists of just two PDB files, a PHIL file is the easiest way to enter input. A few examples of PHIL for phaser.famos are given below.

Testing a command-line Phaser MR solution file against a solution specified as a PDB file:

 AltOrigSymMates.fixed.mrsolution
 {
   solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"
   ensembles
   {
     name = "MR_2P82_A0"
     xyzfname = "testdata/sculpt_2P82_A0.pdb"
   }
   ensembles
   {
     name = "MR_3ECI_A0"
     xyzfname = "testdata/sculpt_3ECI_A0.pdb"
   }
 }
 
 AltOrigSymMates.moving_pdb="testdata/2z0d.pdb"

Testing two sets of solution files from the PHENIX Phaser-MR GUI against one another:

 AltOrigSymMates.fixed.pickle_solution
 {
   philfname = "testdata/phaser_mr_13.eff"
   pklfname = "testdata/phaser_mr_13.pkl"
 }
 AltOrigSymMates.moving.pickle_solution
 {
   philfname = "testdata/phaser_mr_11.eff"
   pklfname = "testdata/phaser_mr_11.pkl"
 }

Testing two solution files from the command-line version of Phaser against one another:

 AltOrigSymMates.fixed.mrsolution
 {
   solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"
   ensembles
   {
     name = "MR_2P82_A0"
     xyzfname = "testdata/sculpt_2P82_A0.pdb"
   }
   ensembles
   {
     name = "MR_3ECI_A0"
     xyzfname = "testdata/sculpt_3ECI_A0.pdb"
   }
 }
 AltOrigSymMates.moving.mrsolution
 {
   solfname = "testdata/MR_2ZPN_A0_2P82_A0.sol"
   ensembles
   {
     name = "MR_2P82_A0"
     xyzfname = "testdata/sculpt_2P82_A0.pdb"
   }
   ensembles
   {
     name = "MR_2ZPN_A0"
     xyzfname = "testdata/sculpt_2ZPN_A0.pdb"
   }
 }
 AltOrigSymMates.spacegroupfname = "testdata/2Z0D.mtz"

For more information on the PHIL input see the bottom of this page.

Also move HETATM (hetero atoms)

Invoking this flag will move hetero atoms (ligands, waters, metals, etc.) together with their associated peptide chain. The program first invokes phenix.sort_hetatms to associate hetero atoms sensibly with adjacent peptide chains.

After having identified the transformation for a chain yielding the smallest MLAD score it then subjects the associated hetero atoms to the same transformation.

Debug mode

Invoking the debug flag produces individual pdb files of the C-alpha atoms used for each SSM alignment of the moving scope for each permitted symmetry operation and alternative origin. A gold atom is placed at the centroid of the C-alpha atoms. Similar files are produced for the fixed scope. These files are stored in a subfolder named "AltOrigSymMatesFiles".

If a floating origin is present in the space group a table of MLAD values is produced by sliding a copy of the chains in the moving scope along the polar axis in the fractional interval [-0.5, 0.5] for each permitted symmetry operation and alternative origin.

List of all available keywords

See Phenix documentation for phenix.find_alt_orig_sym_mate

Output

The closest match between chains in the moving scope and chains in the fixed scope is saved to a file named after the pdb file in the fixed scope, concatenated with the name of the pdb file in the moving scope and prepended with "MinMLAD_". Standard output is written to a log file named in the same way but prepended with "AltOrigSymMLAD_". All files are saved in the current working directory. If the MR solutions specified in the PHIL input contain multiple solutions, phaser.famos writes multiple log files, one for each MR solution.

Do the MR solutions match?

A good match between two chains usually has an MLAD value below 1.5, whereas a bad match usually has a value above 2.0. This is a rule of thumb and exceptions do occur. It is advisable to check visually, in a molecular graphics program, that the structures superpose on one another.
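As a small illustration of this rule of thumb, the sketch below sorts MLAD values into three bands. The function name and the labels are hypothetical; only the thresholds 1.5 and 2.0 come from the text, and values in between are left undecided.

```python
def classify_mlad(mlad_value, good_below=1.5, bad_above=2.0):
    """Apply the rule-of-thumb thresholds to a single MLAD value."""
    if mlad_value < good_below:
        return "likely match"
    if mlad_value > bad_above:
        return "likely mismatch"
    # Between the thresholds the rule of thumb gives no verdict.
    return "inspect visually"

labels = [classify_mlad(value) for value in (0.4, 1.7, 3.2)]
```

Because exceptions occur, such a label is only a prompt for visual inspection, not a substitute for it.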

Algorithm

phaser.famos computes configurations by looping over all symmetry operations and alternative origin shifts. An alignment between C-alpha atoms from moving.pdb and fixed.pdb is computed using secondary structure matching (SSM). If that fails, or if the MLAD score achieved with SSM is larger than 2.0, an alignment is computed using the MMTBX alignment functions, which are part of the CCTBX. To estimate the best match, a distance measure between the aligned C-alpha atoms is computed for each configuration. The mean log absolute deviation (MLAD) is defined as:

MLAD(dR) = Σ( log(dr·dr/(|dr| + 0.9) + max(0.9, min(dr·dr,1))) - log(0.9)),

where dr is the difference vector between a pair of aligned C-alpha atoms and the sum is taken over all atom pairs in the alignment. The term log(0.9) is subtracted to ensure that MLAD(0) = 0.0, i.e. that two identical structures produce the value zero.
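The formula can be transcribed directly into a few lines of Python. This is a sketch that follows the expression exactly as written above (a sum over atom pairs, with no normalisation by the number of pairs); the function name and the input layout, a list of (dx, dy, dz) tuples, are assumptions.

```python
import math

def mlad(difference_vectors):
    """Sum the per-pair MLAD terms over a list of 3D difference vectors."""
    total = 0.0
    for dx, dy, dz in difference_vectors:
        d2 = dx * dx + dy * dy + dz * dz   # dr.dr
        d = math.sqrt(d2)                  # |dr|
        term = d2 / (d + 0.9) + max(0.9, min(d2, 1.0))
        total += math.log(term) - math.log(0.9)
    return total

identical = mlad([(0.0, 0.0, 0.0)] * 5)    # perfectly superposed pairs
one_off = mlad([(1.0, 0.0, 0.0)])          # a single pair 1 unit apart
```

For identical coordinates every term reduces to log(0.9) - log(0.9), so the total is zero, as stated above.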

MLAD can loosely be interpreted as a distance measure between structures, but it is not a metric in the strict mathematical sense since the triangle inequality is not fulfilled. Unlike a plain root mean square deviation (RMSD), the logarithm in the MLAD formula downplays the contributions of atom pairs whose atoms are spatially distant. If an RMSD were employed, such atom pairs would contribute on an equal footing with those groups of atom pairs that can be superposed perfectly. This may in turn lead to non-optimal superpositions when the structures tested against one another consist of multiple domains of which one has undergone a domain motion, i.e. where a subset of atoms in one chain is bound to be spatially distant from the atoms in the other chain with which they have been paired.
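A worked comparison makes the difference concrete. The numbers below are invented for illustration: superposition A places 80 atom pairs perfectly and leaves 20 pairs 15 units apart (a moved domain), while superposition B is a compromise with all 100 pairs 6 units apart. The helper names are hypothetical, and the MLAD terms follow the formula above written in terms of the distance |dr|.

```python
import math

def rmsd_from_distances(distances):
    """Root mean square of the pair distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

def mlad_from_distances(distances):
    """MLAD terms from the formula above, using d = |dr| and d*d = dr.dr."""
    total = 0.0
    for d in distances:
        d2 = d * d
        term = d2 / (d + 0.9) + max(0.9, min(d2, 1.0))
        total += math.log(term) - math.log(0.9)
    return total

superposition_a = [0.0] * 80 + [15.0] * 20   # one domain fits, one has moved
superposition_b = [6.0] * 100                # everything moderately off

rmsd_prefers_b = rmsd_from_distances(superposition_b) < rmsd_from_distances(superposition_a)
mlad_prefers_a = mlad_from_distances(superposition_a) < mlad_from_distances(superposition_b)
```

With these values the RMSD is lower for the compromise B, whereas MLAD is lower for A, which superposes the larger domain exactly.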

For each chain in the fixed scope, phaser.famos finds the smallest MLAD with a copy of each chain in the moving scope for a given symmetry operation and alternative origin. When all chains in the fixed scope have been tested, these copies are saved to a file. For space groups with a floating origin, the minimum MLAD is found by a golden-section minimization along the polar axis for each copy of the chains in moving.pdb.
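A golden-section minimization along one axis can be sketched as follows. This is a generic textbook version, not the program's implementation: the function name, the tolerance, and the stand-in score (a parabola with its minimum at a fractional shift of 0.2) are assumptions, and the method is only guaranteed to find the minimum when the score has a single minimum in the interval.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0   # 1/phi, about 0.618

def golden_section_minimize(score, lower, upper, tolerance=1e-6):
    """Return the x in [lower, upper] that minimises a unimodal score(x)."""
    a, b = lower, upper
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = score(c), score(d)
    while (b - a) > tolerance:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = score(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = score(d)
    return 0.5 * (a + b)

# Stand-in for "MLAD as a function of the fractional shift along the polar axis".
def toy_score(shift):
    return (shift - 0.2) ** 2

best_shift = golden_section_minimize(toy_score, -0.5, 0.5)
```

Each iteration reuses one of the two interior evaluations, so only one new score evaluation is needed per step, which matters when every evaluation involves transforming a chain and recomputing its score.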

Caveats

The program tests all solutions present in solution files entered as fixed scope against all solutions present in solution files entered as moving scope. Consequently the execution time is proportional to the number of solutions in fixed scope times the number of solutions in moving scope.

The execution time is proportional to the number of SSM alignments being tested; if SSM identifies 12 alignments the program will take 12 times as long.

Changes

The command-line syntax mentioned in the Computational Crystallography Newsletter 2012 January has been replaced by PHIL syntax.

Literature

Algorithms for deriving crystallographic space-group information. R.W. Große-Kunstleve. Acta Cryst. A55, 383-395 (1999).

phenix.find_alt_orig_sym_mate. Robert D. Oeffner, Gábor Bunkóczi and Randy J. Read. Computational Crystallography Newsletter 2012 January, 5-10 (2012).

Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. E. Krissinel and K. Henrick. Acta Cryst. D60, 2256-2268 (2004).