https://www.phaser.cimr.cam.ac.uk/api.php?action=feedcontributions&user=Rdo20&feedformat=atomPhaserwiki - User contributions [en]2024-03-28T09:01:06ZUser contributionsMediaWiki 1.31.8https://www.phaser.cimr.cam.ac.uk/index.php?title=Source_Code&diff=2477Source Code2018-10-03T16:24:08Z<p>Rdo20: </p>
<hr />
<div>===Repository===<br />
<br />
A public [https://git.csx.cam.ac.uk/x/cimr-phaser/phaser.git/summary Phaser git repository] is available for '''git clone''' and '''git pull''' only. This mirrors commits to the Phaser SVN repository in real time.<br />
<br />
The [http://www-structmed.cimr.cam.ac.uk/svn-cgi-bin/viewvc.cgi/ Phaser SVN repository] is located in Cambridge on the CIMR server (password restricted)<br />
<br />
The Berkeley mirror at cci.lbl.gov is updated at midnight Berkeley time<br />
<br />
:/net/cci/auto_build/repositories/phaser<br />
<br />
===Access===<br />
<br />
*You can download nightly builds of Phenix (binaries), which contain the latest version of Phaser that has passed regression tests<br />
*You can compile code with real-time updates from the git repository. This code may not pass regression tests. The git repository is best used for obtaining instant bugfixes, after communication with one of the Phaser developers<br />
*If you are developing a pipeline using Phaser, we are keen to work with you to add features, fix bugs and help you use Phaser optimally<br />
*Note the University of Cambridge's [[ Licences | Licences for Phaser]] with regard to making Phaser part of a pipeline available online<br />
*Source code modifications are allowed under the University of Cambridge's [[ Licences | Licences for Phaser]], provided they are for internal use only. Distribution would require those changes to be incorporated into our SVN repository. <br />
<br />
===Full Access===<br />
<br />
*Requests for permission to commit to the SVN repository via SSH should be emailed to [mailto:cimr-phaser@lists.cam.ac.uk phaser-help]<br />
<br />
<br />
===Building Phaser from source===<br />
Phaser can be built as an executable file for the platforms Linux, MacOS, Windows (using VC++ 9.0) and Windows (using g++ in MinGW-W64). It can also be built as python modules useful for python scripting. There are two ways to achieve this. <br />
<br />
====Building from an existing CCTBX Installation====<br />
The quick way is to start from an existing installation of CCTBX (available from http://cci.lbl.gov/cctbx_build/) to allow building the Phaser executable. Install CCTBX for the desired platform on your system first. For MinGW-W64 use the Windows build of CCTBX.<br />
Assuming python2.7 is present on your system, Phaser can be built from a CCTBX installation with the following steps from a Bash shell or a Windows command prompt:<br />
#Change directory to the modules/ folder within the CCTBX installation. Then do <pre>git clone git://git.uis.cam.ac.uk/cimr-phaser/phaser.git </pre><br />
#Change directory to the build/ folder within the CCTBX installation<br />
#Delete all files and folders except '''config_modules.sh''' or '''config_modules.cmd'''<br />
#Edit the '''config_modules.sh''' or '''config_modules.cmd''' script to look like:<br />
#:*'''Linux or MacOS''':<br />
#:<pre>#!/bin/sh &#10;python ../modules/cctbx_project/libtbx/configure.py phaser --enable_openmp_if_possible=True</pre><br />
#:*'''Windows using Microsoft VC++ 9.0'''<br />
#:<pre>python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True</pre><br />
#:*'''Windows using MinGW-W64 5.3.0'''<br />
#:<pre>python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True --compiler=mingw --static_exe</pre><br />
#Execute the '''config_modules.sh''' or '''config_modules.cmd''' script.<br />
#On Linux or MacOS source the file '''setpaths.sh''', on Windows execute the file '''setpaths.bat'''<br />
#On Linux or MacOS do '''libtbx.scons -j nproc exe/phaser''', on Windows do '''libtbx.scons -j nproc exe\phaser.exe'''. Here nproc is the number of available CPUs to do the compilation. This will produce the Phaser executable within the build/exe directory. If Phaser python modules are also desired then omit the '''exe/phaser''' or '''exe\phaser.exe''' argument (does not apply to a MinGW-W64 build unless the installed version of your CCTBX was built for MinGW-W64).<br />
<br />
====Bootstrap build of Phaser====<br />
A simpler but slower way to build Phaser is to run a "bootstrap build". Download the file bootstrap.py (as detailed on https://github.com/cctbx/cctbx_project#installation ) to where you want to build Phaser. Assuming that python 2.7 and the compiler are available from the PATH environment variable, run the command: <pre>python bootstrap.py --builder=phaser --nproc=8</pre> from a command prompt or bash shell. This will build a stripped down version of CCTBX in addition to Phaser and its python modules. The Phaser executable is located in the directory build/exe.<br />
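<br />
For scripted builds, the bootstrap command can be assembled programmatically. The following is a minimal sketch, assuming bootstrap.py has already been downloaded to the current directory. The helper name bootstrap_command is hypothetical, and only the --builder and --nproc options quoted above are used.<br />

```python
import multiprocessing


def bootstrap_command(nproc=None, python_exe="python"):
    """Return the argument list for the bootstrap build of Phaser.

    nproc defaults to the number of CPUs reported by the operating system.
    """
    if nproc is None:
        nproc = multiprocessing.cpu_count()
    if nproc < 1:
        raise ValueError("nproc must be at least 1")
    return [python_exe, "bootstrap.py", "--builder=phaser", "--nproc=%d" % nproc]
```

The returned list can be passed to subprocess.check_call from the directory that contains bootstrap.py.<br />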
<br />
<br />
The steps to build Phaser change from time to time because required components such as CCTBX are under active development. The steps outlined here may therefore differ from the actual ones at short or no notice.<br />
<br />
===Using Phaser from Python===<br />
Having compiled Phaser as above, Phaser can now be accessed from the python interpreter that is part of CCTBX. To invoke it on Linux or MacOS it is necessary first to execute <pre>source build/setpaths.sh</pre> or on Windows <pre>build\setpaths.bat</pre> From then on you can invoke python with the command <pre>cctbx.python</pre> and run scripts such as [[Python_Example_Scripts]]<br />
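<br />
A quick way to confirm that the environment is set up is to check that the Phaser module can be imported. This is a minimal sketch to be run with cctbx.python. It assumes the python module produced by the build is named "phaser", and the helper name phaser_available is hypothetical.<br />

```python
from __future__ import print_function


def phaser_available():
    """Return True if the Phaser python module can be imported."""
    try:
        import phaser  # noqa: F401  (assumed module name, see note above)
    except ImportError:
        return False
    return True


print("Phaser importable:", phaser_available())
```

If this reports False, the setpaths script has probably not been executed in the current shell, or the python modules were not built.<br />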
<br />
===Nightly Build Regression Tests===<br />
Regression tests of nightly builds are present on http://www-structmed.cimr.cam.ac.uk/Local/PhaserNightly/contents.html</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2475Molecular Replacement2018-07-04T10:35:04Z<p>Rdo20: /* Automated Molecular Replacement */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' -> [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' -> [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and supply a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto2.png|Flow Diagram for Automated MR|750px]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
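<br />
The RMS-based part of the resolution rule described above can be sketched as follows. This is an illustration only: the function name search_resolution_limit is hypothetical, resolutions are d-spacings in Angstroms (so a larger number means lower resolution), and the separate limit derived from the expected LLG is not modelled.<br />

```python
def search_resolution_limit(data_resolution, rms_error, factor=1.8):
    """Return the high-resolution limit (in Angstroms) for the search.

    The limit is never finer than factor * rms_error, and never finer
    than the resolution of the data themselves.
    """
    if data_resolution <= 0 or rms_error < 0:
        raise ValueError("resolution must be positive and RMS error non-negative")
    return max(data_resolution, factor * rms_error)


# Data to 2.0 A with an estimated RMS error of 1.5 A: search limited to about 2.7 A
print(round(search_resolution_limit(2.0, 1.5), 2))
```

With data that extend only to 3.0 Angstroms and the same RMS error, the function returns 3.0, because the data rather than the model are then the limiting factor.<br />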
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
| *''6 for 1st model in monoclinic space groups'' || <br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
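<br />
The Z-score definition above can be written out as a short calculation. This is a minimal sketch of the definition as stated in the text, not Phaser's internal code. The function name z_score and the example LLG values are hypothetical.<br />

```python
import math


def z_score(peak_llg, random_llgs):
    """Number of RMS deviations by which peak_llg lies above the random mean."""
    n = len(random_llgs)
    if n < 2:
        raise ValueError("need at least two random LLG values")
    mean = sum(random_llgs) / float(n)
    # RMS deviation from the mean of the random set
    rms_dev = math.sqrt(sum((x - mean) ** 2 for x in random_llgs) / float(n))
    if rms_dev == 0.0:
        raise ValueError("random LLG values are all identical")
    return (peak_llg - mean) / rms_dev


# Random set with mean 10 and RMS deviation 2; a peak at 20 scores Z = 5
print(z_score(20.0, [8.0, 12.0]))
```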
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
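<br />
For scripts that triage many runs, the rough TFZ guide in the table can be encoded as a lookup. This is a sketch under stated assumptions: the function name interpret_tfz is hypothetical, values falling exactly on a boundary are assigned to the higher band except at 8 (the table says "more than 8"), and the footnote about the first model in monoclinic space groups is not modelled.<br />

```python
def interpret_tfz(tfz):
    """Map a TFZ score to the verdict in the rough-guide table."""
    if tfz < 5:
        return "no"
    if tfz < 6:
        return "unlikely"
    if tfz < 7:
        return "possibly"
    if tfz <= 8:
        return "probably"
    return "definitely"


for value in (4.2, 6.5, 24.3):
    print(value, interpret_tfz(value))
```

As with the table itself, the result is only a guide and should be read together with the logfile summary.<br />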
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
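<br />
The SOLU SET annotation line can be read by a script to follow how the statistics develop as components are added. The following is a minimal sketch, assuming only the simple numeric annotations (RFZ=, TFZ=, PAK=, LLG=, TFZ==). Special forms such as TFZ=*, RF++ or amalgamated sets in parentheses are ignored. The function name parse_annotation is hypothetical.<br />

```python
import re

# A label of capital letters, "=" or "==", then a number
_ITEM = re.compile(r"([A-Z]+)(==?)(-?\d+(?:\.\d+)?)")


def parse_annotation(line):
    """Return a list of (label, value) pairs in the order they appear."""
    return [(name + eq, float(value)) for name, eq, value in _ITEM.findall(line)]


line = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
        "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
items = parse_annotation(line)
llg_values = [value for label, value in items if label == "LLG="]
print(llg_values)
```

For the beta-blip line, the LLG values extracted in this way never decrease from one entry to the next, which is the behaviour described above for a correct solution.<br />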
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || All input peaks amalgamated after refinement to give output solution number (m and others): (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file is<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
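<br />
A SOLU HISTORY line can be split into its steps with a regular expression. This is a minimal sketch based only on the layout of the examples shown here. The function name parse_history is hypothetical, and the text inside each pair of parentheses is returned unparsed.<br />

```python
import re

# A step name (capital letters, possibly with "/") followed by "(...)"
_STEP = re.compile(r"([A-Z/]+)\(([^)]*)\)")


def parse_history(line):
    """Return a list of (step, contents) pairs in the order they appear."""
    return _STEP.findall(line)


steps = parse_history("SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)")
print(steps)
```

For an RNP step, splitting the contents on ":" separates the amalgamated input solution numbers from the output solution number.<br />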
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks, because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of 100-1000's of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set 100-1000's of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data and the columns in the mtz file to be used using the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdb file(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation in Angstrom: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|}<br />
<br />
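The tabulated values appear to follow a simple closed form in the sequence identity and the number of residues. The sketch below is an illustration only: the functional form and the constants A, B and C are assumptions chosen because they reproduce the table to within about 0.01 Angstrom; they are not quoted from this page and may differ from what Phaser uses internally.<br />
<br />
```python
import math

# Assumed constants (not taken from this page): they reproduce the table above
# to within about 0.01 Angstrom.
A = 0.0569  # overall scale, in Angstrom
B = 173.3   # offset added to the number of residues
C = 1.52    # exponential dependence on the fraction of non-identical residues


def estimated_rms(identity, n_residues):
    """Estimate the RMS deviation (Angstrom) of a model.

    identity   -- sequence identity as a fraction between 0 and 1
    n_residues -- number of residues in the model
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be a fraction between 0 and 1")
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    non_identical = 1.0 - identity
    return A * (B + n_residues) ** (1.0 / 3.0) * math.exp(C * non_identical)
```
<br />
For example, estimated_rms(0.3, 300) can be compared with the ID=30%, #300 entry of the table. To follow the advice above for a suspected conformational change, add about 0.5 Angstrom to the returned value before supplying it as the RMS.<br />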
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, either change the record name to ATOM or use the keyword ENSE modlid HETATOM ON.<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit. It is very important to include everything in the composition, not just the components that you are searching for, because Phaser needs to know what fraction of the total scattering is accounted for by each model. For the options that specify the size of a particular component (sequence, number of residues, molecular weight), you can separately define several components of the composition of the asymmetric unit and Phaser will just add them up. Note that, for these options, you can specify the composition of one copy of a component and also say how many copies of that component are expected to be present. You can also mix compositions entered by sequence, number of residues and molecular weight. When the composition is checked, Phaser will check for the plausibility of the composition you have specified, as well as multiples of that composition.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format.<br />
===Composition by Atom===<br />
Individual atoms can be added to the composition. This allows the explicit addition of heavy atoms in the structure, e.g. Fe atoms.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
<br />
==How to Define Searches==<br />
Phaser does not compare sequences you specify in the composition with the models you specify as ensembles, so you have to specify separately the number of copies of a particular sequence that you expect to be found in the asymmetric unit of your crystal and the number of copies of each ensemble you want to place in the asymmetric unit. By default, Phaser will search first for ensembles expected to yield the highest signal in the MR search (as judged by the expected LLG or eLLG calculation); if that fails to result in a clear solution, different search orders will be tested automatically. For that reason, it does not normally matter which order you use to specify the searches. There is an option to override Phaser's automatic choice of search order, but this will only rarely be useful. It is best to specify the searches for everything that you hope to find in the MR calculation in one job, as that gives Phaser the greatest scope to optimise the calculation. Note that if your crystal possesses translational non-crystallographic symmetry (tNCS), you should be searching for a number of copies of each ensemble divisible by the order of the tNCS (i.e. the number of molecules that should be related by repeated application of a translation vector).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
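<br />
The selection rules above can be sketched in Python. This is an illustration rather than Phaser's own code: the function names are hypothetical, and the mean and standard deviation are assumed to be taken over the same list of function values that is being filtered.<br />
<br />
```python
from statistics import mean, pstdev


def select_by_percentage(values, cutoff=75.0):
    """Keep values at or above `cutoff` percent, where top = 100% and mean = 0%."""
    top, avg = max(values), mean(values)
    if top == avg:
        # Completely flat distribution: every value ties with the top peak.
        return list(values)
    return [v for v in values if 100.0 * (v - avg) / (top - avg) >= cutoff]


def select_by_zscore(values, cutoff):
    """Keep values at least `cutoff` standard deviations above the mean."""
    avg, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if (v - avg) / sigma >= cutoff]


def select_by_number(values, n):
    """Keep the `n` highest values."""
    return sorted(values, reverse=True)[:n]
```
<br />
Note that select_by_percentage always returns at least the top peak, whereas select_by_zscore can return an empty list when the cutoff is too high, as described above.<br />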
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
*: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
*: Points on the search grid with higher neighbouring points are removed from the selection<br />
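<br />
The effect of clustering can be illustrated on a one-dimensional grid. This is a simplified sketch with a hypothetical function name; real rotation and translation searches use multi-dimensional grids, and edge handling and the treatment of ties are assumptions here.<br />
<br />
```python
def cluster_peaks(grid):
    """Return the indices of grid points that have no higher neighbouring point.

    `grid` is a list of function values on a 1D, non-periodic grid.
    Points that tie with a neighbour are kept.
    """
    kept = []
    for i, value in enumerate(grid):
        # Neighbours are the points immediately to the left and right, if any.
        neighbours = grid[max(i - 1, 0):i] + grid[i + 1:i + 2]
        if all(value >= n for n in neighbours):
            kept.append(i)
    return kept
```
<br />
With clustering on, only the indices returned by cluster_peaks would go forward to selection; with clustering off, every high point on the grid is a candidate.<br />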
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
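<br />
The definition of commensurate modulation can be expressed as a simple test on the basic translation vector in fractional coordinates. The sketch below is illustrative only: the function names, the tolerance of 0.05 and the maximum order of 12 are assumptions, not Phaser defaults.<br />
<br />
```python
def is_commensurate(vector, n, tolerance=0.05):
    """True if n times the fractional vector is close to a lattice translation."""
    for component in vector:
        multiple = n * component
        # Distance of each component from the nearest integer.
        if abs(multiple - round(multiple)) > tolerance:
            return False
    return True


def smallest_commensurate_order(vector, max_order=12, tolerance=0.05):
    """Return the smallest n >= 2 giving a pseudo-cell, or None if none is found."""
    for n in range(2, max_order + 1):
        if is_commensurate(vector, n, tolerance):
            return n
    return None
```
<br />
For a basic vector of (0, 0, 1/3) the smallest order is 3, corresponding to three clusters of molecules and a pseudo-tripling along c. A vector in a general position returns None, meaning no pseudo-cell was found within the assumed limits.<br />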
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x>.<br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2474Molecular Replacement2018-07-04T10:34:44Z<p>Rdo20: /* Automated Molecular Replacement */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' -> [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' -> [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and give a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto2.png|Flow Diagram for Automated MR|700px]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
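<br />
The two resolution rules described above can be combined in a small sketch. This is an illustration under stated assumptions: the function name is hypothetical, and the resolution at which the expected LLG reaches its target is taken as an optional input rather than computed, since that calculation is not described here.<br />
<br />
```python
def initial_search_resolution(data_limit, rms_error, ellg_limit=None, factor=1.8):
    """Return the high-resolution limit (Angstrom) for the initial search.

    data_limit -- high-resolution limit of the measured data
    rms_error  -- estimated RMS error of the model
    ellg_limit -- resolution at which the expected LLG reaches its target,
                  if known (hypothetical input)
    factor     -- multiple of the RMS error below which data add little signal
    """
    candidates = [data_limit, factor * rms_error]
    if ellg_limit is not None:
        candidates.append(ellg_limit)
    # A larger d-spacing means lower resolution, so the most restrictive
    # limit is the maximum.
    return max(candidates)
```
<br />
For data extending to 1.5 Angstrom and a model with an estimated RMS error of 1.2 Angstrom, this gives a limit of about 2.16 Angstrom for the initial search. Data to the full resolution would still be used for the final refinement, as noted above.<br />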
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* || definitely<br />
|-<br />
| *''6 for 1st model in monoclinic space groups'' || <br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
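<br />
The Z-score definition above translates directly into a few lines of Python. This is a minimal sketch with a hypothetical function name; it assumes the LLG values for the random set are already available as a list.<br />
<br />
```python
from statistics import mean, pstdev


def z_score(peak_llg, random_llgs):
    """Number of RMS deviations by which a peak LLG exceeds the random-set mean."""
    avg = mean(random_llgs)
    rms = pstdev(random_llgs)  # RMS deviation from the mean of the random set
    if rms == 0:
        raise ValueError("random set has zero spread; Z-score is undefined")
    return (peak_llg - avg) / rms
```
<br />
The same function applies to rotation and translation searches; only the source of the random set differs.<br />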
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
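<br />
The rough guide in the table can be written as a lookup function, for example when summarising many runs in a pipeline. This is a sketch only: the function name is hypothetical, and the handling of values that fall exactly on a boundary is an assumption, since the table does not specify it.<br />
<br />
```python
def interpret_tfz(tfz, first_model_monoclinic=False):
    """Map a TF Z-score onto the rough guide from the table.

    first_model_monoclinic -- True for the first model placed in a monoclinic
    space group, where the threshold for "definitely" is 6 rather than 8.
    """
    definite_above = 6.0 if first_model_monoclinic else 8.0
    if tfz > definite_above:
        return "definitely"
    if tfz < 5.0:
        return "no"
    if tfz < 6.0:
        return "unlikely"
    if tfz < 7.0:
        return "possibly"
    return "probably"
```
<br />
As the text says, there will always be exceptions, so the returned label is a prompt for inspecting the logfile, not a verdict.<br />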
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
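<br />
If you want to check the annotation from a script rather than by eye, the SOLU SET line can be split into its tokens. The following is a minimal sketch, not part of Phaser: the function name <code>parse_annotation</code> is hypothetical, and only the simple numeric tokens (RFZ=, TFZ=, PAK=, LLG=, TFZ==) are interpreted; anything else from the table above is passed through as raw text.<br />

```python
import re

# Numeric tokens such as RFZ=10.7 or TFZ==24.7. Other annotations listed in
# the table (RF++, TF*0, TFZ=*, +TNCS, ...) are returned unparsed.
TOKEN = re.compile(r"^(RFZ|TFZ|PAK|LLG)(==?)(-?\d+(?:\.\d+)?)$")


def parse_annotation(line):
    """Split a SOLU SET annotation line into (label, value) pairs."""
    words = line.split()
    if words[:2] == ["SOLU", "SET"]:
        words = words[2:]
    parsed = []
    for word in words:
        match = TOKEN.match(word)
        if match:
            key, equals, number = match.groups()
            parsed.append((key + equals, float(number)))
        else:
            parsed.append((word, None))
    return parsed


example = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
           "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
pairs = parse_annotation(example)
llg_values = [value for label, value in pairs if label == "LLG="]
tfz_equivalents = [value for label, value in pairs if label == "TFZ=="]
# The LLG should not decrease as components are added.
llg_increases = all(a <= b for a, b in zip(llg_values, llg_values[1:]))
```

For the beta-blip line above, this collects the LLG values and the TFZ-equivalents separately, so that the two checks described in the text (high TFZ-equivalent, increasing LLG) can be made programmatically.<br />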
<br />
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
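<br />
As an illustration of the z-y-z convention, the sketch below builds a rotation matrix from three Euler angles as the product Rz(alpha) Ry(beta) Rz(gamma), which is the standard matrix for successive rotations about z, the new y and the new z. Whether Phaser applies this matrix or its inverse to the model coordinates is not stated here, so treat that as an assumption to check before using it for anything beyond understanding the convention. The function names are hypothetical.<br />

```python
import math


def rot_z(deg):
    """Rotation by deg degrees about the z axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(deg):
    """Rotation by deg degrees about the y axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyz(alpha, beta, gamma):
    """Rz(alpha) * Ry(beta) * Rz(gamma): rotate about z, new y, new z."""
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))


# Euler angles of the "beta" ensemble in the example above.
R = euler_zyz(200.849, 41.269, 183.909)
```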
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || (m,a,b,c,...) input solution numbers that were amalgamated after refinement : (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file shows these features<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
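<br />
A history line can be broken into its steps in the same spirit. This is only a sketch based on the notation in the table above (a step name followed by a bracketed list, with the output number after the last colon); the function name <code>parse_history</code> is hypothetical and the pattern has only been matched against the two example lines shown here.<br />

```python
import re

# A step looks like NAME(inputs:output), e.g. RF/TF(7/1:10) or RNP(10,12:1).
# FUSE(A,B,C) has no ":output" part, so its output is reported as None.
# The leading "SOLU HISTORY" words are not followed by "(" and are skipped.
STEP = re.compile(r"([A-Z/]+)\(([^)]*)\)")


def parse_history(line):
    """Return a list of (step name, input identifiers, output number)."""
    steps = []
    for name, body in STEP.findall(line):
        if ":" in body:
            inputs, output = body.rsplit(":", 1)
            steps.append((name, inputs.split(","), int(output)))
        else:
            steps.append((name, body.split(","), None))
    return steps


simple = parse_history("SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)")
harder = parse_history(
    "SOLU HISTORY RF/TF(7/1:10)PAK(10:10)"
    "RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)"
    "RNP(1:1)")
amalgamated = len(harder[2][1])  # number of inputs merged in the first RNP
```

In the more complicated example, the first RNP step shows that many input peaks refined to the same place and were amalgamated into output solution 1.<br />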
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of hundreds to thousands of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set hundreds to thousands of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TF-function Z-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data and the columns in the mtz file to be used using the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
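<br />
To give a feel for how the two inputs shape the curve, the sketch below uses one commonly quoted functional form: the square root of the fraction scattering, multiplied by a resolution-dependent fall-off governed by the RMS error, and by a Babinet-style term that lowers the value at low resolution. The exact expression and the solvent parameters used inside Phaser are not given in this section, so the formula and the default values of <code>fsol</code> and <code>bsol</code> below are assumptions for illustration only.<br />

```python
import math


def sigma_a(frac_scat, rms, d, fsol=0.95, bsol=300.0):
    """Illustrative Sigma(A) at resolution d (Angstrom).

    frac_scat: fraction of the total scattering contributed by the model
    rms:       expected RMS coordinate error (Angstrom)
    fsol/bsol: placeholder Babinet-style solvent parameters (assumed values)
    """
    s2 = 1.0 / (d * d)
    solvent = 1.0 - fsol * math.exp(-bsol * s2 / 4.0)
    falloff = math.exp(-(2.0 * math.pi ** 2 / 3.0) * rms ** 2 * s2)
    return math.sqrt(frac_scat) * solvent * falloff


# A model with half the scattering and 1 A RMS error, at 4 A and 2 A.
low_res = sigma_a(0.5, 1.0, 4.0)
high_res = sigma_a(0.5, 1.0, 2.0)
```

With these assumptions the value at 2 Angstrom is much lower than at 4 Angstrom, which is the behaviour described above: the larger the RMS error, the faster the accuracy of the model falls off with resolution.<br />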
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble<br />
keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation in Angstrom: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|}<br />
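<br />
The formula itself is not given in this section. The sketch below uses an exponential dependence on sequence identity and a cube-root dependence on the number of residues, with constants that are assumed here because they reproduce the tabulated values to within about 0.01 Angstrom. It is meant for interpolating between table entries, not as a statement of what Phaser computes internally.<br />

```python
import math


def rms_from_identity(identity_percent, n_residues):
    """Approximate the table above (constants are assumptions, see text)."""
    non_identity = 1.0 - identity_percent / 100.0
    size_term = (173.4 + n_residues) ** (1.0 / 3.0)
    return 0.0569 * math.exp(1.52 * non_identity) * size_term


# Interpolate for a 250-residue model at 35% sequence identity.
estimate = rms_from_identity(35.0, 250)
```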
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, either change the record name to ATOM or use the keyword ENSE modlid HETATOM ON.<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit. It is very important to include everything in the composition, not just the components that you are searching for, because Phaser needs to know what fraction of the total scattering is accounted for by each model. For the options that specify the size of a particular component (sequence, number of residues, molecular weight), you can separately define several components of the composition of the asymmetric unit and Phaser will just add them up. Note that, for these options, you can specify the composition of one copy of a component and also say how many copies of that component are expected to be present. You can also mix compositions entered by sequence, number of residues and molecular weight. When the composition is checked, Phaser will check for the plausibility of the composition you have specified, as well as multiples of that composition.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
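<br />
One quick way to check the probable content of your crystal is the usual Matthews estimate. The sketch below is independent of Phaser: it uses the conventional approximation that the solvent fraction is 1 - 1.23/V<sub>M</sub> for protein, and the function names and example numbers are hypothetical.<br />

```python
def matthews_coefficient(cell_volume, n_sym_ops, n_copies, mw_daltons):
    """V_M in cubic Angstrom per Dalton.

    cell_volume: unit cell volume in cubic Angstrom
    n_sym_ops:   number of asymmetric units in the unit cell
    n_copies:    copies of the molecule assumed per asymmetric unit
    mw_daltons:  molecular weight of one copy
    """
    return cell_volume / (n_sym_ops * n_copies * mw_daltons)


def solvent_fraction(v_m):
    """Conventional protein-only estimate of the solvent fraction."""
    return 1.0 - 1.23 / v_m


# Hypothetical crystal: 240000 A^3 cell, 4 asymmetric units, one 25 kDa copy.
v_m = matthews_coefficient(240000.0, 4, 1, 25000.0)
solvent = solvent_fraction(v_m)
```

If the estimate for your assumed number of copies is far from typical values, it is worth reconsidering the composition before running the search.<br />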
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format.<br />
===Composition by Atom===<br />
Individual atoms can be added to the composition. This allows the explicit addition of heavy atoms in the structure, e.g. Fe atoms.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
<br />
==How to Define Searches==<br />
Phaser does not compare sequences you specify in the composition with the models you specify as ensembles, so you have to specify separately the number of copies of a particular sequence that you expect to be found in the asymmetric unit of your crystal and the number of copies of each ensemble you want to place in the asymmetric unit. By default, Phaser will search first for ensembles expected to yield the highest signal in the MR search (as judged by the expected LLG or eLLG calculation); if that fails to result in a clear solution, different search orders will be tested automatically. For that reason, it does not normally matter which order you use to specify the searches. There is an option to override Phaser's automatic choice of search order, but this will only rarely be useful. It is best to specify the searches for everything that you hope to find in the MR calculation in one job, as that gives Phaser the greatest scope to optimise the calculation. Note that if your crystal possesses translational non-crystallographic symmetry (tNCS), you should be searching for a number of copies of each ensemble divisible by the order of the tNCS (i.e. the number of molecules that should be related by repeated application of a translation vector).<br />
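<br />
The distinction between the copies declared in the composition and the copies searched for can be made concrete with a small script generator. This is a sketch only: the helper <code>mr_auto_script</code>, the file names and the column labels are hypothetical, and the keyword spellings should be checked against the keyword list for your version of Phaser before use.<br />

```python
def mr_auto_script(mtz, f_label, sigf_label, components, root):
    """Assemble a keyword script for an automated MR job.

    components: list of dicts with keys name, pdb, identity, seq,
                copies_in_asu (for COMPOSITION) and copies_to_find
                (for SEARCH). The two counts are given separately because
                Phaser does not link sequences to ensembles.
    """
    lines = ["MODE MR_AUTO",
             f"HKLIN {mtz}",
             f"LABIN F={f_label} SIGF={sigf_label}"]
    for c in components:
        lines.append(
            f"ENSEMBLE {c['name']} PDB {c['pdb']} IDENTITY {c['identity']}")
    for c in components:
        lines.append(
            f"COMPOSITION PROTEIN SEQUENCE {c['seq']} NUM {c['copies_in_asu']}")
    for c in components:
        if c["copies_to_find"] > 0:
            lines.append(
                f"SEARCH ENSEMBLE {c['name']} NUM {c['copies_to_find']}")
    lines.append(f"ROOT {root}")
    return "\n".join(lines) + "\n"


script = mr_auto_script(
    "beta_blip.mtz", "Fobs", "Sigma",
    [{"name": "beta", "pdb": "beta.pdb", "identity": 100,
      "seq": "beta.seq", "copies_in_asu": 1, "copies_to_find": 1},
     {"name": "blip", "pdb": "blip.pdb", "identity": 100,
      "seq": "blip.seq", "copies_in_asu": 1, "copies_to_find": 1}],
    "beta_blip_auto")
```

All searches are written into one job, as recommended above, so that Phaser can choose the search order itself.<br />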
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
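<br />
The four selection criteria can be written out as follows. This is a sketch of the definitions given above, not of Phaser's code: the function name and mode names are hypothetical, and the use of the population standard deviation for the Z-score is an assumption.<br />

```python
import statistics


def select_peaks(values, mode="percent", cutoff=75.0):
    """Return the selected peak values, highest first."""
    ranked = sorted(values, reverse=True)
    mean = statistics.mean(values)
    if mode == "percent":
        # Top peak is 100%, the mean is 0%.
        span = ranked[0] - mean
        if span == 0:
            return ranked[:1]
        return [v for v in ranked if 100.0 * (v - mean) / span >= cutoff]
    if mode == "sigma":
        # Z-score: number of standard deviations above the mean.
        sigma = statistics.pstdev(values)
        if sigma == 0:
            return []
        return [v for v in ranked if (v - mean) / sigma >= cutoff]
    if mode == "number":
        return ranked[:int(cutoff)]
    if mode == "all":
        return ranked
    raise ValueError("unknown selection mode: " + mode)


clear = [10.0, 4.0, 3.0, 2.0, 1.0]   # one peak stands out
flat = [5.0, 4.9, 4.8, 1.0, 1.0]     # several peaks of similar height
by_percent_clear = select_peaks(clear)
by_percent_flat = select_peaks(flat)
by_sigma_clear = select_peaks(clear, "sigma", 5.0)
```

With the default percentage criterion the clear distribution yields a single peak and the flat one yields several, whereas a 5 sigma cutoff on the same small list returns nothing, illustrating why the percentage criterion always produces output and the Z-score criterion may not.<br />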
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
*: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
*: Points on the search grid with higher neighbouring points are removed from the selection<br />
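<br />
The effect of clustering can be illustrated on a one-dimensional grid. The real search grids are multi-dimensional and the neighbourhood used by Phaser is not described here, so the sketch below (with the hypothetical function <code>cluster_peaks</code>) only shows the principle: a point is kept if neither neighbour is higher.<br />

```python
def cluster_peaks(grid_values):
    """Keep (index, value) for points with no higher neighbour on a 1-D grid."""
    kept = []
    last = len(grid_values) - 1
    for i, value in enumerate(grid_values):
        left = grid_values[i - 1] if i > 0 else float("-inf")
        right = grid_values[i + 1] if i < last else float("-inf")
        if left <= value and right <= value:
            kept.append((i, value))
    return kept


# The point of height 2.5 sits on the shoulder of the peak of height 3.0.
grid = [1.0, 3.0, 2.5, 0.5, 2.0, 1.0]
clustered = cluster_peaks(grid)
```

With clustering on, the shoulder point at 2.5 is removed even though it is higher than the separate peak at 2.0; with clustering off it would remain available for selection.<br />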
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
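<br />
The amplitudes in the table can be reproduced from ''m'', ''D'' and the observed and calculated amplitudes. The sketch below follows the expressions in the table literally; storing a negative coefficient as a positive amplitude with the phase shifted by 180 degrees is an assumption about how the columns are written, and the function name is hypothetical.<br />

```python
def map_coefficient(m, f_obs, d, f_calc, phase_calc_deg, two_fo=True):
    """Return (amplitude, phase in degrees) for one reflection.

    two_fo=True  gives the 2m|Fobs| - D|Fcalc| coefficient (FWT/PHWT)
    two_fo=False gives the  m|Fobs| - D|Fcalc| coefficient (DELFWT/PHDELWT)
    """
    weight = 2.0 if two_fo else 1.0
    coefficient = weight * m * f_obs - d * f_calc
    if coefficient < 0:
        # Keep the amplitude positive and flip the phase instead.
        return -coefficient, (phase_calc_deg + 180.0) % 360.0
    return coefficient, phase_calc_deg % 360.0


fwt, phwt = map_coefficient(0.8, 100.0, 0.9, 120.0, 30.0)
delfwt, phdelwt = map_coefficient(0.8, 100.0, 0.9, 120.0, 30.0, two_fo=False)
```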
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
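<br />
As a rough illustration of the definition above, the following sketch tests whether n times a fractional translation vector lies close to integer multiples of the cell axes. The function names and the tolerance of 0.05 are assumptions for illustration only; Phaser's own detection analyses the whole series of Patterson peaks and may use different criteria.<br />

```python
def is_commensurate(vector, n, tolerance=0.05):
    """True if n * vector is within tolerance of a lattice translation.

    vector: tNCS translation in fractional coordinates.
    tolerance: hypothetical threshold, in fractional units.
    """
    for component in vector:
        scaled = n * component
        if abs(scaled - round(scaled)) > tolerance:
            return False
    return True


def smallest_commensurate_order(vector, max_n=10):
    """Return the smallest n >= 2 for which n * vector is commensurate, else None."""
    for n in range(2, max_n + 1):
        if is_commensurate(vector, n):
            return n
    return None
```

For example, a vector of (0, 0, 0.333) gives n = 3 (a pseudo-tripling along c), while a vector in a general position such as (0.1, 0.27, 0) gives no commensurate order up to 10.<br />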
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with the keyword TNCS PATT PERCENT <x>.<br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2473Molecular Replacement2018-07-04T10:33:52Z<p>Rdo20: /* Automated Molecular Replacement */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' -> [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' -> [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and provide a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto2.png|Flow Diagram for Automated MR|800px]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
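<br />
The RMS-based limit described above amounts to a one-line calculation. The sketch below covers only this rule (it does not model the expected-LLG target), and the function name is an assumption for illustration.<br />

```python
def rms_limited_resolution(data_resolution, rms_error, factor=1.8):
    """Resolution limit (Angstroms) after applying the 1.8 x RMS rule.

    A larger number means lower resolution, so the limit actually used is
    the larger of the data limit and factor * rms_error.
    """
    return max(data_resolution, factor * rms_error)
```

With data extending to 1.5 Angstroms and an estimated RMS error of 1.2 Angstroms, this gives a limit of 2.16 Angstroms; with an RMS error of 0.5 Angstroms and data to 2.0 Angstroms, the data limit of 2.0 Angstroms is retained.<br />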
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
| *''6 for 1st model in monoclinic space groups'' || <br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
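<br />
The Z-score definition above can be written out as a short sketch. The function name is hypothetical, and the use of the population (rather than sample) RMS deviation is an assumption; the text does not say which Phaser uses.<br />

```python
import math


def z_score(peak_llg, random_llgs):
    """Number of RMS deviations by which peak_llg lies above the random mean."""
    mean = sum(random_llgs) / len(random_llgs)
    # Population variance of the LLG values from the random set (assumption)
    variance = sum((value - mean) ** 2 for value in random_llgs) / len(random_llgs)
    rms_deviation = math.sqrt(variance)
    return (peak_llg - mean) / rms_deviation
```

For a random set with mean 14 and RMS deviation of about 2.83, a search peak with an LLG of about 28.1 lies 5 standard deviations above the mean.<br />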
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
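<br />
The rough guide in the table can be expressed as a small helper for scanning many results. This is only a sketch of the table: the function name is hypothetical, and the treatment of scores falling exactly on a boundary (which the table leaves open) is an assumption.<br />

```python
def interpret_tfz(tfz, first_model_monoclinic=False):
    """Map a TFZ score to the verdict given in the table above.

    first_model_monoclinic: apply the footnote (threshold of 6 instead of 8
    for the first model in monoclinic space groups).
    """
    definite_threshold = 6.0 if first_model_monoclinic else 8.0
    if tfz > definite_threshold:
        return "definitely"
    if tfz < 5.0:
        return "no"
    if tfz < 6.0:
        return "unlikely"
    if tfz < 7.0:
        return "possibly"
    return "probably"
```

As the text notes, the table is a very rough guide, so the verdict is a prompt to inspect the logfile rather than a replacement for doing so.<br />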
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
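The numeric annotation on a SOLU SET line such as the one above can be pulled out with a regular expression. This is a minimal sketch: the function name is hypothetical, and only the RFZ=, TFZ=, PAK=, LLG= and TFZ== tokens are handled (tokens such as TFZ=*, RF++ or LLG+=(...) are skipped).<br />

```python
import re

# Matches RFZ=, TFZ=, PAK=, LLG= and TFZ== followed by a number
TOKEN = re.compile(r"(RFZ|TFZ|PAK|LLG)(==?)(-?\d+(?:\.\d+)?)")


def parse_solu_set(line):
    """Return the annotation as a list of (label, value) pairs in file order."""
    pairs = []
    for name, equals, value in TOKEN.findall(line):
        label = name + "==" if equals == "==" else name
        pairs.append((label, float(value)))
    return pairs


beta_blip = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
             "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
annotation = parse_solu_set(beta_blip)
llg_values = [value for label, value in annotation if label == "LLG"]
```

For the beta-blip line, llg_values is [472.0, 1006.0, 1006.0], showing the LLG increasing as the second component is added.<br />
<br />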
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
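<br />
For readers who want to turn a z-y-z Euler triple into a rotation matrix, the sketch below composes the three elementary rotations. It assumes that the three EULER values are given in the order of the rotations listed above and that angles are in degrees; whether the resulting matrix matches Phaser's internal handedness and direction conventions should be checked against a known solution before relying on it.<br />

```python
import math


def rot_z(degrees):
    """Rotation matrix about the z-axis."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(degrees):
    """Rotation matrix about the y-axis."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyz_matrix(alpha, beta, gamma):
    """Rotate about z, then the new y, then the new z.

    Rotations about successively rotated axes compose by right
    multiplication, giving Rz(alpha) * Ry(beta) * Rz(gamma).
    """
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))
```

When beta is zero the two z rotations simply add, so euler_zyz_matrix(30, 0, 60) is a rotation of 90 degrees about z.<br />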
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || All input peaks amalgamated after refinement to give output solution number (m and others): (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file shows these features<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
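<br />
A history line like those above can be split into its steps with a regular expression. This is a sketch with hypothetical function names; it handles only the RF/TF, PAK, RNP and FUSE steps listed in the table.<br />

```python
import re

# One step is a keyword followed by a parenthesised argument list
STEP = re.compile(r"(RF/TF|PAK|RNP|FUSE)\(([^)]*)\)")


def parse_solu_history(line):
    """Return the steps as a list of (keyword, argument_string) pairs."""
    return STEP.findall(line)


def split_rnp(arguments):
    """Split RNP arguments 'a,b,c:p' into ([a, b, c], p)."""
    inputs, _, output = arguments.partition(":")
    return [int(number) for number in inputs.split(",")], int(output)


complicated = ("SOLU HISTORY RF/TF(7/1:10)PAK(10:10)"
               "RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)"
               "RNP(1:1)")
steps = parse_solu_history(complicated)
merged, output_number = split_rnp(steps[2][1])
```

For the more complicated example, merged contains the 24 input peak numbers that were amalgamated into output solution number 1 after the first refinement.<br />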
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of 100-1000's of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set 100-1000's of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data and the columns in the mtz file to be used using the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble<br />
keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation in Angstrom: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|}<br />
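<br />
For values between the tabulated points, one option is to interpolate in the table. The sketch below copies the table above and interpolates bilinearly, clamping outside the tabulated range; it is an illustration with hypothetical names and is not the formula that Phaser itself evaluates.<br />

```python
from bisect import bisect_right

RESIDUES = [50, 100, 200, 300, 400, 600, 850, 1000, 1500, 2000]
IDENTITY = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
# RMS[i][j]: estimate for IDENTITY[i] percent identity and RESIDUES[j] residues
RMS = [
    [1.579, 1.689, 1.875, 2.030, 2.164, 2.391, 2.625, 2.748, 3.093, 3.375],
    [1.356, 1.451, 1.610, 1.743, 1.858, 2.053, 2.255, 2.360, 2.657, 2.899],
    [1.165, 1.246, 1.383, 1.497, 1.596, 1.764, 1.936, 2.027, 2.281, 2.489],
    [1.000, 1.070, 1.188, 1.286, 1.371, 1.515, 1.663, 1.741, 1.959, 2.138],
    [0.859, 0.919, 1.020, 1.104, 1.177, 1.301, 1.428, 1.495, 1.683, 1.836],
    [0.738, 0.789, 0.876, 0.948, 1.011, 1.117, 1.227, 1.284, 1.445, 1.577],
    [0.634, 0.678, 0.752, 0.814, 0.868, 0.959, 1.053, 1.103, 1.241, 1.354],
    [0.544, 0.582, 0.646, 0.699, 0.746, 0.824, 0.905, 0.947, 1.066, 1.163],
    [0.467, 0.500, 0.555, 0.601, 0.640, 0.708, 0.777, 0.813, 0.915, 0.999],
    [0.401, 0.429, 0.477, 0.516, 0.550, 0.608, 0.667, 0.698, 0.786, 0.858],
    [0.345, 0.369, 0.409, 0.443, 0.472, 0.522, 0.573, 0.600, 0.675, 0.737],
]


def _bracket(grid, x):
    """Return (low index, high index, weight of high) for x, clamped to the grid."""
    if x <= grid[0]:
        return 0, 0, 0.0
    if x >= grid[-1]:
        return len(grid) - 1, len(grid) - 1, 0.0
    high = bisect_right(grid, x)
    low = high - 1
    return low, high, (x - grid[low]) / (grid[high] - grid[low])


def estimate_rms(identity_percent, n_residues):
    """Bilinear interpolation of the tabulated RMS estimate (Angstroms)."""
    r0, r1, tr = _bracket(IDENTITY, identity_percent)
    c0, c1, tc = _bracket(RESIDUES, n_residues)
    upper = RMS[r0][c0] * (1 - tc) + RMS[r0][c1] * tc
    lower = RMS[r1][c0] * (1 - tc) + RMS[r1][c1] * tc
    return upper * (1 - tr) + lower * tr
```

At tabulated points the function returns the table entry, for example estimate_rms(30, 300) gives 1.286.<br />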
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any HETATM records in the scattering, use the keyword ENSE modlid HETATOM ON<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
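<br />
The default rules for HETATM and water records described above can be summarised in a short sketch that classifies a PDB line before a model is handed to Phaser. The function names and return labels are hypothetical; the residue name is read from columns 18-20 of the standard PDB record layout, and the effect of the HETATOM keyword is not modelled.<br />

```python
WATER_NAMES = {"OW", "WAT", "HOH", "H2O", "OH2", "MOH", "WTR", "TIP"}
SCATTERING_HETATM = {"MSE", "MSO", "CSE", "CSO", "ALY", "MLY", "MLZ"}


def classify_record(pdb_line):
    """Describe how a PDB line is treated under the default rules above."""
    record = pdb_line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        return "other"
    residue_name = pdb_line[17:20].strip()
    if residue_name in WATER_NAMES:
        return "deleted"  # water: removed on input, not written to output
    if record == "HETATM" and residue_name not in SCATTERING_HETATM:
        return "zero_occupancy"  # kept in output but ignored in the scattering
    return "scattering"


def make_line(record, residue_name):
    """Build a minimal, column-aligned PDB line for demonstration only."""
    return f"{record:<6}{1:>5}  CA  {residue_name:>3} A   1"
```

For example, classify_record(make_line("HETATM", "MSE")) returns "scattering", whereas a HETATM record for a ligand residue returns "zero_occupancy" and a HOH record returns "deleted".<br />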
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit. It is very important to include everything in the composition, not just the components that you are searching for, because Phaser needs to know what fraction of the total scattering is accounted for by each model. For the options that specify the size of a particular component (sequence, number of residues, molecular weight), you can separately define several components of the composition of the asymmetric unit and Phaser will just add them up. Note that, for these options, you can specify the composition of one copy of a component and also say how many copies of that component are expected to be present. You can also mix compositions entered by sequence, number of residues and molecular weight. When the composition is checked, Phaser will check for the plausibility of the composition you have specified, as well as multiples of that composition.<br />
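<br />
The reason for including everything in the composition can be seen with a crude calculation. The sketch below uses the mass fraction as a stand-in for the fraction of scattering; this proxy, the function names and the molecular weights are assumptions for illustration and do not reproduce Phaser's scattering calculation.<br />

```python
def total_molecular_weight(components):
    """Sum the asymmetric unit content from (molecular_weight, copies) pairs."""
    return sum(weight * copies for weight, copies in components)


def approximate_fraction(model_weight, components):
    """Mass fraction of one model copy, used as a rough proxy for its
    fraction of the total scattering (an approximation, not Phaser's value)."""
    return model_weight / total_molecular_weight(components)


# Hypothetical complex: one 30 kDa and one 20 kDa component per copy
one_complex = [(30000.0, 1), (20000.0, 1)]
two_complexes = [(30000.0, 2), (20000.0, 2)]
```

With one complex in the asymmetric unit the 30 kDa model accounts for about 0.6 of the total; if the composition is corrected to two complexes, a single copy of the same model accounts for only about 0.3, so omitting components from the composition overstates what each model can explain.<br />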
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format.<br />
===Composition by Atom===<br />
Individual atoms can be added to the composition. This allows the explicit addition of heavy atoms in the structure, e.g. Fe atoms.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
<br />
==How to Define Searches==<br />
Phaser does not compare sequences you specify in the composition with the models you specify as ensembles, so you have to specify separately the number of copies of a particular sequence that you expect to be found in the asymmetric unit of your crystal and the number of copies of each ensemble you want to place in the asymmetric unit. By default, Phaser will search first for ensembles expected to yield the highest signal in the MR search (as judged by the expected LLG or eLLG calculation); if that fails to result in a clear solution, different search orders will be tested automatically. For that reason, it does not normally matter which order you use to specify the searches. There is an option to override Phaser's automatic choice of search order, but this will only rarely be useful. It is best to specify the searches for everything that you hope to find in the MR calculation in one job, as that gives Phaser the greatest scope to optimise the calculation. Note that if your crystal possesses translational non-crystallographic symmetry (tNCS), you should be searching for a number of copies of each ensemble divisible by the order of the tNCS (i.e. the number of molecules that should be related by repeated application of a translation vector).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
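<br />
The four selection criteria above can be sketched in a few lines of Python. This is only an illustration, not Phaser's code: the function name <code>select_peaks</code>, the mode names and the example scores are hypothetical, and the mean of the supplied scores stands in for the mean of the search.<br />
<br />
```python
from statistics import mean, pstdev

def select_peaks(values, mode="percent", cutoff=75.0):
    # values: scores of the search points; returns the selected scores, best first.
    if not values:
        return []
    ranked = sorted(values, reverse=True)
    top, mu = ranked[0], mean(ranked)
    if mode == "percent":
        # The top peak is defined as 100% and the mean as 0%.
        if top == mu:
            return ranked  # completely flat distribution: nothing to separate
        return [v for v in ranked if 100.0 * (v - mu) / (top - mu) >= cutoff]
    if mode == "sigma":
        # Number of standard deviations over the mean (Z-score).
        sd = pstdev(ranked, mu)
        return [v for v in ranked if sd > 0 and (v - mu) / sd >= cutoff]
    if mode == "number":
        # Fixed number of top peaks.
        return ranked[:int(cutoff)]
    if mode == "all":
        # No selection.
        return ranked
    raise ValueError("unknown mode: " + mode)

scores = [10.0, 9.0, 4.0, 3.0, 2.0, 2.0]  # made-up values
print(select_peaks(scores))                # percentage selection at 75%
print(select_peaks(scores, "sigma", 5.0))  # a high Z-score cutoff can select nothing
print(select_peaks(scores, "number", 3))
```
<br />
With the percentage criterion the top peak always survives, whereas the Z-score criterion can return an empty list, as noted above.<br />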
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
*: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
*: Points on the search grid with higher neighbouring points are removed from the selection<br />
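<br />
As a toy illustration of clustering, the sketch below keeps only those points of a one-dimensional, non-periodic grid that have no higher neighbour. The real search grids are multi-dimensional, and the function name <code>cluster_1d</code> and the grid values are hypothetical.<br />
<br />
```python
def cluster_1d(grid):
    # Keep point i only if neither neighbour on the grid is higher.
    kept = []
    for i, value in enumerate(grid):
        left = grid[i - 1] if i > 0 else float("-inf")
        right = grid[i + 1] if i + 1 < len(grid) else float("-inf")
        if value >= left and value >= right:
            kept.append((i, value))
    return kept

grid = [1.0, 3.0, 2.0, 5.0, 4.5, 4.0]  # made-up search values
print(cluster_1d(grid))  # the point with value 4.5, on the shoulder of the peak at 5.0, is dropped
```
<br />
With clustering off, the shoulder point at 4.5 would remain available for selection even though it is adjacent to a higher point.<br />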
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
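<br />
The condition for commensurate modulation can be written as a simple test on the fractional coordinates of the basic translation vector. The sketch below is an illustration only: the function name <code>is_commensurate</code> and the tolerance of 0.02 (standing in for "very close to") are assumptions, not Phaser parameters.<br />
<br />
```python
def is_commensurate(vector, n, tol=0.02):
    # vector: basic tNCS translation in fractional coordinates.
    # True if n times the vector is within tol of integer multiples of the cell axes.
    for component in vector:
        scaled = n * component
        if abs(scaled - round(scaled)) > tol:
            return False
    return True

print(is_commensurate((0.0, 0.0, 0.3333), 3))  # three-fold pseudo-cell along c
print(is_commensurate((0.27, 0.0, 0.5), 2))    # vector in a general position
```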
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x>.<br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2472Molecular Replacement2018-07-04T10:33:04Z<p>Rdo20: /* Automated Molecular Replacement */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' -> [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' -> [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and supply a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto2.png|Flow Diagram for Automated MR|700px]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
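<br />
The two resolution rules above can be combined as in the following sketch. It assumes a table of expected LLG values as a function of the high-resolution limit is available; Phaser computes this internally, so the dictionary used here, the function name <code>search_resolution</code> and the example numbers are purely hypothetical.<br />
<br />
```python
TARGET_ELLG = 120.0  # minimum expected LLG aimed for (from the text)
RMS_FACTOR = 1.8     # resolution limit as a multiple of the estimated RMS error

def search_resolution(ellg_by_dmin, data_dmin, model_rms):
    # ellg_by_dmin: {d_min in Angstrom: expected LLG using data to that d_min}
    reaching = [d for d, ellg in ellg_by_dmin.items() if ellg >= TARGET_ELLG]
    # Lowest resolution (largest d_min) that still reaches the target;
    # if the target cannot be reached, fall back to all the data.
    d_target = max(reaching) if reaching else data_dmin
    # Never go to higher resolution than 1.8 x RMS error or than the data extend.
    return max(d_target, RMS_FACTOR * model_rms, data_dmin)

ellg = {4.0: 60.0, 3.0: 130.0, 2.5: 210.0, 2.0: 300.0}  # made-up values
print(search_resolution(ellg, data_dmin=2.0, model_rms=1.0))  # limited by the eLLG target
print(search_resolution(ellg, data_dmin=2.0, model_rms=2.0))  # limited by the RMS error
```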
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
| *''6 for 1st model in monoclinic space groups'' || <br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
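<br />
The Z-score definition above amounts to the following calculation, shown here as a minimal sketch. The function name <code>z_score</code> and the LLG values are made up for illustration.<br />
<br />
```python
from statistics import mean, pstdev

def z_score(peak_llg, random_llgs):
    # Mean and RMS deviation from the mean of the LLGs for the random set.
    mu = mean(random_llgs)
    rms_dev = pstdev(random_llgs, mu)
    if rms_dev == 0:
        raise ValueError("random sample has no spread")
    # Number of standard deviations above (or below) the mean.
    return (peak_llg - mu) / rms_dev

random_sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up LLG values
print(z_score(15.0, random_sample))  # mean 5, RMS deviation 2, so Z = 5.0
```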
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
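<br />
For scripted analysis, the rough guide in the table can be encoded as below. This is a sketch only: the function name <code>interpret_tfz</code> is hypothetical, and the treatment of values falling exactly on a boundary is an assumption, since the table does not specify it.<br />
<br />
```python
def interpret_tfz(tfz, first_model_monoclinic=False):
    # Footnote to the table: 6 rather than 8 for the 1st model in monoclinic space groups.
    definite = 6.0 if first_model_monoclinic else 8.0
    if tfz > definite:
        return "definitely"
    if tfz < 5.0:
        return "no"
    if tfz < 6.0:
        return "unlikely"
    if tfz < 7.0:
        return "possibly"
    return "probably"

for value in (4.2, 5.5, 6.5, 7.5, 9.1):
    print(value, interpret_tfz(value))
```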
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see that the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features:<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
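If you want to track these values in a script, the numeric part of the annotation can be pulled out with a regular expression. The sketch below is not part of Phaser; the function name <code>parse_annotation</code> is hypothetical, and only the numeric RFZ, TFZ, PAK, LLG and TFZ== tokens are handled (tokens such as TFZ=* or amalgamation sets are skipped).<br />
<br />
```python
import re

# A keyword, "=" or "==", then a number.
TOKEN = re.compile(r"(RFZ|TFZ|PAK|LLG)(==?)(-?\d+(?:\.\d+)?)")

def parse_annotation(line):
    items = []
    for key, eq, value in TOKEN.findall(line):
        name = key + "==" if eq == "==" else key
        items.append((name, float(value)))
    return items

line = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
        "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
items = parse_annotation(line)
llgs = [value for name, value in items if name == "LLG"]
print(llgs)  # the LLG should not decrease as components are added
```
<br />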
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
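<br />
A z-y-z rotation about successively new axes corresponds to the matrix product Rz(alpha) Ry(beta) Rz(gamma). The sketch below builds that matrix with the standard library only. It illustrates the convention as described above; whether Phaser applies the matrix to the model in exactly this form is an assumption, and the function names are hypothetical.<br />
<br />
```python
import math

def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_zyz(alpha, beta, gamma):
    # Rotation about z, then the new y, then the new z (angles in degrees).
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))

# Orientation of the "beta" ensemble from the example above.
for row in euler_zyz(200.849, 41.269, 183.909):
    print(["%8.5f" % x for x in row])
```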
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || All input peaks amalgamated after refinement to give output solution number (m and others): (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file is<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
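<br />
A history line can be split into its steps in the same way as the annotation. The sketch below is an illustration, not a Phaser utility; the function name <code>parse_history</code> is hypothetical, and each step is returned as its name, the list of entries before the colon, and the entry after the colon (empty for FUSE).<br />
<br />
```python
import re

# A step name followed by a parenthesised body.
STEP = re.compile(r"(RF/TF|PAK|RNP|FUSE)\(([^)]*)\)")

def parse_history(line):
    steps = []
    for name, body in STEP.findall(line):
        inputs, _, output = body.partition(":")
        entries = [entry.strip() for entry in inputs.split(",")]
        steps.append((name, entries, output.strip()))
    return steps

line = ("SOLU HISTORY RF/TF(7/1:10)PAK(10:10)"
        "RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)")
for name, entries, output in parse_history(line):
    print(name, len(entries), "->", output)
```
<br />
In this example the first RNP step lists 24 input peaks that were amalgamated into output solution 1.<br />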
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of hundreds to thousands of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually start hundreds to thousands of jobs and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TF-function Z-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data and the columns in the mtz file to be used using the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble<br />
keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation in Angstrom: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|}<br />
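
The tabulated values are consistent with a simple closed form in the sequence identity and the number of residues. The sketch below is an illustration only: the coefficients (0.0569, 1.52 and 173.3) are assumptions chosen because they reproduce the table to about three decimal places, and the function name is hypothetical rather than part of Phaser.

```python
import math


def estimated_rms(identity_fraction, n_residues):
    """Estimate the RMS coordinate error (Angstrom) of a model.

    identity_fraction: sequence identity as a fraction between 0 and 1.
    n_residues: number of residues in the model.
    The coefficients are assumptions that reproduce the table above;
    they are not quoted from the Phaser documentation.
    """
    if not 0.0 <= identity_fraction <= 1.0:
        raise ValueError("identity_fraction must lie between 0 and 1")
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    non_identical = 1.0 - identity_fraction
    size_term = (173.3 + n_residues) ** (1.0 / 3.0)
    return 0.0569 * math.exp(1.52 * non_identical) * size_term


if __name__ == "__main__":
    # Compare with the table entry for ID=30%, 200 residues (1.188).
    print(round(estimated_rms(0.30, 200), 3))
```

A function like this is useful for interpolating between table entries, for example when deciding how far to increase RMS above the default for a model with an intermediate size or identity.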
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, either change the record name to ATOM or use the keyword ENSE modlid HETATOM ON.<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit. It is very important to include everything in the composition, not just the components that you are searching for, because Phaser needs to know what fraction of the total scattering is accounted for by each model. For the options that specify the size of a particular component (sequence, number of residues, molecular weight), you can separately define several components of the composition of the asymmetric unit and Phaser will just add them up. Note that, for these options, you can specify the composition of one copy of a component and also say how many copies of that component are expected to be present. You can also mix compositions entered by sequence, number of residues and molecular weight. When the composition is checked, Phaser will check for the plausibility of the composition you have specified, as well as multiples of that composition.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format.<br />
===Composition by Atom===<br />
Individual atoms can be added to the composition. This allows the explicit addition of heavy atoms in the structure, e.g. Fe atoms.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
<br />
==How to Define Searches==<br />
Phaser does not compare sequences you specify in the composition with the models you specify as ensembles, so you have to specify separately the number of copies of a particular sequence that you expect to be found in the asymmetric unit of your crystal and the number of copies of each ensemble you want to place in the asymmetric unit. By default, Phaser will search first for ensembles expected to yield the highest signal in the MR search (as judged by the expected LLG or eLLG calculation); if that fails to result in a clear solution, different search orders will be tested automatically. For that reason, it does not normally matter which order you use to specify the searches. There is an option to override Phaser's automatic choice of search order, but this will only rarely be useful. It is best to specify the searches for everything that you hope to find in the MR calculation in one job, as that gives Phaser the greatest scope to optimise the calculation. Note that if your crystal possesses translational non-crystallographic symmetry (tNCS), you should be searching for a number of copies of each ensemble divisible by the order of the tNCS (i.e. the number of molecules that should be related by repeated application of a translation vector).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
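
The first three selection criteria can be sketched in a few lines of Python. This is a minimal illustration, not Phaser's implementation: the function names are hypothetical, and the mean and standard deviation of the whole search are assumed to be supplied by the caller rather than derived from the peak list.

```python
def select_by_percentage(values, mean, cutoff=75.0):
    """Keep peaks at or above `cutoff` percent, where top peak = 100%, mean = 0%."""
    if not values:
        return []
    top = max(values)
    if top <= mean:
        # Degenerate case: no peak stands above the mean, keep only the top peak.
        return [top]
    ranked = sorted(values, reverse=True)
    return [v for v in ranked if 100.0 * (v - mean) / (top - mean) >= cutoff]


def select_by_zscore(values, mean, sigma, cutoff):
    """Keep peaks at least `cutoff` standard deviations above the mean."""
    ranked = sorted(values, reverse=True)
    return [v for v in ranked if (v - mean) / sigma >= cutoff]


def select_by_number(values, n_top):
    """Keep the `n_top` highest peaks."""
    return sorted(values, reverse=True)[:n_top]


if __name__ == "__main__":
    peaks = [50.0, 45.0, 30.0, 20.0]
    print(select_by_percentage(peaks, mean=10.0))
    print(select_by_zscore(peaks, mean=10.0, sigma=5.0, cutoff=5.0))
    print(select_by_number(peaks, 3))
```

Note that the percentage criterion always returns the top peak, whereas the Z-score criterion can return an empty list when the cutoff is too high.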
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
*: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
*: Points on the search grid with higher neighbouring points are removed from the selection<br />
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
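
The effect of clustering can be illustrated on a one-dimensional grid. This is a simplified sketch under stated assumptions: real rotation and translation searches use multi-dimensional grids with their own neighbour definitions, and the function name is hypothetical.

```python
def cluster_peaks_1d(grid):
    """Return (index, value) pairs for grid points with no higher neighbour.

    Assumes a non-periodic 1D grid in which each point has at most two
    neighbours; points beyond the ends of the grid are ignored.
    """
    kept = []
    last = len(grid) - 1
    for i, value in enumerate(grid):
        left = grid[i - 1] if i > 0 else float("-inf")
        right = grid[i + 1] if i < last else float("-inf")
        # Clustering on: drop any point that has a strictly higher neighbour.
        if left <= value and right <= value:
            kept.append((i, value))
    return kept


if __name__ == "__main__":
    search_values = [1.0, 3.0, 2.0, 5.0, 4.0, 4.5]
    print(cluster_peaks_1d(search_values))
```

With clustering off, the point at index 4 (value 4.0) would remain a candidate even though it sits on the shoulder of the peak at index 3; this is the situation in which turning clustering off can rescue a correct orientation.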
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x><br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2471Molecular Replacement2018-07-04T10:32:48Z<p>Rdo20: /* Automated Molecular Replacement */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' -> [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' -> [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and supply a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto2.png|Flow Diagram for Automated MR|600px]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
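<br />
The two limits described above can be combined in a short sketch. This is an illustration only and not Phaser's own code: the function name and the table of expected LLG against resolution are hypothetical, and only the target LLG of 120 and the factor of 1.8 are taken from the text.<br />
<br />
```python
def initial_search_resolution(ellg_by_resolution, rms_error, target_ellg=120.0):
    """Pick a resolution limit (in Angstrom) for the first search pass.

    ellg_by_resolution: hypothetical list of (d_min, expected LLG) pairs.
    rms_error: estimated RMS coordinate error of the model in Angstrom.
    """
    # Sort from low resolution (large d_min) to high resolution (small d_min).
    table = sorted(ellg_by_resolution, key=lambda pair: pair[0], reverse=True)
    # Fall back to the full resolution if the target is never reached.
    d_min = table[-1][0]
    for d, ellg in table:
        if ellg >= target_ellg:
            d_min = d  # lowest resolution that already reaches the target
            break
    # Do not go to higher resolution than 1.8 times the RMS error.
    return max(d_min, 1.8 * rms_error)


# Hypothetical numbers, for illustration only.
ellg_table = [(6.0, 35.0), (4.0, 90.0), (3.0, 150.0), (2.0, 400.0)]
print(initial_search_resolution(ellg_table, rms_error=1.0))
print(initial_search_resolution(ellg_table, rms_error=2.0))
```
<br />
With these made-up numbers the first call stops at 3.0 Angstrom, where the expected LLG first reaches 120, while the second call is held back to 1.8 times the larger RMS error.<br />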
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
| *''6 for 1st model in monoclinic space groups'' || <br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
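<br />
The definition of the Z-score given above can be written out as a minimal sketch. The function name and the LLG values are invented for illustration; this is not how Phaser is implemented internally.<br />
<br />
```python
from statistics import mean, pstdev


def z_score(peak_llg, random_llgs):
    """Number of standard deviations of a search peak above the random mean."""
    average = mean(random_llgs)
    rms_deviation = pstdev(random_llgs)  # RMS deviation from the mean
    return (peak_llg - average) / rms_deviation


# Made-up LLG values for a set of random orientations or translations.
random_llgs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(z_score(15.0, random_llgs))  # 5.0
```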
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
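<br />
For scripted pipelines, the rough guide in the table can be encoded as a small lookup. This is only a sketch: the function name is hypothetical, the treatment of values that fall exactly on a boundary is an arbitrary choice, and the footnote is read here as lowering the "definitely" threshold to 6 for the first model in a monoclinic space group.<br />
<br />
```python
def tfz_verdict(tfz, first_model_in_monoclinic=False):
    """Translate a TFZ score into the rough verdict used in the table."""
    definite_above = 6.0 if first_model_in_monoclinic else 8.0
    if tfz > definite_above:
        return "definitely"
    if tfz < 5.0:
        return "no"
    if tfz < 6.0:
        return "unlikely"
    if tfz < 7.0:
        return "possibly"
    return "probably"


print(tfz_verdict(24.3))
print(tfz_verdict(6.5))
print(tfz_verdict(6.5, first_model_in_monoclinic=True))
```
<br />
Such a verdict is no substitute for reading the logfile summary, since there will always be exceptions.<br />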
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
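<br />
The SOLU SET annotation line in the beta-blip example above can be split into its components with a few lines of Python. This is a minimal sketch, assuming only the plain numeric annotations (RFZ=, TFZ=, PAK=, LLG=, TFZ==); the special markers listed in the table, such as RF++ or the amalgamation sets, are deliberately ignored. The function names are hypothetical.<br />
<br />
```python
import re

# Name, then "=" or "==", then a number.
ANNOTATION = re.compile(r"(RFZ|TFZ|PAK|LLG)(==?)([-+]?\d+(?:\.\d+)?)")


def parse_annotation(line):
    """Return the (annotation, value) pairs of a SOLU SET line, in order."""
    return [(name + eq, float(value)) for name, eq, value in ANNOTATION.findall(line)]


def llg_values(pairs):
    """Extract the refined LLG values in the order they were reported."""
    return [value for key, value in pairs if key == "LLG="]


line = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
        "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
pairs = parse_annotation(line)
llgs = llg_values(pairs)
print(llgs)  # [472.0, 1006.0, 1006.0]
# The LLG should not decrease as components are added.
print(all(a <= b for a, b in zip(llgs, llgs[1:])))  # True
```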
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || All input peaks amalgamated after refinement to give output solution number (m and others): (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file shows these features<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
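<br />
A history line such as the one above can be broken into its steps in the same way. The sketch below assumes that each step has the form NAME(arguments) with the step names listed in the table; the function names are hypothetical.<br />
<br />
```python
import re

STEP = re.compile(r"(RF/TF|PAK|RNP|FUSE)\(([^)]*)\)")


def parse_history(line):
    """Return a list of (step name, raw argument string) in order."""
    return STEP.findall(line)


def rnp_inputs(arguments):
    """Split 'a,b,c:p' into the amalgamated input numbers and the output number."""
    inputs, output = arguments.split(":")
    return [int(n) for n in inputs.split(",")], int(output)


line = ("SOLU HISTORY RF/TF(7/1:10)PAK(10:10)"
        "RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)"
        "RNP(1:1)")
steps = parse_history(line)
print([name for name, _ in steps])  # ['RF/TF', 'PAK', 'RNP', 'RNP']
inputs, output = rnp_inputs(steps[2][1])
print(len(inputs), output)  # 24 1
```
<br />
Here 24 input peaks were amalgamated in the first refinement to give output solution number 1.<br />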
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of 100-1000's of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set 100-1000's of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data (HKLIn keyword) and which columns in that file to use (LABIn keyword). Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble<br />
keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation in Angstrom: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|}<br />
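<br />
If you need a starting RMS value for a model size or sequence identity that falls between the tabulated entries, the table can simply be interpolated. The sketch below does this bilinearly; the interpolation is an assumption made for illustration and is not the formula that Phaser itself uses, and the function names are hypothetical.<br />
<br />
```python
from bisect import bisect_right

RESIDUES = [50, 100, 200, 300, 400, 600, 850, 1000, 1500, 2000]
IDENTITY = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # percent
RMS = [  # rows follow IDENTITY, columns follow RESIDUES (values from the table)
    [1.579, 1.689, 1.875, 2.030, 2.164, 2.391, 2.625, 2.748, 3.093, 3.375],
    [1.356, 1.451, 1.610, 1.743, 1.858, 2.053, 2.255, 2.360, 2.657, 2.899],
    [1.165, 1.246, 1.383, 1.497, 1.596, 1.764, 1.936, 2.027, 2.281, 2.489],
    [1.000, 1.070, 1.188, 1.286, 1.371, 1.515, 1.663, 1.741, 1.959, 2.138],
    [0.859, 0.919, 1.020, 1.104, 1.177, 1.301, 1.428, 1.495, 1.683, 1.836],
    [0.738, 0.789, 0.876, 0.948, 1.011, 1.117, 1.227, 1.284, 1.445, 1.577],
    [0.634, 0.678, 0.752, 0.814, 0.868, 0.959, 1.053, 1.103, 1.241, 1.354],
    [0.544, 0.582, 0.646, 0.699, 0.746, 0.824, 0.905, 0.947, 1.066, 1.163],
    [0.467, 0.500, 0.555, 0.601, 0.640, 0.708, 0.777, 0.813, 0.915, 0.999],
    [0.401, 0.429, 0.477, 0.516, 0.550, 0.608, 0.667, 0.698, 0.786, 0.858],
    [0.345, 0.369, 0.409, 0.443, 0.472, 0.522, 0.573, 0.600, 0.675, 0.737],
]


def _bracket(grid, x):
    """Return index i and weight t so that x lies between grid[i] and grid[i + 1]."""
    x = min(max(x, grid[0]), grid[-1])  # clamp to the range of the table
    i = min(bisect_right(grid, x) - 1, len(grid) - 2)
    return i, (x - grid[i]) / (grid[i + 1] - grid[i])


def table_rms(identity_percent, n_residues):
    """Bilinear interpolation of the tabulated RMS estimates (Angstrom)."""
    i, t = _bracket(IDENTITY, identity_percent)
    j, u = _bracket(RESIDUES, n_residues)
    upper = RMS[i][j] * (1 - u) + RMS[i][j + 1] * u
    lower = RMS[i + 1][j] * (1 - u) + RMS[i + 1][j + 1] * u
    return upper * (1 - t) + lower * t


print(round(table_rms(35, 250), 3))
```
<br />
Values outside the table are clamped to its edges, so the sketch never extrapolates.<br />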
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, use the keyword ENSE modlid HETATOM ON<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit. It is very important to include everything in the composition, not just the components that you are searching for, because Phaser needs to know what fraction of the total scattering is accounted for by each model. For the options that specify the size of a particular component (sequence, number of residues, molecular weight), you can separately define several components of the composition of the asymmetric unit and Phaser will just add them up. Note that, for these options, you can specify the composition of one copy of a component and also say how many copies of that component are expected to be present. You can also mix compositions entered by sequence, number of residues and molecular weight. When the composition is checked, Phaser will check for the plausibility of the composition you have specified, as well as multiples of that composition.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format.<br />
===Composition by Atom===<br />
Individual atoms can be added to the composition. This allows the explicit addition of heavy atoms in the structure, e.g. Fe atoms.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
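<br />
As a rough illustration of the ratio described above, the sketch below uses molecular weight as a stand-in for scattering power. That proportionality is an assumption made here for simplicity (Phaser works from the actual scattering), and the function name and the numbers are hypothetical.<br />
<br />
```python
def fraction_scattering(model_mw, composition):
    """Approximate fraction of the total scattering explained by one model copy.

    composition: list of (molecular weight of one copy, number of copies)
    covering everything in the asymmetric unit.
    """
    total = sum(mw * copies for mw, copies in composition)
    return model_mw / total


# Two copies of a 30 kDa component and one copy of a 20 kDa component.
composition = [(30000.0, 2), (20000.0, 1)]
print(fraction_scattering(30000.0, composition))  # 0.375
```
<br />
Leaving a component out of the composition would inflate this fraction, which is why the composition must describe the whole asymmetric unit.<br />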
<br />
==How to Define Searches==<br />
Phaser does not compare sequences you specify in the composition with the models you specify as ensembles, so you have to specify separately the number of copies of a particular sequence that you expect to be found in the asymmetric unit of your crystal and the number of copies of each ensemble you want to place in the asymmetric unit. By default, Phaser will search first for ensembles expected to yield the highest signal in the MR search (as judged by the expected LLG or eLLG calculation); if that fails to result in a clear solution, different search orders will be tested automatically. For that reason, it does not normally matter which order you use to specify the searches. There is an option to override Phaser's automatic choice of search order, but this will only rarely be useful. It is best to specify the searches for everything that you hope to find in the MR calculation in one job, as that gives Phaser the greatest scope to optimise the calculation. Note that if your crystal possesses translational non-crystallographic symmetry (tNCS), you should be searching for a number of copies of each ensemble divisible by the order of the tNCS (i.e. the number of molecules that should be related by repeated application of a translation vector).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
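<br />
The four selection criteria can be sketched as follows. This is an illustration under simplifying assumptions: the mean and RMS deviation are taken over the list of values passed in, the function and method names are hypothetical, and the code is not taken from Phaser.<br />
<br />
```python
from statistics import mean, pstdev


def select_peaks(values, method="percent", cutoff=75.0):
    """Return the selected peak values, highest first."""
    ranked = sorted(values, reverse=True)
    average = mean(values)
    if method == "percent":
        # Top peak is 100%, the mean is 0%.
        span = ranked[0] - average
        if span == 0:
            return ranked  # completely flat: nothing to distinguish
        return [v for v in ranked if 100.0 * (v - average) / span >= cutoff]
    if method == "sigma":
        # Number of standard deviations over the mean (Z-score).
        deviation = pstdev(values)
        if deviation == 0:
            return []
        return [v for v in ranked if (v - average) / deviation >= cutoff]
    if method == "number":
        return ranked[:int(cutoff)]
    if method == "all":
        return ranked
    raise ValueError("unknown selection method: " + method)


values = [10.0, 9.0, 4.0, 3.0, 2.0, 2.0]
print(select_peaks(values, "percent", 75.0))  # [10.0, 9.0]
print(select_peaks(values, "sigma", 1.5))     # [10.0]
print(select_peaks(values, "number", 3))      # [10.0, 9.0, 4.0]
```
<br />
Note that the percentage criterion always keeps the top peak, whereas the Z-score criterion can return an empty list.<br />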
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
*: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
*: Points on the search grid with higher neighbouring points are removed from the selection<br />
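<br />
As a rough illustration of clustering, the sketch below removes every point that has a higher neighbour. It assumes a one-dimensional grid with two neighbours per point purely for brevity; the real rotation and translation search grids are three-dimensional, and the name <code>cluster_peaks</code> is hypothetical.<br />

```python
def cluster_peaks(grid):
    """Keep only the grid points that have no higher neighbouring point.

    `grid` is a list of function values on a 1D grid (a simplifying assumption).
    Returns a list of (index, value) pairs.
    """
    kept = []
    for i, value in enumerate(grid):
        # Neighbours are the points immediately before and after, if present.
        neighbours = grid[max(i - 1, 0):i] + grid[i + 1:i + 2]
        if all(value >= n for n in neighbours):
            kept.append((i, value))
    return kept
```

For the grid values 1, 3, 2, 5, 4, 1 only the points with values 3 and 5 survive. With clustering off, the points with values 4 and 3 on the shoulders of the top peak would also be candidates.<br />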
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
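<br />
The 20% test is simple to state in code. The following sketch assumes that the Patterson peaks are already available as (vector, height) pairs; the function name and the data layout are hypothetical.<br />

```python
def significant_patterson_peaks(peaks, origin_height, percent=20.0):
    """Return the (vector, height) pairs higher than `percent` of the origin peak."""
    threshold = origin_height * percent / 100.0
    return [(vector, height) for vector, height in peaks if height > threshold]
```

Lowering <code>percent</code> in this sketch lets weaker peaks through the test.<br />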
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
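<br />
The definition of commensurate modulation can be checked numerically. The sketch below is an illustration only, not the detection algorithm used by Phaser: it assumes the basic translation vector is given in fractional coordinates and is not the origin, and the tolerance and maximum order are arbitrary illustrative values.<br />

```python
def commensurate_order(vector, max_order=12, tolerance=0.02):
    """Return the smallest n >= 2 for which n * vector is (nearly) a lattice
    translation, or None if no such n is found up to max_order.

    `vector` holds fractional coordinates; `tolerance` is a hypothetical
    allowance for "very close to" integer multiples of the cell axes.
    """
    for n in range(2, max_order + 1):
        multiples = [n * component for component in vector]
        if all(abs(m - round(m)) <= tolerance for m in multiples):
            return n
    return None
```

A vector of (0, 0, 1/3) gives n = 3 (a pseudo-tripling of the cell along c), while a vector in a general position such as (0.37, 0.11, 0.23) gives no commensurate order.<br />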
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x><br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2470Molecular Replacement2018-07-04T10:32:25Z<p>Rdo20: /* Automated Molecular Replacement */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' -> [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' -> [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto2.png|Flow Diagram for Automated MR|800px]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
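<br />
As a small worked example of the RMS-based limit, the sketch below takes the coarser of the data limit and 1.8 times the estimated RMS error. It covers only this one rule (not the expected-LLG calculation), and the function name is hypothetical.<br />

```python
def rms_limited_resolution(data_limit, rms_error, factor=1.8):
    """Resolution limit in Angstroms after applying the 1.8 x RMS rule.

    A larger number means lower resolution, so the coarser limit wins.
    """
    return max(data_limit, factor * rms_error)
```

For data extending to 1.5 Angstroms and a model with an estimated RMS error of 1.5 Angstroms, the limit in this sketch becomes 2.7 Angstroms; for an RMS error of 0.5 Angstroms the data limit of 1.5 Angstroms is unchanged.<br />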
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
| *''6 for 1st model in monoclinic space groups'' || <br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
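<br />
The Z-score definition above translates directly into code. The sketch assumes the LLG values of the random set are available as a list; the function name is hypothetical.<br />

```python
from statistics import mean, pstdev

def z_score(peak_llg, random_llgs):
    """Number of standard deviations by which peak_llg exceeds the random mean."""
    mu = mean(random_llgs)
    rms = pstdev(random_llgs)   # RMS deviation from the mean of the random set
    return (peak_llg - mu) / rms
```

For a random set with mean 5 and RMS deviation 2, a search peak with an LLG of 21 has a Z-score of 8.<br />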
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
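<br />
The rough guide in the table can be written as a lookup, for example when scanning many results in a script. The handling of values that fall exactly on a boundary is an assumption, as is the <code>definite_above</code> argument, which is meant to be set to 6 for the first model in monoclinic space groups.<br />

```python
def interpret_tfz(tfz, definite_above=8.0):
    """Map a TFZ score to the rough verdicts of the table above."""
    if tfz > definite_above:
        return "definitely"
    if tfz < 5.0:
        return "no"
    if tfz < 6.0:
        return "unlikely"
    if tfz < 7.0:
        return "possibly"
    return "probably"
```

As the text stresses, this is only a very rough guide and there will always be exceptions.<br />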
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
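<br />
If you want to pull the numbers out of the annotation in a script, a regular expression is enough for the common tags. This sketch handles only RFZ=, TFZ=, PAK=, LLG= and TFZ==, ignores the other annotations in the table, and is not part of Phaser.<br />

```python
import re

# "TFZ=?" lets the same pattern match both TFZ=value and TFZ==value.
ANNOTATION = re.compile(r"(RFZ|TFZ=?|PAK|LLG)=([-+]?\d+(?:\.\d+)?)")

def parse_solu_set(line):
    """Return the (tag, value) pairs of a SOLU SET line, in order of appearance."""
    return [(tag + "=", float(value)) for tag, value in ANNOTATION.findall(line)]
```

Applied to the beta-blip line above, this gives twelve pairs, starting with ("RFZ=", 10.7), with the TFZ== values 24.7, 29.7 and 29.7.<br />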
<br />
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
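<br />
A z-y-z rotation about successively new axes can be composed as the matrix product Rz(alpha) Ry(beta) Rz(gamma). The sketch below builds that product in plain Python. The sense of rotation and whether the matrix is applied to coordinates or to axes are assumptions here, so check the result against Phaser output before relying on it.<br />

```python
from math import cos, sin, radians

def rot_z(angle):
    """Right-handed rotation about z, angle in radians."""
    c, s = cos(angle), sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(angle):
    """Right-handed rotation about y, angle in radians."""
    c, s = cos(angle), sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_zyz(alpha, beta, gamma):
    """Rotation matrix for z-y-z Euler angles given in degrees."""
    a, b, g = radians(alpha), radians(beta), radians(gamma)
    return matmul(matmul(rot_z(a), rot_y(b)), rot_z(g))
```

When beta is zero, only the sum of alpha and gamma matters, so (30, 0, 60) and (90, 0, 0) give the same matrix.<br />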
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || All input peaks amalgamated after refinement to give output solution number (m and others): (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file is<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
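<br />
The history string can be split into its steps with a short regular expression. This is a sketch based only on the two examples shown here; the function names are hypothetical and other forms of the history line may need more care.<br />

```python
import re

STEP = re.compile(r"([A-Z/]+)\(([^)]*)\)")

def parse_history(line):
    """Return the (step, arguments) pairs of a SOLU HISTORY line, in order."""
    return STEP.findall(line)

def split_rnp(arguments):
    """Split RNP arguments 'a,b,c:p' into ([a, b, c], p) as integers."""
    inputs, output = arguments.split(":")
    return [int(n) for n in inputs.split(",")], int(output)
```

For the more complicated example above, the first RNP step shows that 24 input peaks were amalgamated into output solution 1.<br />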
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of hundreds to thousands of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set hundreds to thousands of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TFZ solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data and the columns in the mtz file to be used using the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is either taken directly from RMS or derived indirectly from IDENtity in the ENSEmble keyword, using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation in Angstrom: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|}<br />
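<br />
The tabulated values appear to follow a smooth function of model size and sequence identity. The sketch below is an assumption, not something stated on this page: it uses the functional form A*(B+N)^(1/3)*exp(C*H), where N is the number of residues and H the fraction of non-identical residues, with constants chosen because they reproduce the table to within roughly 0.01 Angstrom. It may be handy for interpolating between table entries.<br />
<br />
```python
import math

# Assumed constants: chosen to reproduce the table above, not quoted from this page.
A = 0.0569
B = 173.0
C = 1.52


def estimated_rms(n_residues: int, identity_percent: float) -> float:
    """Return an estimated RMS deviation in Angstrom.

    n_residues       -- number of residues in the model
    identity_percent -- sequence identity between model and target (0-100)
    """
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity_percent must be between 0 and 100")
    non_identity = 1.0 - identity_percent / 100.0
    return A * (B + n_residues) ** (1.0 / 3.0) * math.exp(C * non_identity)


# Compare a few values with the corresponding table entries.
for n, ident in [(50, 100), (50, 30), (300, 50), (2000, 0)]:
    print(f"N={n:5d} ID={ident:3d}%  RMS={estimated_rms(n, ident):.3f}")
```
<br />
If the estimate seems too optimistic for a given model (for example because of a conformational change), the returned value can simply be increased before it is supplied as the RMS.<br />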
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, use the keyword ENSE modlid HETATOM ON.<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit. It is very important to include everything in the composition, not just the components that you are searching for, because Phaser needs to know what fraction of the total scattering is accounted for by each model. For the options that specify the size of a particular component (sequence, number of residues, molecular weight), you can separately define several components of the composition of the asymmetric unit and Phaser will just add them up. Note that, for these options, you can specify the composition of one copy of a component and also say how many copies of that component are expected to be present. You can also mix compositions entered by sequence, number of residues and molecular weight. When the composition is checked, Phaser will check for the plausibility of the composition you have specified, as well as multiples of that composition.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format.<br />
===Composition by Atom===<br />
Individual atoms can be added to the composition. This allows the explicit addition of heavy atoms in the structure, e.g. Fe atoms.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
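<br />
The bookkeeping behind the composition can be illustrated with a short sketch. It assumes, purely for illustration, that molecular weight is an adequate proxy for scattering power; the <code>Component</code> class and function names are hypothetical and are not part of Phaser.<br />
<br />
```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """One entry of the composition (hypothetical helper, not a Phaser type)."""
    name: str
    molecular_weight: float  # Daltons, for a single copy
    copies: int              # number of copies expected in the asymmetric unit


def total_weight(components: List[Component]) -> float:
    """Add up every component, as the composition does for separate entries."""
    return sum(c.molecular_weight * c.copies for c in components)


def fraction_scattering(model_weight: float, components: List[Component]) -> float:
    """Approximate fraction of the total scattering explained by one model.

    Assumption: scattering is proportional to molecular weight.
    """
    total = total_weight(components)
    if total <= 0:
        raise ValueError("composition must contain some scattering matter")
    return model_weight / total


asu = [Component("enzyme", 30000.0, 2), Component("inhibitor", 20000.0, 2)]
print(fraction_scattering(30000.0, asu))  # one enzyme copy out of the full content
```
<br />
The sketch shows why everything in the asymmetric unit has to be included: leaving out the inhibitor would inflate the fraction attributed to each enzyme model.<br />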
<br />
==How to Define Searches==<br />
Phaser does not compare sequences you specify in the composition with the models you specify as ensembles, so you have to specify separately the number of copies of a particular sequence that you expect to be found in the asymmetric unit of your crystal and the number of copies of each ensemble you want to place in the asymmetric unit. By default, Phaser will search first for ensembles expected to yield the highest signal in the MR search (as judged by the expected LLG or eLLG calculation); if that fails to result in a clear solution, different search orders will be tested automatically. For that reason, it does not normally matter which order you use to specify the searches. There is an option to override Phaser's automatic choice of search order, but this will only rarely be useful. It is best to specify the searches for everything that you hope to find in the MR calculation in one job, as that gives Phaser the greatest scope to optimise the calculation. Note that if your crystal possesses translational non-crystallographic symmetry (tNCS), you should be searching for a number of copies of each ensemble divisible by the order of the tNCS (i.e. the number of molecules that should be related by repeated application of a translation vector).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
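<br />
As a minimal sketch, a fixed placement could be written in the same layout as the SOLU 6DIM lines that Phaser prints in its .sol files. The helper name <code>solu_6dim</code> and the ensemble name "known" are hypothetical, and the number formatting simply mimics the example output elsewhere on this page rather than any documented requirement.<br />
<br />
```python
from typing import Sequence


def solu_6dim(ensemble: str,
              euler: Sequence[float] = (0.0, 0.0, 0.0),
              frac: Sequence[float] = (0.0, 0.0, 0.0),
              bfac: float = 0.0) -> str:
    """Format one SOLU 6DIM line; the defaults give rotation 0 0 0 and position 0 0 0."""
    if len(euler) != 3 or len(frac) != 3:
        raise ValueError("euler and frac must each contain three numbers")
    euler_text = " ".join(f"{angle:.3f}" for angle in euler)
    frac_text = " ".join(f"{coord:.5f}" for coord in frac)
    return (f"SOLU 6DIM ENSE {ensemble} EULER {euler_text} "
            f"FRAC {frac_text} BFAC {bfac:.5f}")


# A partial structure that is already correctly oriented and positioned.
print(solu_6dim("known"))
```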
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function).<br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
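<br />
The selection criteria listed above can be sketched as follows. This is an illustration of the definitions only, not Phaser's implementation; the function names are hypothetical, and the mean and standard deviation are taken over the supplied list of peak values.<br />
<br />
```python
from statistics import mean, pstdev
from typing import List, Sequence


def select_by_percentage(scores: Sequence[float], cutoff: float = 75.0) -> List[float]:
    """Keep peaks at or above `cutoff` percent, where top = 100% and mean = 0%."""
    top, mu = max(scores), mean(scores)
    if top == mu:  # completely flat distribution: nothing to distinguish
        return list(scores)
    return [s for s in scores if 100.0 * (s - mu) / (top - mu) >= cutoff]


def select_by_zscore(scores: Sequence[float], cutoff: float) -> List[float]:
    """Keep peaks at least `cutoff` standard deviations above the mean."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return []
    return [s for s in scores if (s - mu) / sigma >= cutoff]


def select_by_number(scores: Sequence[float], n: int) -> List[float]:
    """Keep the `n` highest peaks."""
    return sorted(scores, reverse=True)[:n]


peaks = [10.0, 9.0, 5.0, 2.0, 4.0]
print(select_by_percentage(peaks))   # the top peak always survives
print(select_by_zscore(peaks, 5.0))  # may be empty if the cutoff is too high
print(select_by_number(peaks, 3))
```
<br />
Note how the percentage criterion always returns at least the top peak, whereas the Z-score criterion can return nothing at all.<br />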
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
*: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
*: Points on the search grid with higher neighbouring points are removed from the selection<br />
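<br />
The effect of clustering can be illustrated on a one-dimensional grid. This is a simplified sketch under stated assumptions: real search grids are multi-dimensional, and the function name <code>cluster_peaks</code> is hypothetical.<br />
<br />
```python
from typing import List, Sequence, Tuple


def cluster_peaks(grid: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (index, value) for grid points that have no higher neighbour.

    One-dimensional and non-periodic for simplicity; points at either end
    have only one neighbour.
    """
    kept = []
    for i, value in enumerate(grid):
        neighbours = []
        if i > 0:
            neighbours.append(grid[i - 1])
        if i < len(grid) - 1:
            neighbours.append(grid[i + 1])
        # A point is removed as soon as any neighbour is strictly higher.
        if all(value >= other for other in neighbours):
            kept.append((i, value))
    return kept


print(cluster_peaks([1.0, 3.0, 2.0, 5.0, 4.0, 4.0, 1.0]))
```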
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
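<br />
The amplitudes of the two map coefficient types in the table can be written out directly from their definitions. The sketch below only evaluates the formulas given above for a single reflection; the function and argument names are hypothetical, and practical details such as the handling of missing data are not covered.<br />
<br />
```python
from typing import Tuple


def map_amplitudes(f_obs: float, f_calc: float, m: float, d: float) -> Tuple[float, float]:
    """Return (FWT, DELFWT) amplitudes for one reflection.

    f_obs, f_calc -- observed and calculated structure factor amplitudes
    m             -- figure of merit (the FOM column)
    d             -- the D weighting factor from the SigmaA analysis
    """
    fwt = 2.0 * m * f_obs - d * f_calc   # 2m|Fobs| - D|Fcalc|
    delfwt = m * f_obs - d * f_calc      # m|Fobs| - D|Fcalc|
    return fwt, delfwt


print(map_amplitudes(100.0, 80.0, 0.5, 0.5))
```
<br />
Both amplitudes are paired with the calculated phase, which is why the two phase columns carry the same information unless an amplitude changes sign.<br />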
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
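<br />
The definition of commensurate modulation can be checked numerically for a candidate translation vector in fractional coordinates. This is a minimal sketch; the tolerance of 0.02 and the maximum order of 12 are arbitrary assumptions, and the function names are hypothetical.<br />
<br />
```python
from typing import Optional, Sequence


def is_commensurate(vector: Sequence[float], n: int, tolerance: float = 0.02) -> bool:
    """True if n times the fractional vector is (very close to) a lattice translation."""
    return all(abs(n * c - round(n * c)) <= tolerance for c in vector)


def smallest_order(vector: Sequence[float], max_order: int = 12,
                   tolerance: float = 0.02) -> Optional[int]:
    """Smallest n >= 2 for which the vector is commensurate, or None if none is found."""
    for n in range(2, max_order + 1):
        if is_commensurate(vector, n, tolerance):
            return n
    return None


print(smallest_order((0.0, 0.0, 1.0 / 3.0)))  # three-fold pseudo-cell along c
print(smallest_order((0.13, 0.27, 0.41)))     # general position, not commensurate
```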
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x>.<br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2469Molecular Replacement2018-07-04T10:30:13Z<p>Rdo20: /* Automated Molecular Replacement */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' -> [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' -> [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and supply a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto2.png|Flow Diagram for Automated MR|frame|800px]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
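<br />
The two resolution limits described above can be combined in a small sketch. It assumes that the resolution at which the expected LLG reaches the target has already been worked out by other means; that input, and the function name, are hypothetical. Resolution is expressed as d-spacing in Angstroms, so a larger number means lower resolution.<br />
<br />
```python
from typing import Optional

RMS_FACTOR = 1.8  # resolution limit as a multiple of the estimated RMS error


def search_resolution(data_limit: float, rms_error: float,
                      ellg_target_limit: Optional[float] = None) -> float:
    """Return the high-resolution limit (Angstrom) to use for the initial search.

    data_limit        -- highest resolution of the measured data
    rms_error         -- estimated RMS error of the model
    ellg_target_limit -- resolution at which the expected LLG reaches the target
                         (assumed to be supplied by the caller), or None
    """
    limits = [data_limit, RMS_FACTOR * rms_error]
    if ellg_target_limit is not None:
        limits.append(ellg_target_limit)
    # The least demanding (numerically largest) limit wins.
    return max(limits)


print(search_resolution(1.5, 1.5))       # limited by the model error
print(search_resolution(2.5, 0.8))       # limited by the data
print(search_resolution(1.5, 0.6, 3.0))  # limited by the expected-LLG target
```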
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
| *''6 for 1st model in monoclinic space groups'' || <br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
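<br />
The Z-score definition in the preceding paragraph amounts to the following calculation. This is a minimal sketch with a hypothetical function name; the random LLG values would come from random rotations or translations.<br />
<br />
```python
from statistics import mean, pstdev
from typing import Sequence


def z_score(peak_llg: float, random_llgs: Sequence[float]) -> float:
    """Number of standard deviations by which a peak LLG exceeds the random mean."""
    mu = mean(random_llgs)
    rms_deviation = pstdev(random_llgs)  # RMS deviation from the mean
    if rms_deviation == 0:
        raise ValueError("random set has no spread; Z-score is undefined")
    return (peak_llg - mu) / rms_deviation


random_set = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(z_score(21.0, random_set))
```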
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
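<br />
The rough guide in the TFZ table can be expressed as a lookup. The table does not say which category the boundary values 5, 6 and 7 belong to, so assigning them to the higher category is an assumption, as are the function and parameter names.<br />
<br />
```python
def interpret_tfz(tfz: float, definite_threshold: float = 8.0) -> str:
    """Translate a TFZ score into the rough verdict given in the table.

    definite_threshold -- 8 by default; the table footnote suggests 6 for the
                          first model in monoclinic space groups.
    """
    if tfz > definite_threshold:
        return "definitely"
    if tfz < 5.0:
        return "no"
    if tfz < 6.0:
        return "unlikely"
    if tfz < 7.0:
        return "possibly"
    return "probably"


for score in (4.2, 5.5, 6.5, 7.5, 24.3):
    print(score, interpret_tfz(score))
```
<br />
As the text stresses, this is only a very rough guide and there will be exceptions.<br />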
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
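The numeric part of an annotation line can be checked with a few lines of Python. This is only a sketch: the helper names <code>parse_annotation</code> and <code>values</code> are invented for illustration and are not part of Phaser, and only tokens of the form shown above (a label, "=" or "==", then a number) are captured; symbolic annotations such as RF++ or TFZ=* are skipped.<br />

```python
import re

# Matches tokens such as RFZ=10.7, PAK=0, LLG=472 and TFZ==24.7.
# "==" is tried before "=" so that TFZ== is kept distinct from TFZ=.
TOKEN = re.compile(r"([A-Z]+)(==|=)(-?\d+(?:\.\d+)?)")

def parse_annotation(line):
    """Return a list of (label, value) pairs, e.g. ('TFZ==', 24.7)."""
    return [(name + op, float(value)) for name, op, value in TOKEN.findall(line)]

def values(pairs, label):
    """Collect the values for one label in the order they appear."""
    return [v for k, v in pairs if k == label]

annotation = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
              "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
pairs = parse_annotation(annotation)
llg = values(pairs, "LLG=")
# The LLG should not go down as components are added.
llg_never_decreases = all(a <= b for a, b in zip(llg, llg[1:]))
```

For the beta-blip line, the LLG values never decrease and the TFZ-equivalent is high for both components, which is the pattern described above.<br />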
<br />
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
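As a rough illustration of the z-y-z convention, the rotation matrix can be composed from three elementary rotations. The sketch below assumes column vectors and the composition R = Rz(alpha) Ry(beta) Rz(gamma), which is the usual matrix form of "z, then new y, then new z"; the function names are hypothetical, and the handedness and active/passive interpretation used inside Phaser should be checked against its own output before relying on this.<br />

```python
import math

def rot_z(deg):
    """Rotation about z by deg degrees (acting on column vectors)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(deg):
    """Rotation about y by deg degrees (acting on column vectors)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_zyz(alpha, beta, gamma):
    """R = Rz(alpha) * Ry(beta) * Rz(gamma); the order is an assumption."""
    return matmul(rot_z(alpha), matmul(rot_y(beta), rot_z(gamma)))

def rotate(r, v):
    """Apply rotation matrix r to vector v."""
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]
```

For example, <code>euler_zyz(200.849, 41.269, 183.909)</code> gives an orthonormal matrix for the beta ensemble in the solution above.<br />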
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || (m,a,b,c,...) input solution numbers amalgamated after refinement : (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
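Long history strings such as the one above are easier to read once split into steps. The following is a minimal sketch, assuming only the NAME(inputs:output) layout described in the table; <code>parse_history</code> is a hypothetical helper, not a Phaser function.<br />

```python
import re

# One step looks like NAME(inputs:output), e.g. RF/TF(7/1:10) or PAK(10:10).
STEP = re.compile(r"([A-Z/]+)\(([^)]*)\)")

def parse_history(line):
    """Return a list of (step name, list of inputs, output number or None)."""
    steps = []
    for name, args in STEP.findall(line):
        if ":" in args:
            inputs, output = args.rsplit(":", 1)
            output = int(output)
        else:
            # e.g. FUSE(A,B,C), which lists merged solutions without an output
            inputs, output = args, None
        steps.append((name, [s.strip() for s in inputs.split(",")], output))
    return steps

simple = parse_history("SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)")
```

Applied to the more complicated example, the first RNP step lists 24 input solutions that were amalgamated into output solution 1.<br />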
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of 100-1000's of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set 100-1000's of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a solution with a high translation function Z-score is being rejected with a few clashes, it is much better to edit the model to remove the clashing loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data and the columns in the mtz file to be used using the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
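To give a feel for how the two inputs shape the curve, the sketch below uses a simplified textbook form, Sigma(A) = sqrt(fraction) * exp(-(2 pi^2 / 3) * rms^2 / d^2), where d is the resolution in Angstroms. This is an assumption made for illustration: it omits the solvent correction and is not necessarily the exact expression coded in Phaser, and the function name is hypothetical.<br />

```python
import math

def sigma_a(fraction_scattering, rms, d_spacing):
    """Simplified Sigma(A) at resolution d_spacing (Angstroms).

    fraction_scattering: fraction of the scattering power in the asymmetric
                         unit contributed by the model (0 to 1)
    rms:                 expected RMS coordinate error (Angstroms)
    """
    s_squared = 1.0 / d_spacing ** 2
    falloff = math.exp(-(2.0 * math.pi ** 2 / 3.0) * rms ** 2 * s_squared)
    return math.sqrt(fraction_scattering) * falloff

# Compare a complete, accurate model with a half-complete, poorer one at 3 A.
good = sigma_a(1.0, 0.5, 3.0)
poor = sigma_a(0.5, 1.5, 3.0)
```

In this form, the completeness sets the low-resolution limit of the curve and the RMS error sets how quickly it falls off with resolution.<br />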
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation in Angstrom: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|}<br />
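The entries in the table are reproduced to within about 0.01 Angstrom by the expression RMS = A * (B + N)^(1/3) * exp(C * H), where N is the number of residues and H is the fraction of non-identical residues. In the sketch below, the constants A = 0.0569, B = 173 and C = 1.52 are assumptions chosen because they match the table, not values quoted on this page, and <code>estimated_rms</code> is a hypothetical name.<br />

```python
import math

# Assumed constants; they reproduce the table above to within rounding.
A, B, C = 0.0569, 173.0, 1.52

def estimated_rms(identity_percent, n_residues):
    """Initial RMS estimate (Angstroms) from sequence identity and model size."""
    h = 1.0 - identity_percent / 100.0  # fraction of non-identical residues
    return A * (B + n_residues) ** (1.0 / 3.0) * math.exp(C * h)

# Interpolate between table entries, e.g. 35% identity and 250 residues.
rms_35_250 = estimated_rms(35.0, 250)
```

This can be used to interpolate between the rows and columns of the table when choosing an RMS value to enter by hand.<br />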
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include other HETATM records in the scattering, change the record name to ATOM or use the keyword ENSE modlid HETATOM ON<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
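The HETATM and water rules can be previewed on your own file before running a job. The sketch below mimics the rules as described here and assumes standard fixed-column PDB records (residue name in columns 18-20, occupancy in columns 55-60); it is not Phaser's own code, and <code>filter_pdb_lines</code> and <code>_demo_line</code> are hypothetical helpers.<br />

```python
WATER_NAMES = {"OW", "WAT", "HOH", "H2O", "OH2", "MOH", "WTR", "TIP"}
SCATTERING_HETATM = {"MSE", "MSO", "CSE", "CSO", "ALY", "MLY", "MLZ"}

def filter_pdb_lines(lines):
    """Drop waters; zero the occupancy of HETATM records that do not scatter."""
    kept = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            kept.append(line)
            continue
        resname = line[17:20].strip()
        if resname in WATER_NAMES:
            continue  # waters are removed on input
        if record == "HETATM" and resname not in SCATTERING_HETATM:
            line = line.ljust(60)
            line = line[:54] + "%6.2f" % 0.0 + line[60:]  # occupancy field
        kept.append(line)
    return kept

def _demo_line(record, resname, occupancy=1.0):
    """Build a minimal fixed-column coordinate record for testing."""
    return "%-6s%5d %-4s %3s A%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        record, 1, "CA", resname, 1, 0.0, 0.0, 0.0, occupancy, 20.0)

demo = [_demo_line("ATOM", "ALA"), _demo_line("HETATM", "HOH"),
        _demo_line("HETATM", "MSE"), _demo_line("HETATM", "GOL")]
filtered = filter_pdb_lines(demo)
```

In the demonstration, the water is dropped, the MSE record keeps its occupancy, and the GOL record is kept with zero occupancy.<br />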
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit. It is very important to include everything in the composition, not just the components that you are searching for, because Phaser needs to know what fraction of the total scattering is accounted for by each model. For the options that specify the size of a particular component (sequence, number of residues, molecular weight), you can separately define several components of the composition of the asymmetric unit and Phaser will just add them up. Note that, for these options, you can specify the composition of one copy of a component and also say how many copies of that component are expected to be present. You can also mix compositions entered by sequence, number of residues and molecular weight. When the composition is checked, Phaser will check for the plausibility of the composition you have specified, as well as multiples of that composition.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format.<br />
===Composition by Atom===<br />
Individual atoms can be added to the composition. This allows the explicit addition of heavy atoms in the structure, e.g. Fe atoms.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
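As a rough sanity check on a composition entered by molecular weight, the total content of the asymmetric unit and the approximate share of one model can be added up by hand. The sketch below assumes a protein-only crystal with average amino acid composition, so that scattering is taken as proportional to molecular weight; Phaser's own calculation works with scattering power rather than weight, and the function names are hypothetical.<br />

```python
def total_molecular_weight(components):
    """components: iterable of (molecular weight in Da, number of copies)."""
    return sum(mw * copies for mw, copies in components)

def approximate_fraction(model_mw, components):
    """Approximate fraction of the asymmetric unit explained by one model copy."""
    return model_mw / total_molecular_weight(components)

# Two copies of a 20 kDa protein plus one copy of a 10 kDa protein.
asu = [(20000.0, 2), (10000.0, 1)]
fraction_one_copy = approximate_fraction(20000.0, asu)
```

Here a single copy of the 20 kDa component accounts for roughly 40% of the asymmetric unit, which illustrates why everything in the asymmetric unit must be included in the composition and not only the component being searched for.<br />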
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
<br />
==How to Define Searches==<br />
Phaser does not compare sequences you specify in the composition with the models you specify as ensembles, so you have to specify separately the number of copies of a particular sequence that you expect to be found in the asymmetric unit of your crystal and the number of copies of each ensemble you want to place in the asymmetric unit. By default, Phaser will search first for ensembles expected to yield the highest signal in the MR search (as judged by the expected LLG or eLLG calculation); if that fails to result in a clear solution, different search orders will be tested automatically. For that reason, it does not normally matter which order you use to specify the searches. There is an option to override Phaser's automatic choice of search order, but this will only rarely be useful. It is best to specify the searches for everything that you hope to find in the MR calculation in one job, as that gives Phaser the greatest scope to optimise the calculation. Note that if your crystal possesses translational non-crystallographic symmetry (tNCS), you should be searching for a number of copies of each ensemble divisible by the order of the tNCS (i.e. the number of molecules that should be related by repeated application of a translation vector).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
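If you need to inspect or edit many solutions, the 6DIM lines can be read with a short script. This sketch handles only the abbreviated form printed in the examples on this page (SOLU 6DIM ENSE name EULER ... FRAC ... BFAC ...); real files may contain other spellings or additional fields, and <code>parse_6dim</code> is a hypothetical helper.<br />

```python
def parse_6dim(line):
    """Parse one 'SOLU 6DIM ENSE ...' line into a dictionary."""
    tokens = line.split()
    if tokens[:3] != ["SOLU", "6DIM", "ENSE"]:
        raise ValueError("not a SOLU 6DIM line")
    e = tokens.index("EULER")
    f = tokens.index("FRAC")
    b = tokens.index("BFAC")
    return {
        "ensemble": tokens[3],
        "euler": tuple(float(t) for t in tokens[e + 1:e + 4]),
        "frac": tuple(float(t) for t in tokens[f + 1:f + 4]),
        "bfac": float(tokens[b + 1]),
    }

beta = parse_6dim("SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 "
                  "FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000")
```

Each placed ensemble becomes one dictionary, so a solution with several components is simply a list of these.<br />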
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
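The first three criteria can be written down in a few lines. The sketch below follows the definitions in the list above (the top peak is 100% and the mean is 0% for the percentage criterion); the function names are hypothetical and the mean and sigma are assumed to be supplied by the search.<br />

```python
def select_by_percent(values, mean, cutoff=75.0):
    """Keep peaks at or above cutoff percent, with top = 100% and mean = 0%."""
    top = max(values)
    if top == mean:
        return [top]  # degenerate flat case: keep the top peak only
    return [v for v in values if 100.0 * (v - mean) / (top - mean) >= cutoff]

def select_by_zscore(values, mean, sigma, cutoff):
    """Keep peaks at least cutoff standard deviations above the mean."""
    return [v for v in values if (v - mean) / sigma >= cutoff]

def select_by_number(values, n):
    """Keep the n highest peaks."""
    return sorted(values, reverse=True)[:n]

peaks = [10.0, 9.0, 6.0, 2.0]
by_percent = select_by_percent(peaks, mean=2.0)
by_zscore = select_by_zscore(peaks, mean=2.0, sigma=2.0, cutoff=5.0)
```

With these numbers, the percentage criterion keeps the two highest peaks, while a 5 sigma Z-score cutoff keeps none, illustrating why an absolute significance test may produce no output.<br />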
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
*: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
*: Points on the search grid with higher neighbouring points are removed from the selection<br />
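The effect of clustering can be illustrated on a one-dimensional grid. This is a toy sketch: the real rotation and translation searches are three-dimensional and Phaser's neighbour definition is not described here, so the nearest-neighbour rule and the name <code>cluster_1d</code> are assumptions.<br />

```python
def cluster_1d(values):
    """Return indices of grid points that have no higher neighbouring point."""
    kept = []
    for i, v in enumerate(values):
        neighbours = values[max(i - 1, 0):i] + values[i + 1:i + 2]
        if all(n <= v for n in neighbours):
            kept.append(i)
    return kept

grid = [1.0, 3.0, 2.0, 5.0, 4.0]
local_maxima = cluster_1d(grid)
```

Only the two local maxima survive; the point at index 4, although higher than the maximum at index 1, is removed because it sits on the shoulder of the peak at index 3. This is why turning clustering off can help when the correct orientation lies on the shoulder of a peak.<br />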
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
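The map coefficients in the table can be written out explicitly for a single reflection. The sketch below simply evaluates the two expressions as given, with ''m'', ''D'', the amplitudes and the calculated phase supplied by the caller; it does not reproduce any special handling that Phaser may apply (for example to centric reflections), and the function name is hypothetical.<br />

```python
import cmath
import math

def map_coefficients(f_obs, f_calc, alpha_calc_deg, m, d):
    """Return (FWT, DELFWT) as complex numbers for one reflection."""
    phase = cmath.rect(1.0, math.radians(alpha_calc_deg))  # exp(i * alpha_calc)
    fwt = (2.0 * m * f_obs - d * f_calc) * phase
    delfwt = (m * f_obs - d * f_calc) * phase
    return fwt, delfwt

fwt, delfwt = map_coefficients(100.0, 80.0, 0.0, m=0.8, d=0.9)
```

The amplitude and phase columns correspond to the modulus and argument of each complex coefficient.<br />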
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
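The peak-height criterion is a simple threshold relative to the origin peak. In the sketch below the peak list is assumed to hold (height, fractional vector) pairs taken from a Patterson peak search; the data layout and the name <code>significant_patterson_peaks</code> are illustrative assumptions.<br />

```python
def significant_patterson_peaks(peaks, origin_height, percent=20.0):
    """Keep non-origin peaks higher than percent of the origin peak height."""
    threshold = origin_height * percent / 100.0
    return [(height, vector) for height, vector in peaks if height > threshold]

candidates = [(45.0, (0.5, 0.0, 0.5)), (12.0, (0.1, 0.2, 0.3))]
significant = significant_patterson_peaks(candidates, origin_height=100.0)
```

Here only the peak at 45% of the origin would be listed as a candidate tNCS vector.<br />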
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x><br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
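When inspecting the Patterson by hand, a candidate basic translation vector can be tested for commensurate modulation by looking for the smallest n for which n times the vector is close to a lattice translation. This is a sketch of that test only, with an arbitrary tolerance and upper limit on n; it is not Phaser's detection algorithm, and <code>commensurate_order</code> is a hypothetical name.<br />

```python
def commensurate_order(vector, max_n=12, tol=0.02):
    """Smallest n >= 2 with n * vector within tol of integers, else None.

    vector: fractional coordinates of the basic tNCS translation.
    """
    for n in range(2, max_n + 1):
        if all(abs(n * x - round(n * x)) <= tol for x in vector):
            return n
    return None

halving = commensurate_order((0.5, 0.0, 0.0))
tripling = commensurate_order((0.0, 0.0, 1.0 / 3.0))
general = commensurate_order((0.123, 0.2, 0.3))
```

A result of None indicates a vector in a general position, for which no pseudo-cell is implied within the limits tested.<br />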
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2468Molecular Replacement2018-07-04T10:28:16Z<p>Rdo20: /* Automated Molecular Replacement */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' -> [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' -> [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto2.png|Flow Diagram for Automated MR]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
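<br />
The 1.8 x RMS rule above amounts to a one-line calculation. The sketch below shows only that rule; it does not reproduce the expected-LLG target described in the previous paragraph, and the function name <code>search_resolution_limit</code> is hypothetical.<br />
<br />
```python
def search_resolution_limit(rms_error, data_resolution, factor=1.8):
    """High-resolution limit (in Angstrom) worth using for a model.

    rms_error       : estimated RMS coordinate error of the model (Angstrom)
    data_resolution : high-resolution limit of the measured data (Angstrom)
    factor          : multiple of the RMS error quoted in the text (1.8)

    A larger number means lower resolution, so the limit actually used is
    the coarser of the data limit and factor * rms_error.
    """
    return max(data_resolution, factor * rms_error)


if __name__ == "__main__":
    # Data to 1.6 A, model with an estimated 1.5 A RMS error
    print(round(search_resolution_limit(1.5, 1.6), 2))
```
<br />
For a model with a small estimated error the data limit itself is returned, i.e. nothing is cut.<br />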
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
| *''6 for 1st model in monoclinic space groups'' || <br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
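<br />
As a worked illustration of the definition above, the following Python sketch computes a Z-score from a peak LLG and a set of LLG values for random rotations or translations. The function name <code>z_score</code> and the numbers are invented for the example; how Phaser draws its random sample is not shown here.<br />
<br />
```python
import math


def z_score(peak_llg, random_llgs):
    """Number of standard deviations by which peak_llg exceeds the mean
    of the LLG values obtained for a random set of placements."""
    mean = sum(random_llgs) / len(random_llgs)
    # RMS deviation from the mean of the random set
    rms_dev = math.sqrt(sum((v - mean) ** 2 for v in random_llgs) / len(random_llgs))
    return (peak_llg - mean) / rms_dev


if __name__ == "__main__":
    random_set = [2, 4, 4, 4, 5, 5, 7, 9]   # invented LLG values: mean 5, RMS deviation 2
    print(z_score(21, random_set))          # (21 - 5) / 2
```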
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
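<br />
The rough guide in the table can be written as a small lookup, which may be convenient when scanning many logfiles. This is a sketch only: the function name <code>interpret_tfz</code> is hypothetical, and the treatment of values falling exactly on a boundary (5, 6, 7 or 8) is an assumption, since the table does not specify it.<br />
<br />
```python
def interpret_tfz(tfz, first_model_monoclinic=False):
    """Map a TFZ score onto the rough categories of the table.

    first_model_monoclinic : apply the footnote, which lowers the
    'definitely' threshold to 6 for the first model in monoclinic
    space groups.
    """
    definite_above = 6.0 if first_model_monoclinic else 8.0
    if tfz > definite_above:
        return "definitely"
    if tfz < 5.0:
        return "no"
    if tfz < 6.0:
        return "unlikely"
    if tfz < 7.0:
        return "possibly"
    return "probably"


if __name__ == "__main__":
    for value in (4.2, 5.5, 6.5, 7.5, 24.3):
        print(value, interpret_tfz(value))
```
<br />
As the text stresses, these categories are a very rough guide and do not replace reading the logfile.<br />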
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
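<br />
If you want to check these features programmatically, the numeric entries of the annotation can be pulled out with a regular expression. The Python sketch below is an assumption-laden illustration rather than an official parser: the name <code>parse_annotation</code> is hypothetical, and only the numeric <code>RFZ=</code>, <code>TFZ=</code>, <code>TFZ==</code>, <code>PAK=</code> and <code>LLG=</code> entries are handled (markers such as <code>TFZ=*</code>, <code>RF++</code> or <code>+TNCS</code> are ignored).<br />
<br />
```python
import re

# A key, then "=" or "==", then a number, e.g. "TFZ==24.7" or "PAK=0"
TOKEN = re.compile(r"(RFZ|TFZ|PAK|LLG)(==?)([-+]?\d+(?:\.\d+)?)")


def parse_annotation(line):
    """Return the numeric annotation entries of a SOLU SET line, in order,
    as (name, value) pairs; the TFZ-equivalent is reported as 'TFZ=='."""
    entries = []
    for key, operator, value in TOKEN.findall(line):
        name = key + "==" if operator == "==" else key
        entries.append((name, float(value)))
    return entries


if __name__ == "__main__":
    line = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
            "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
    entries = parse_annotation(line)
    llgs = [value for name, value in entries if name == "LLG"]
    # The LLG should not decrease as components are added
    print(llgs, llgs == sorted(llgs))
```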
<br />
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
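<br />
For readers who want to turn a z-y-z Euler triple into a rotation matrix, here is a minimal Python sketch. It assumes that the three EULER numbers are the angles of the first, second and third rotation in that order, in degrees, and that the matrix acts on column vectors; the function names are hypothetical and Phaser's internal matrix conventions should be checked before relying on this for coordinates. Because each rotation is about the ''new'' axis, the matrices multiply from left to right as R = Rz(alpha) Ry(beta) Rz(gamma).<br />
<br />
```python
import math


def rot_z(angle_deg):
    """Rotation by angle_deg about the z-axis."""
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(angle_deg):
    """Rotation by angle_deg about the y-axis."""
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyz_matrix(alpha, beta, gamma):
    """z-y-z rotation about moving axes: z, then new y, then new z."""
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))


if __name__ == "__main__":
    for row in euler_zyz_matrix(200.849, 41.269, 183.909):
        print(["%8.5f" % x for x in row])
```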
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || All input peaks amalgamated after refinement to give output solution number (m and others): (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file shows these features<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
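<br />
A history line like the ones above can be split into its steps with a short Python sketch. The name <code>parse_history</code> is hypothetical, and the parser assumes the layout shown in the examples: a step name followed by parentheses, with an optional colon separating the input description from the output solution number.<br />
<br />
```python
import re

# A step name followed by its bracketed arguments, e.g. "PAK(10:10)"
STEP = re.compile(r"(RF/TF|PAK|RNP|FUSE)\(([^)]*)\)")


def parse_history(text):
    """Return (step, before, after) tuples for each step of a SOLU HISTORY
    line: 'before' is the text left of the last colon, 'after' the text
    right of it, or None when the step has no colon (as for FUSE)."""
    steps = []
    for name, args in STEP.findall(text):
        if ":" in args:
            before, after = args.rsplit(":", 1)
            steps.append((name, before.strip(), after.strip()))
        else:
            steps.append((name, args.strip(), None))
    return steps


if __name__ == "__main__":
    history = ("SOLU HISTORY RF/TF(7/1:10)PAK(10:10)"
               "RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)"
               "RNP(1:1)")
    for step in parse_history(history):
        print(step)
```
<br />
In the more complicated example, the first RNP step lists every input peak that was amalgamated into output solution 1.<br />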
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of 100-1000's of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set 100-1000's of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data, and the columns in that file to be used, with the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdb file(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation in Angstrom: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|}<br />
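<br />
For model sizes or identities that fall between the tabulated values, the table can be interpolated. The Python sketch below copies the table verbatim and interpolates bilinearly; the interpolation is an assumption made for illustration and is not the formula Phaser itself uses, and the name <code>estimate_rms</code> is hypothetical. Inputs outside the table are clamped to its edges.<br />
<br />
```python
from bisect import bisect_right

RESIDUES = [50, 100, 200, 300, 400, 600, 850, 1000, 1500, 2000]
IDENTITY = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]   # percent
# Rows follow IDENTITY, columns follow RESIDUES (values in Angstrom)
RMS_TABLE = [
    [1.579, 1.689, 1.875, 2.030, 2.164, 2.391, 2.625, 2.748, 3.093, 3.375],
    [1.356, 1.451, 1.610, 1.743, 1.858, 2.053, 2.255, 2.360, 2.657, 2.899],
    [1.165, 1.246, 1.383, 1.497, 1.596, 1.764, 1.936, 2.027, 2.281, 2.489],
    [1.000, 1.070, 1.188, 1.286, 1.371, 1.515, 1.663, 1.741, 1.959, 2.138],
    [0.859, 0.919, 1.020, 1.104, 1.177, 1.301, 1.428, 1.495, 1.683, 1.836],
    [0.738, 0.789, 0.876, 0.948, 1.011, 1.117, 1.227, 1.284, 1.445, 1.577],
    [0.634, 0.678, 0.752, 0.814, 0.868, 0.959, 1.053, 1.103, 1.241, 1.354],
    [0.544, 0.582, 0.646, 0.699, 0.746, 0.824, 0.905, 0.947, 1.066, 1.163],
    [0.467, 0.500, 0.555, 0.601, 0.640, 0.708, 0.777, 0.813, 0.915, 0.999],
    [0.401, 0.429, 0.477, 0.516, 0.550, 0.608, 0.667, 0.698, 0.786, 0.858],
    [0.345, 0.369, 0.409, 0.443, 0.472, 0.522, 0.573, 0.600, 0.675, 0.737],
]


def _bracket(grid, x):
    """Clamp x to the grid; return (lower index, fractional position)."""
    x = min(max(x, grid[0]), grid[-1])
    i = min(bisect_right(grid, x) - 1, len(grid) - 2)
    return i, (x - grid[i]) / (grid[i + 1] - grid[i])


def estimate_rms(identity_percent, n_residues):
    """Bilinear interpolation of the tabulated initial RMS estimate."""
    i, fi = _bracket(IDENTITY, identity_percent)
    j, fj = _bracket(RESIDUES, n_residues)
    upper = RMS_TABLE[i][j] * (1 - fj) + RMS_TABLE[i][j + 1] * fj
    lower = RMS_TABLE[i + 1][j] * (1 - fj) + RMS_TABLE[i + 1][j + 1] * fj
    return upper * (1 - fi) + lower * fi


if __name__ == "__main__":
    print(round(estimate_rms(35, 200), 3))   # halfway between the 30% and 40% rows
```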
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, either change the record name to ATOM or use the keyword ENSE modlid HETATOM ON.<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit. It is very important to include everything in the composition, not just the components that you are searching for, because Phaser needs to know what fraction of the total scattering is accounted for by each model. For the options that specify the size of a particular component (sequence, number of residues, molecular weight), you can separately define several components of the composition of the asymmetric unit and Phaser will just add them up. Note that, for these options, you can specify the composition of one copy of a component and also say how many copies of that component are expected to be present. You can also mix compositions entered by sequence, number of residues and molecular weight. When the composition is checked, Phaser will check for the plausibility of the composition you have specified, as well as multiples of that composition.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format.<br />
===Composition by Atom===<br />
Individual atoms can be added to the composition. This allows the explicit addition of heavy atoms in the structure, e.g. Fe atoms.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
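<br />
The ratio described above can be illustrated with a rough Python sketch in which molecular weight stands in for scattering power. That proxy is an assumption made for the example (Phaser works from the actual scattering of the model and the composition), the molecular weights are invented, and the name <code>fraction_scattering</code> is hypothetical. The point of the sketch is that the denominator must cover the whole asymmetric unit.<br />
<br />
```python
def fraction_scattering(model_mw, composition):
    """Approximate fraction of the total scattering explained by one model.

    model_mw    : molecular weight of the search model (one copy)
    composition : list of (molecular_weight, copies) pairs describing
                  EVERYTHING in the asymmetric unit, not only the
                  component currently being searched for
    """
    total = sum(mw * copies for mw, copies in composition)
    if total <= 0:
        raise ValueError("composition must describe a non-empty asymmetric unit")
    return model_mw / total


if __name__ == "__main__":
    # Invented example: one copy each of a 30 kDa and a 20 kDa component
    asu = [(30000.0, 1), (20000.0, 1)]
    print(fraction_scattering(30000.0, asu))   # 0.6
    print(fraction_scattering(20000.0, asu))   # 0.4
```
<br />
Leaving a component out of the composition inflates the apparent fraction of every remaining model, which is why the composition has to be complete.<br />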
<br />
==How to Define Searches==<br />
Phaser does not compare sequences you specify in the composition with the models you specify as ensembles, so you have to specify separately the number of copies of a particular sequence that you expect to be found in the asymmetric unit of your crystal and the number of copies of each ensemble you want to place in the asymmetric unit. By default, Phaser will search first for ensembles expected to yield the highest signal in the MR search (as judged by the expected LLG or eLLG calculation); if that fails to result in a clear solution, different search orders will be tested automatically. For that reason, it does not normally matter which order you use to specify the searches. There is an option to override Phaser's automatic choice of search order, but this will only rarely be useful. It is best to specify the searches for everything that you hope to find in the MR calculation in one job, as that gives Phaser the greatest scope to optimise the calculation. Note that if your crystal possesses translational non-crystallographic symmetry (tNCS), you should be searching for a number of copies of each ensemble divisible by the order of the tNCS (i.e. the number of molecules that should be related by repeated application of a translation vector).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
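<br />
The first three selection criteria listed above can be sketched in a few lines of Python. This is an illustration of the definitions only, not Phaser's implementation: the function names are hypothetical, and the list <code>scores</code> is assumed to stand in for all the search values from which the mean and standard deviation are taken.<br />
<br />
```python
from statistics import mean, pstdev


def select_by_percent(scores, cutoff=75.0):
    """Keep scores at or above `cutoff` percent, where top = 100% and mean = 0%."""
    top = max(scores)
    avg = mean(scores)
    if top == avg:
        # Completely flat distribution: every value ties with the top peak.
        return list(scores)
    return [s for s in scores if 100.0 * (s - avg) / (top - avg) >= cutoff]


def select_by_zscore(scores, cutoff):
    """Keep scores at least `cutoff` standard deviations above the mean."""
    avg = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return []
    return [s for s in scores if (s - avg) / sigma >= cutoff]


def select_by_number(scores, n):
    """Keep the `n` highest scores."""
    return sorted(scores, reverse=True)[:n]


scores = [10, 2, 1, 0, -1, -2]
print(select_by_percent(scores))    # the top peak always survives
print(select_by_zscore(scores, 5))  # a high sigma cutoff may select nothing
print(select_by_number(scores, 2))
```

With these toy values the percentage criterion keeps only the top peak, whereas a 5 sigma cutoff keeps nothing, which mirrors the difference between the two criteria described above.<br />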
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
: Points on the search grid with higher neighbouring points are removed from the selection<br />
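<br />
The effect of clustering can be illustrated on a toy one-dimensional grid. The sketch below is an assumption-laden simplification: the real search grids are multi-dimensional and Phaser's neighbour definition is not reproduced here, so the function name and the choice of immediate left and right neighbours are purely illustrative.<br />
<br />
```python
def cluster_peaks_1d(values):
    """Return indices of grid points that have no higher immediate neighbour.

    Toy 1D, non-periodic grid; points with a higher neighbour are dropped,
    which is the behaviour described for 'Clustering On'.
    """
    kept = []
    last = len(values) - 1
    for i, v in enumerate(values):
        left = values[i - 1] if i > 0 else float("-inf")
        right = values[i + 1] if i < last else float("-inf")
        if v >= left and v >= right:
            kept.append(i)
    return kept


grid = [1, 3, 2, 5, 4, 4.5]
print(cluster_peaks_1d(grid))
```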
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
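<br />
The map coefficients in the table follow directly from the formulas given. The following Python sketch evaluates them for a single reflection; it is illustrative only, the function and argument names are hypothetical, and any special treatment of centric or missing reflections in real programs is deliberately ignored.<br />
<br />
```python
import cmath
import math


def map_coefficient(f_obs, f_calc, alpha_calc_deg, m, d, n_obs=2):
    """Evaluate (n_obs*m*|Fobs| - D*|Fcalc|) exp(i*alpha_calc) for one reflection.

    n_obs=2 gives the FWT/PHWT-style coefficient, n_obs=1 the DELFWT/PHDELWT-style
    difference coefficient. Returns (amplitude, phase in degrees, 0 <= phase < 360).
    """
    signed_amplitude = n_obs * m * f_obs - d * f_calc
    coeff = signed_amplitude * cmath.exp(1j * math.radians(alpha_calc_deg))
    # A negative signed amplitude shows up as a 180 degree phase shift.
    return abs(coeff), math.degrees(cmath.phase(coeff)) % 360.0


amplitude, phase = map_coefficient(100.0, 80.0, 30.0, m=0.9, d=0.95)
print(round(amplitude, 3), round(phase, 3))
```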
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
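<br />
The 20% criterion can be expressed as a simple filter over a Patterson peak list. The sketch below is illustrative Python, not Phaser code; the peak list layout (fractional vector, height) and the function name are assumptions made for the example.<br />
<br />
```python
def significant_patterson_peaks(peaks, origin_height, percent_cutoff=20.0):
    """Keep non-origin peaks whose height exceeds `percent_cutoff` percent of the origin peak.

    `peaks` is a list of (fractional_vector, height) tuples (hypothetical layout).
    """
    return [
        (vector, height)
        for vector, height in peaks
        if 100.0 * height / origin_height > percent_cutoff
    ]


peaks = [((0.5, 0.0, 0.5), 45.0), ((0.1, 0.2, 0.3), 8.0)]
print(significant_patterson_peaks(peaks, origin_height=100.0))
```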
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
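<br />
The definition of commensurate modulation can be tested numerically: n times the basic translation vector, in fractional coordinates, should be close to integers along every axis. The following Python sketch illustrates the idea; the function name and the tolerance value are assumptions and do not correspond to any Phaser setting.<br />
<br />
```python
def is_commensurate(vector_frac, n, tol=0.02):
    """True if n times the fractional tNCS vector is within `tol` of integers on every axis.

    `tol` is a hypothetical tolerance in fractional units.
    """
    return all(abs(n * x - round(n * x)) <= tol for x in vector_frac)


# A vector of one third along c, repeated three times, spans one full cell edge.
print(is_commensurate((0.0, 0.0, 1.0 / 3.0), 3))
print(is_commensurate((0.0, 0.0, 0.29), 3))
```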
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x><br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=File:Phaser_MR_auto2.png&diff=2467File:Phaser MR auto2.png2018-07-04T10:26:49Z<p>Rdo20: </p>
<hr />
<div></div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2466Molecular Replacement2018-07-01T11:21:17Z<p>Rdo20: /* Automated Molecular Replacement */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' -> [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' -> [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and provide a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto.gif|Flow Diagram for Automated MR]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
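<br />
The two resolution limits described above can be combined in a short calculation. The Python sketch below is an interpretation rather than Phaser's code: it assumes that the search uses the lowest resolution (largest d-spacing) among the data limit, 1.8 times the RMS error, and the resolution needed to reach the target expected LLG. The last value is taken as an input because the eLLG calculation itself is not reproduced here, and all names are hypothetical.<br />
<br />
```python
def search_resolution(data_resolution, rms_error, ellg_target_resolution=None):
    """Return the high-resolution limit (in Angstroms) for the initial search.

    data_resolution:        limit of the measured data
    rms_error:              estimated RMS coordinate error of the model
    ellg_target_resolution: resolution at which the target eLLG is reached,
                            or None if the target cannot be reached
    """
    limits = [data_resolution, 1.8 * rms_error]
    if ellg_target_resolution is not None:
        limits.append(ellg_target_resolution)
    # A larger d-spacing means lower resolution, so the most restrictive limit is the maximum.
    return max(limits)


# Data to 1.5 A with a model of 1.5 A RMS error: searching beyond about 2.7 A adds little signal.
print(search_resolution(1.5, 1.5))
```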
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
| *''6 for 1st model in monoclinic space groups'' || <br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
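<br />
The Z-score definition above translates directly into code. The following Python sketch assumes that the LLG values for the random set are available as a list; the function name is hypothetical and the numbers are invented for illustration.<br />
<br />
```python
from statistics import mean, pstdev


def z_score(peak_llg, random_llgs):
    """(peak LLG - mean of random LLGs) / RMS deviation of the random LLGs from their mean."""
    mu = mean(random_llgs)
    rms_deviation = pstdev(random_llgs)  # population standard deviation
    if rms_deviation == 0:
        raise ValueError("random LLG values must not all be identical")
    return (peak_llg - mu) / rms_deviation


random_llgs = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, RMS deviation 2
print(z_score(21, random_llgs))         # prints 8.0
```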
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
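<br />
For scripting purposes, the rough guide in the table can be encoded as a lookup. The Python sketch below is only a convenience wrapper around the table: the function name is hypothetical, and the handling of values that fall exactly on a boundary (for example a TFZ of exactly 7) is an assumption, since the table ranges share their end points.<br />
<br />
```python
def interpret_tfz(tfz, first_model_monoclinic=False):
    """Map a TFZ score onto the rough verdicts of the table above.

    The 'definitely' threshold drops from 8 to 6 for the first model in
    monoclinic space groups, following the table footnote.
    """
    definite_threshold = 6.0 if first_model_monoclinic else 8.0
    if tfz > definite_threshold:
        return "definitely"
    if tfz >= 7:
        return "probably"
    if tfz >= 6:
        return "possibly"
    if tfz >= 5:
        return "unlikely"
    return "no"


for value in (4.2, 5.5, 6.5, 7.5, 9.1):
    print(value, interpret_tfz(value))
```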
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
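<br />
The default packing criterion described above is a percentage test on the marker atoms. The following Python sketch shows the arithmetic only; the function and argument names are hypothetical, and how Phaser counts clashing atoms is not reproduced.<br />
<br />
```python
def passes_packing(n_clashing, n_marker_atoms, max_clash_percent=5.0):
    """True if the percentage of marker atoms involved in clashes is within the allowed limit."""
    if n_marker_atoms <= 0:
        raise ValueError("the model must contain at least one marker atom")
    return 100.0 * n_clashing / n_marker_atoms <= max_clash_percent


# 10 clashing C-alpha atoms out of 250 is 4%, which is within the default 5%.
print(passes_packing(10, 250))
print(passes_packing(20, 250))
```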
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
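<br />
When many .sol files have to be compared, the numeric part of the SOLU SET annotation can be extracted with a regular expression. The Python sketch below is a hypothetical helper, not a Phaser utility, and it is deliberately limited: it reads only the plain RFZ=, TFZ=, PAK=, LLG= and TFZ== tokens and ignores the other annotations in the table (such as RF++, TFZ=* or amalgamated groups).<br />
<br />
```python
import re

# Label, '=' or '==', then a plain number; 'TFZ==' is kept distinct from 'TFZ='.
ANNOTATION = re.compile(r"(RFZ|TFZ|PAK|LLG)(==?)([-+]?\d+(?:\.\d+)?)")


def parse_solu_set(line):
    """Return the numeric annotations of a SOLU SET line as (label, value) pairs, in order."""
    return [(name + op, float(value)) for name, op, value in ANNOTATION.findall(line)]


line = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
        "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
annotations = parse_solu_set(line)
llg_values = [value for label, value in annotations if label == "LLG="]
print(llg_values)  # the LLG should not decrease as components are added
```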
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || All input peaks amalgamated after refinement to give output solution number (m and others): (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file is<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
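<br />
The SOLU HISTORY line is a sequence of NAME(arguments) steps, so it can be split mechanically. The Python sketch below is a hypothetical helper written for this page, assuming only the layout shown in the two examples above; it is not part of Phaser.<br />
<br />
```python
import re

# One step looks like RF/TF(7/1:10), PAK(10:10) or RNP(10,12,13:1).
STEP = re.compile(r"([A-Z/]+)\(([^)]*)\)")


def parse_solu_history(line):
    """Return the steps of a SOLU HISTORY line as (name, argument string) pairs."""
    _, _, body = line.partition("HISTORY")
    return STEP.findall(body)


def split_rnp(arguments):
    """Split RNP arguments 'a,b,c:p' into ([a, b, c], p)."""
    inputs, _, output = arguments.rpartition(":")
    return [int(x) for x in inputs.split(",")], int(output)


line = "SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13:1)RNP(1:1)"
steps = parse_solu_history(line)
print(steps)
print(split_rnp(steps[2][1]))
```

The example line is a shortened version of the more complicated history above, with the long list of amalgamated peaks cut down to three for readability.<br />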
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of 100-1000's of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set 100-1000's of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a solution with a high translation function Z-score is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data and the columns in the mtz file to be used using the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is either given directly (RMS) or derived indirectly from the sequence identity (IDENtity) in the ENSEmble keyword, using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, when there is more than one pdb file in an ensemble and/or the pdb file(s) have HETATM records, the phases in the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation in Angstrom: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|}<br />
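<br />
The tabulated values can be reproduced approximately with a short function. This is a minimal sketch, assuming an expression of the form RMS = A exp(B(1-ID)) (C+N)^(1/3); the constants A=0.0569, B=1.52 and C=173 are an assumption (they are not stated on this page) and match the table only to within about 0.01 Angstrom. The function name is hypothetical and is not part of Phaser.<br />
<br />
```python
import math


def estimate_rms(n_residues: int, identity: float) -> float:
    """Approximate the default RMS (Angstrom) for a model.

    n_residues: number of residues in the model.
    identity:   fractional sequence identity between 0.0 and 1.0.
    The constants are assumed values that reproduce the table to ~0.01 A.
    """
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be a fraction between 0 and 1")
    size_term = (173.0 + n_residues) ** (1.0 / 3.0)
    identity_term = math.exp(1.52 * (1.0 - identity))
    return 0.0569 * identity_term * size_term


if __name__ == "__main__":
    # Compare with the table entries for 300 residues.
    for percent in (0, 30, 50, 100):
        print(percent, round(estimate_rms(300, percent / 100.0), 3))
```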
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, change the record name to ATOM or use the keyword ENSE modlid HETATOM ON.<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit. It is very important to include everything in the composition, not just the components that you are searching for, because Phaser needs to know what fraction of the total scattering is accounted for by each model. For the options that specify the size of a particular component (sequence, number of residues, molecular weight), you can separately define several components of the composition of the asymmetric unit and Phaser will just add them up. Note that, for these options, you can specify the composition of one copy of a component and also say how many copies of that component are expected to be present. You can also mix compositions entered by sequence, number of residues and molecular weight. When the composition is checked, Phaser will check for the plausibility of the composition you have specified, as well as multiples of that composition.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format.<br />
===Composition by Atom===<br />
Individual atoms can be added to the composition. This allows the explicit addition of heavy atoms in the structure, e.g. Fe atoms.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
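<br />
The division described above can be illustrated with a small calculation. This is only a sketch: it uses molecular weight as a stand-in for scattering power, which is an assumption made for simplicity and not necessarily how Phaser computes the scattering. The function name and the example weights are hypothetical.<br />
<br />
```python
def fraction_scattering(ensemble_mw, composition):
    """Approximate the fraction of the total scattering from one copy of an ensemble.

    ensemble_mw: molecular weight of a single copy of the ensemble.
    composition: list of (molecular_weight, number_of_copies) tuples that
                 describes everything in the asymmetric unit.
    Molecular weight is used here as a rough proxy for scattering power.
    """
    total = sum(mw * copies for mw, copies in composition)
    if total <= 0:
        raise ValueError("composition must have positive total molecular weight")
    return ensemble_mw / total


if __name__ == "__main__":
    # Two copies of a 20 kDa protein plus one copy of a 30 kDa protein.
    asu = [(20000.0, 2), (30000.0, 1)]
    print(round(fraction_scattering(20000.0, asu), 3))
```
<br />
Note that the composition includes both copies of the first component and the second component, even when only one of them is being searched for.<br />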
<br />
==How to Define Searches==<br />
Phaser does not compare sequences you specify in the composition with the models you specify as ensembles, so you have to specify separately the number of copies of a particular sequence that you expect to be found in the asymmetric unit of your crystal and the number of copies of each ensemble you want to place in the asymmetric unit. By default, Phaser will search first for ensembles expected to yield the highest signal in the MR search (as judged by the expected LLG or eLLG calculation); if that fails to result in a clear solution, different search orders will be tested automatically. For that reason, it does not normally matter which order you use to specify the searches. There is an option to override Phaser's automatic choice of search order, but this will only rarely be useful. It is best to specify the searches for everything that you hope to find in the MR calculation in one job, as that gives Phaser the greatest scope to optimise the calculation. Note that if your crystal possesses translational non-crystallographic symmetry (tNCS), you should be searching for a number of copies of each ensemble divisible by the order of the tNCS (i.e. the number of molecules that should be related by repeated application of a translation vector).<br />
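<br />
Because the data, ensembles, composition and searches are all specified separately, it can help to assemble the keyword input in one place. The helper below is a hedged sketch that writes a keyword script as text. The keyword spellings (MODE MR_AUTO, HKLIn, LABIn, ENSEmble ... PDB ... IDENtity, COMPosition PROTein SEQuence ... NUM, SEARch ENSEmble ... NUM, ROOT) follow the usual form of Phaser keyword input as an assumption and should be checked against the keyword documentation. The helper itself and all file names and labels are hypothetical.<br />
<br />
```python
def build_mr_auto_script(hklin, f_label, sigf_label, ensembles,
                         composition, searches, root):
    """Return keyword input for an automated MR job as a string.

    ensembles:   list of (ensemble_name, pdb_file, percent_identity)
    composition: list of (sequence_file, copies_in_asymmetric_unit)
    searches:    list of (ensemble_name, copies_to_place)
    """
    lines = [
        "MODE MR_AUTO",
        f"HKLIn {hklin}",
        f"LABIn F={f_label} SIGF={sigf_label}",
    ]
    # Each ensemble is defined once, however many copies are expected.
    for name, pdb_file, identity in ensembles:
        lines.append(f"ENSEmble {name} PDB {pdb_file} IDENtity {identity}")
    # The composition must cover everything in the asymmetric unit.
    for sequence_file, copies in composition:
        lines.append(f"COMPosition PROTein SEQuence {sequence_file} NUM {copies}")
    # The number of copies to search for is given separately.
    for name, copies in searches:
        lines.append(f"SEARch ENSEmble {name} NUM {copies}")
    lines.append(f"ROOT {root}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_mr_auto_script(
        "example.mtz", "F", "SIGF",
        ensembles=[("modelA", "modelA.pdb", 45)],
        composition=[("target.seq", 2)],
        searches=[("modelA", 2)],
        root="example_auto",
    ))
```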
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
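<br />
The first three selection criteria can be sketched in a few lines. This is an illustration only, not Phaser's implementation: here the mean and standard deviation are taken from the list of scores supplied, whereas in a real search they describe the whole search function. All function names are hypothetical.<br />
<br />
```python
from statistics import mean, pstdev


def select_by_percentage(scores, cutoff=75.0):
    """Keep scores at or above cutoff percent, where top = 100% and mean = 0%."""
    top, average = max(scores), mean(scores)
    threshold = average + (cutoff / 100.0) * (top - average)
    return [s for s in scores if s >= threshold]


def select_by_zscore(scores, cutoff):
    """Keep scores at least cutoff standard deviations above the mean."""
    average, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return []
    return [s for s in scores if (s - average) / sigma >= cutoff]


def select_by_number(scores, number):
    """Keep the top 'number' scores."""
    return sorted(scores, reverse=True)[:number]


if __name__ == "__main__":
    scores = [10, 9, 5, 2, 1, 0, 0, 0, 0, 3]
    print(select_by_percentage(scores))   # the top peak always survives
    print(select_by_zscore(scores, 5.0))  # may be empty if the cutoff is high
    print(select_by_number(scores, 3))
```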
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
*: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
*: Points on the search grid with higher neighbouring points are removed from the selection<br />
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
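<br />
The effect of clustering can be shown on a one-dimensional grid. This is a simplified sketch: real rotation and translation searches use multi-dimensional grids, and the treatment of equal neighbouring values here (both are kept) is an assumption. The function name is hypothetical.<br />
<br />
```python
def cluster_peaks(values):
    """Return the indices of grid points that have no higher immediate neighbour.

    values: scores sampled along a one-dimensional search grid.
    Points at either end of the grid are compared with their single neighbour.
    """
    kept = []
    last = len(values) - 1
    for index, value in enumerate(values):
        left = values[index - 1] if index > 0 else float("-inf")
        right = values[index + 1] if index < last else float("-inf")
        if value >= left and value >= right:
            kept.append(index)
    return kept


if __name__ == "__main__":
    grid = [1, 3, 2, 5, 4, 4.5]
    print(cluster_peaks(grid))
```
<br />
With clustering off, every grid point above the selection cutoff would be retained, including points on the shoulders of a peak.<br />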
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
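<br />
The definition of commensurate modulation can be checked numerically for a candidate translation vector. The sketch below assumes the vector is given in fractional coordinates and is non-zero, and it looks for the smallest n for which n times the vector lies close to a lattice vector. The tolerance and the maximum order tested are arbitrary illustrative values, and the function is hypothetical rather than Phaser's own detection algorithm.<br />
<br />
```python
def commensurate_order(vector, max_order=12, tolerance=0.02):
    """Return the smallest n >= 2 for which n * vector is close to a lattice vector.

    vector:    non-zero tNCS translation in fractional coordinates (x, y, z).
    max_order: largest n to test (illustrative default).
    tolerance: allowed deviation from an integer, in fractional units.
    Returns None if no order up to max_order satisfies the test.
    """
    for n in range(2, max_order + 1):
        if all(abs(n * x - round(n * x)) <= tolerance for x in vector):
            return n
    return None


if __name__ == "__main__":
    print(commensurate_order((0.0, 0.0, 0.5)))     # pseudo-halving of the cell
    print(commensurate_order((0.0, 0.25, 0.5)))    # four-fold modulation
    print(commensurate_order((0.13, 0.27, 0.41)))  # general position
```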
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x><br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2465Molecular Replacement2018-07-01T11:20:14Z<p>Rdo20: </p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' -> [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' -> [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and provide a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto2.png|Flow Diagram for Automated MR]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
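<br />
The RMS-based limit is simple arithmetic and can be sketched as follows. This illustration covers only the 1.8 times RMS rule; it does not model the limit derived from the expected LLG, and the function name is hypothetical.<br />
<br />
```python
def rms_resolution_limit(data_resolution, rms_error, factor=1.8):
    """Return the high-resolution limit (Angstrom) implied by the model error.

    data_resolution: high-resolution limit of the data, in Angstrom.
    rms_error:       estimated RMS error of the model, in Angstrom.
    A larger number means lower resolution, so the limit used is the larger
    of the data limit and factor * rms_error.
    """
    if data_resolution <= 0 or rms_error < 0:
        raise ValueError("resolution must be positive and RMS non-negative")
    return max(data_resolution, factor * rms_error)


if __name__ == "__main__":
    # Data to 1.5 A with a model of estimated RMS error 1.5 A.
    print(round(rms_resolution_limit(1.5, 1.5), 2))
    # Data to 3.0 A with a model of estimated RMS error 1.0 A.
    print(round(rms_resolution_limit(3.0, 1.0), 2))
```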
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
| *''6 for 1st model in monoclinic space groups'' || <br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
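<br />
The definition of the Z-score translates directly into code. The sketch below takes the LLG of a search peak and a list of LLG values from random orientations or positions; the function name and the example numbers are hypothetical.<br />
<br />
```python
from statistics import mean, pstdev


def z_score(peak_llg, random_llgs):
    """Return the number of standard deviations of peak_llg above the random mean.

    peak_llg:    LLG of the rotation or translation search peak.
    random_llgs: LLG values for a set of random rotations or translations.
    """
    average = mean(random_llgs)
    rms_deviation = pstdev(random_llgs)  # RMS deviation from the mean
    if rms_deviation == 0:
        raise ValueError("random LLG values must not all be identical")
    return (peak_llg - average) / rms_deviation


if __name__ == "__main__":
    random_sample = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, RMS deviation 2
    print(z_score(21, random_sample))
```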
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
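<br />
The rough guide in the table can be written as a lookup. This is a sketch only: the handling of values that fall exactly on a boundary is an assumption, as is the reading of the footnote as lowering the "definitely" threshold to 6 for the first model in monoclinic space groups. The function name is hypothetical.<br />
<br />
```python
def interpret_tfz(tfz, first_model_monoclinic=False):
    """Map a TF Z-score onto the rough guide given in the table.

    first_model_monoclinic: apply the footnote, i.e. use 6 instead of 8 as
    the threshold for "definitely".
    """
    definite_threshold = 6.0 if first_model_monoclinic else 8.0
    if tfz > definite_threshold:
        return "definitely"
    if tfz < 5.0:
        return "no"
    if tfz < 6.0:
        return "unlikely"
    if tfz < 7.0:
        return "possibly"
    return "probably"


if __name__ == "__main__":
    for value in (4.2, 5.5, 6.5, 7.5, 9.1):
        print(value, interpret_tfz(value))
```
<br />
As the text notes, this is only a rough guide; the logfile and the packing results should still be inspected.<br />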
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
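
If you want to check the annotation programmatically, the numeric items can be pulled out of the SOLU SET line with a short script. This is a sketch only: the function name `parse_solu_set` is hypothetical, and only the RFZ=, TFZ=, PAK=, LLG= and TFZ== items from the table above are treated as numeric; anything else (e.g. RF++ or +TNCS) is returned unparsed.

```python
import re

# One annotation item: label, "=" or "==", then a number.
TOKEN = re.compile(r"^(RFZ|TFZ|PAK|LLG)(==|=)(-?\d+(?:\.\d+)?)$")

def parse_solu_set(line):
    """Split a SOLU SET line into (label, value) pairs and unparsed items."""
    fields = line.split()
    if fields[:2] != ["SOLU", "SET"]:
        raise ValueError("not a SOLU SET line")
    numeric, other = [], []
    for tok in fields[2:]:
        m = TOKEN.match(tok)
        if m:
            numeric.append((m.group(1) + m.group(2), float(m.group(3))))
        else:
            other.append(tok)
    return numeric, other

line = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
        "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
numeric, other = parse_solu_set(line)
llg = [v for k, v in numeric if k == "LLG="]
llg_never_decreases = all(a <= b for a, b in zip(llg, llg[1:]))
```

For the beta-blip line the LLG values never decrease as components are added, which is the behaviour described above for a correct solution.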
<br />
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
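
A z-y-z rotation about successive new axes can be written as the matrix product Rz(alpha) Ry(beta) Rz(gamma). The Python sketch below builds that matrix. It assumes that the three EULER values are alpha, beta and gamma in that order, in degrees, and that the matrix acts on column vectors; check these assumptions against your own data before relying on them.

```python
import math

def rot_z(deg):
    """Rotation by deg degrees about the z-axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(deg):
    """Rotation by deg degrees about the y-axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_zyz(alpha, beta, gamma):
    """Rotations about z, then the new y, then the new z, compose left to right."""
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))

def apply(r, v):
    """Rotate the column vector v by the matrix r."""
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]

# Euler angles of the "beta" component in the example above.
r_beta = euler_zyz(200.849, 41.269, 183.909)
```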
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || All input peaks amalgamated after refinement to give output solution number (m and others): (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file is<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
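
A history line can be split into its steps with a short script, which is convenient when many solutions need to be traced back to their rotation function peaks. This Python sketch is based only on the format of the two examples above; the function name `parse_history` is hypothetical, and steps without a colon (such as FUSE) are returned with no output number.

```python
import re

# One step: a name such as RF/TF, PAK or RNP followed by "(...)".
STEP = re.compile(r"([A-Z/]+)\(([^()]*)\)")

def parse_history(line):
    """Return a list of (step name, input numbers, output number or None)."""
    prefix = "SOLU HISTORY"
    if not line.startswith(prefix):
        raise ValueError("not a SOLU HISTORY line")
    steps = []
    for name, body in STEP.findall(line[len(prefix):]):
        if ":" in body:
            inputs, _, output = body.rpartition(":")
            out_number = int(output)
        else:
            inputs, out_number = body, None
        # Inputs are separated by "/" (RF/TF) or "," (RNP, FUSE).
        in_numbers = [int(x) for x in re.split(r"[,/]", inputs) if x.strip()]
        steps.append((name, in_numbers, out_number))
    return steps

line = ("SOLU HISTORY RF/TF(7/1:10)PAK(10:10)"
        "RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)"
        "RNP(1:1)")
steps = parse_history(line)
```

In this example the final solution came from rotation peak 7 and translation peak 1, and 24 input peaks were amalgamated in the first refinement step.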
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks, because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of 100-1000's of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set 100-1000's of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TF-function Z-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data (HKLIn keyword) and which columns in that file are to be used (LABIn keyword). Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is given either directly, with RMS in the ENSEmble keyword, or indirectly, with IDENtity in the ENSEmble keyword. In the indirect case it is estimated using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation in Angstrom: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|}<br />
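
The formula behind the table is not given on this page. The Python sketch below uses an assumed form, RMS = A (B + N)^(1/3) exp(C H), where N is the number of residues and H is the fraction of non-identical residues. The constants A, B and C are assumptions, chosen because they reproduce the tabulated values to within about 0.01 Angstrom; they should not be taken as the exact values used by Phaser.

```python
import math

# Assumed constants (not stated on this page); see the note above.
A, B, C = 0.0569, 173.0, 1.52

def estimated_rms(identity_percent, n_residues):
    """Estimate the RMS deviation (Angstrom) from identity and model size."""
    h = 1.0 - identity_percent / 100.0  # fraction of non-identical residues
    return A * (B + n_residues) ** (1.0 / 3.0) * math.exp(C * h)

# Compare with a few entries of the table: (identity %, residues, tabulated RMS).
table_samples = [(100, 50, 0.345), (30, 50, 1.000), (50, 300, 0.948),
                 (0, 2000, 3.375), (100, 2000, 0.737)]
differences = [abs(estimated_rms(i, n) - rms) for i, n, rms in table_samples]
```

The estimate increases both as the sequence identity drops and as the model gets larger, as in the table.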
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include other HETATM records in the scattering, either change the record name to ATOM or use the keyword ENSE modlid HETATOM ON<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
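
The default editing rules for HETATM records and water molecules can be mimicked with a short script, for example to preview which atoms of a model will contribute to the scattering. This is a Python sketch of the rules as described in this section, not Phaser's own code. It assumes standard fixed-column PDB records at least 60 characters wide (residue name in columns 18-20, occupancy in columns 55-60), and the function name `edit_pdb_lines` is hypothetical.

```python
WATER_NAMES = {"OW", "WAT", "HOH", "H2O", "OH2", "MOH", "WTR", "TIP"}
SCATTERING_HETATM = {"MSE", "MSO", "CSE", "CSO", "ALY", "MLY", "MLZ"}

def edit_pdb_lines(lines):
    """Drop waters and zero the occupancy of HETATM records that are ignored."""
    kept = []
    for line in lines:
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            kept.append(line)  # other records are passed through unchanged
            continue
        resname = line[17:20].strip()
        if resname in WATER_NAMES:
            continue  # waters are deleted on input
        if record == "HETATM" and resname not in SCATTERING_HETATM:
            # Carried through to output, but with occupancy set to zero.
            line = line[:54] + "  0.00" + line[60:]
        kept.append(line)
    return kept
```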
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit. It is very important to include everything in the composition, not just the components that you are searching for, because Phaser needs to know what fraction of the total scattering is accounted for by each model. For the options that specify the size of a particular component (sequence, number of residues, molecular weight), you can separately define several components of the composition of the asymmetric unit and Phaser will just add them up. Note that, for these options, you can specify the composition of one copy of a component and also say how many copies of that component are expected to be present. You can also mix compositions entered by sequence, number of residues and molecular weight. When the composition is checked, Phaser will check for the plausibility of the composition you have specified, as well as multiples of that composition.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format.<br />
===Composition by Atom===<br />
Individual atoms can be added to the composition. This allows the explicit addition of heavy atoms in the structure, e.g. Fe atoms.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
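
As a rough guide to how the composition determines the fraction scattering, the calculation can be sketched in Python using molecular weight as a proxy for scattering power. That proxy is an assumption made for illustration: Phaser works from the actual scattering of the ensemble and of the composition. The molecular weights in the example are illustrative values for a two-component complex, and the function names are hypothetical.

```python
def total_weight(components):
    """Sum of molecular weight times number of copies over all components."""
    return sum(mw * copies for mw, copies in components)

def fraction_scattering(model_mw, components):
    """Approximate fraction of the asymmetric unit accounted for by one model."""
    total = total_weight(components)
    if total <= 0:
        raise ValueError("the composition must not be empty")
    return model_mw / total

# Illustrative composition: (molecular weight, number of copies).
composition = [(28853.0, 1), (17522.0, 1)]
f_first = fraction_scattering(28853.0, composition)
f_second = fraction_scattering(17522.0, composition)
```

If only the component being searched for were entered in the composition, its fraction would come out as 1.0, which is why the composition must include everything in the asymmetric unit.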
<br />
==How to Define Searches==<br />
Phaser does not compare sequences you specify in the composition with the models you specify as ensembles, so you have to specify separately the number of copies of a particular sequence that you expect to be found in the asymmetric unit of your crystal and the number of copies of each ensemble you want to place in the asymmetric unit. By default, Phaser will search first for ensembles expected to yield the highest signal in the MR search (as judged by the expected LLG or eLLG calculation); if that fails to result in a clear solution, different search orders will be tested automatically. For that reason, it does not normally matter which order you use to specify the searches. There is an option to override Phaser's automatic choice of search order, but this will only rarely be useful. It is best to specify the searches for everything that you hope to find in the MR calculation in one job, as that gives Phaser the greatest scope to optimise the calculation. Note that if your crystal possesses translational non-crystallographic symmetry (tNCS), you should be searching for a number of copies of each ensemble divisible by the order of the tNCS (i.e. the number of molecules that should be related by repeated application of a translation vector).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
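
When .sol files have to be written or edited by script, the SOLU 6DIM lines can be generated in the same layout as the beta-blip example shown earlier. The Python sketch below only formats the line; the function name `solu_6dim` and the ensemble name "known" are hypothetical, and a complete .sol file contains other lines (such as SOLU SET) as well.

```python
def solu_6dim(ensemble, euler=(0.0, 0.0, 0.0), frac=(0.0, 0.0, 0.0), bfac=0.0):
    """Format one SOLU 6DIM line in the layout used in the examples above."""
    e = " ".join("%.3f" % a for a in euler)
    f = " ".join("%.5f" % x for x in frac)
    return "SOLU 6DIM ENSE %s EULER %s FRAC %s BFAC %.5f" % (ensemble, e, f, bfac)

# A fixed partial structure: rotation 0 0 0 and position 0 0 0.
fixed = solu_6dim("known")

# Reproduces the first component of the beta-blip example.
beta = solu_6dim("beta", (200.849, 41.269, 183.909),
                 (-0.49604, -0.15830, -0.28092))
```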
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
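
The four selection criteria above can be sketched in a few lines of Python. The mode names ("percent", "sigma", "number", "all") are hypothetical labels, not Phaser keywords, and the mean and standard deviation are taken over the supplied values, which is an assumption about how the statistics are estimated.

```python
from statistics import mean, pstdev

def select_peaks(values, mode="percent", cutoff=75.0):
    """Return the selected peak values, highest first."""
    if not values:
        return []
    ranked = sorted(values, reverse=True)
    if mode == "all":
        return ranked
    if mode == "number":
        return ranked[:int(cutoff)]
    mu = mean(values)
    if mode == "percent":
        # Top peak is 100%, mean is 0%; keep peaks at or above the cutoff.
        threshold = mu + (cutoff / 100.0) * (ranked[0] - mu)
        return [v for v in ranked if v >= threshold]
    if mode == "sigma":
        sigma = pstdev(values)
        if sigma == 0:
            return []
        return [v for v in ranked if (v - mu) / sigma >= cutoff]
    raise ValueError("unknown selection mode: %s" % mode)

# Hypothetical peak heights.
peaks = [100.0, 40.0, 35.0, 30.0, 20.0, 15.0]
by_percent = select_peaks(peaks, "percent", 75.0)
by_sigma = select_peaks(peaks, "sigma", 5.0)
by_number = select_peaks(peaks, "number", 3)
```

Selection by percentage always returns at least the top peak, whereas selection by Z-score can return nothing if the cutoff is too high.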
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
*: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
*: Points on the search grid with higher neighbouring points are removed from the selection<br />
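
The effect of clustering can be illustrated on a one-dimensional grid. This Python sketch keeps only the grid points that have no higher neighbour. The real search grids are multi-dimensional, so this is an illustration of the rule rather than of Phaser's implementation, and the function name `cluster_peaks` is hypothetical.

```python
def cluster_peaks(grid_values):
    """Return indices of grid points with no higher neighbouring point."""
    kept = []
    n = len(grid_values)
    for i, v in enumerate(grid_values):
        left = grid_values[i - 1] if i > 0 else float("-inf")
        right = grid_values[i + 1] if i < n - 1 else float("-inf")
        if v >= left and v >= right:
            kept.append(i)
    return kept

# Hypothetical values: a peak at index 3 with a high shoulder at index 4.
grid = [1.0, 3.0, 2.0, 5.0, 4.5, 1.0]
clustered = cluster_peaks(grid)
```

The shoulder at index 4 is removed although it is higher than the separate peak at index 1. This is why turning clustering off can help when the correct orientation sits on the shoulder of a peak.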
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
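
The 20% criterion is a simple threshold on peak height relative to the origin peak, sketched here in Python. The peak list is hypothetical (fractional vector and height, origin peak excluded) and the function name is not part of Phaser.

```python
def significant_patterson_peaks(peaks, origin_height, percent=20.0):
    """Keep non-origin peaks higher than the given percentage of the origin."""
    threshold = origin_height * percent / 100.0
    return [(vector, height) for vector, height in peaks if height > threshold]

# Hypothetical native Patterson peaks: (fractional vector, height).
peaks = [((0.5, 0.0, 0.5), 45.0),
         ((0.25, 0.0, 0.0), 20.0),
         ((0.1, 0.2, 0.3), 12.0)]
default_peaks = significant_patterson_peaks(peaks, origin_height=100.0)
lowered_peaks = significant_patterson_peaks(peaks, origin_height=100.0, percent=10.0)
```

With the default cutoff only the first peak is listed; lowering the cutoff brings in the weaker peaks, which may or may not be part of a tNCS series.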
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
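<br />
The definition of commensurate modulation given above can be written as a small check. This is a sketch only: the function names and the tolerance of 0.05 (in fractional coordinates) are assumptions, since the text says only "very close to" integer multiples and does not state the criterion Phaser uses.<br />
<br />
```python
def is_commensurate(vector, n, tolerance=0.05):
    """True if n times the fractional tNCS vector is close to a lattice vector.

    vector is a fractional (u, v, w) triple; tolerance is an assumed
    threshold, not a value taken from Phaser.
    """
    for component in vector:
        scaled = n * component
        if abs(scaled - round(scaled)) > tolerance:
            return False
    return True


def tncs_series(vector, n):
    """Vectors 1, 2, ... (n-1) times the basic vector, wrapped into the unit cell."""
    return [tuple((k * c) % 1.0 for c in vector) for k in range(1, n)]
```
<br />
For example, a basic vector of (0, 0, 0.25) with n = 4 is commensurate, and the expected series of native Patterson peaks lies at w = 0.25, 0.5 and 0.75.<br />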
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x>.<br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells: the basic tNCS vector can be in a general position, and the number of copies is likewise unrestricted.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Python_Interface&diff=2458Python Interface2018-04-26T12:36:31Z<p>Rdo20: </p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
As an alternative to keyword input, Phaser can be called directly from a python script. This is the way Phaser is called in Phenix and we encourage developers of other automation pipelines to use the python scripting too. In order to call Phaser in python you will need to have Phaser installed from source. See [[Source_Code#Building_Phaser_from_source]]<br />
<br />
==Input-Objects, Run-Jobs, and Results-Objects==<br />
Using Phaser through the python interface is similar to using Phaser through the keyword interface. Each mode of operation of Phaser described above is controlled by an "input-object" (similar to the command script), has a Phaser "run-job" which runs the Phaser executable for the corresponding mode, and produces a "result-object" (which includes the logfile text). The user input is passed to the "input-object" with calls to set- or add- functions. Phaser is then run with a call to the "run-job" function, which takes the "input-object" for control. Results are returned from the "result-object" with get-functions.<br />
<br />
{| class="wikitable" <br />
|-<br />
!Functionality !!Input-Object !!Run-Job !!Results-Object<br />
|-<br />
| Anisotropy Correction || i = InputANO() || r = runANO(i) || ResultANO()<br />
|-<br />
|Cell Content Analysis || i = InputCCA() || r = runCCA(i) ||ResultCCA()<br />
|-<br />
|Normal Mode Analysis || i = InputNMA() ||r = runNMA(i) ||ResultNMA()<br />
|-<br />
|Translational NCS Analysis || i = InputNCS() ||r = runNCS(i) ||ResultNCS()<br />
|-<br />
|Automated MR ||i = InputMR_AUTO() ||r = runMR_AUTO(i) ||ResultMR()<br />
|-<br />
|Rotation Function ||i = InputMR_FRF() ||r = runMR_FRF(i) ||ResultMR_RF()<br />
|-<br />
|Translation Function ||i = InputMR_FTF() ||r = runMR_FTF(i) ||ResultMR_TF()<br />
|-<br />
|Refinement and Phasing ||i = InputMR_RNP() ||r = runMR_RNP(i) ||ResultMR()<br />
|-<br />
|Log-Likelihood Gain ||i = InputMR_LLG() ||r = runMR_LLG(i) ||ResultMR()<br />
|-<br />
| Packing ||i = InputMR_PAK() ||r = runMR_PAK(i) ||ResultMR()<br />
|-<br />
|Automated Experimental Phasing ||i = InputEP_AUTO() ||r = runEP_AUTO(i) ||ResultEP()<br />
|-<br />
| SAD Experimental Phasing ||i = InputEP_SAD() ||r = runEP_SAD(i) ||ResultEP()<br />
|}<br />
<br />
The major difference between running Phaser through the keyword interface and running Phaser through the python scripting is that the data reading and Phaser functionality are separated. For the Phaser "run-job" functions, the reflection data (for Miller indices, Fobs and SigmaFobs) are simply arrays, the space group is given as a Hall string, and the unit cell is given as an array of 6 numbers. This is an important feature of the Phaser python scripting as it means that the Phaser "run-job" functions are not tied to mtz file input, but the data can be read in python from any file format, and then the data passed to Phaser.<br />
<br />
For the convenience of developers and users, the python scripting comes with data-reading jiffies to read data from mtz files. (These are the same mtz reading jiffies that are used internally by Phaser when calling Phaser from keyword input.)<br />
{| class="wikitable" <br />
!Functionality !!Input-Object !!Run-Job !!Result-Object<br />
|-<br />
| Read Data for MR ||i = InputMR_DAT() ||r = runMR_DAT(i) ||ResultMR_DAT()<br />
|-<br />
| Read Data for EP ||i = InputEP_DAT() ||r = runEP_DAT(i) ||ResultEP_DAT()<br />
|}<br />
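<br />
The input-object, run-job, result-object pattern in the tables above can be sketched as a small helper. The helper name run_phaser_job is hypothetical and not part of Phaser. In the commented usage, setROOT and setMUTE are the set-functions named on this page, while setHKLI is assumed here to mirror the HKLIN keyword and should be checked against the example scripts.<br />
<br />
```python
def run_phaser_job(make_input, run_job, configure):
    """Create an input-object, configure it, run the job, return the result-object.

    make_input : an input-object class, e.g. InputMR_DAT
    run_job    : the matching run-job function, e.g. runMR_DAT
    configure  : a callable that applies set-/add- functions to the input-object
    """
    job_input = make_input()
    configure(job_input)
    return run_job(job_input)


# Usage with Phaser built from source (run under cctbx.python); not executed here:
#
#   from phaser import InputMR_DAT, runMR_DAT
#
#   def configure(i):
#       i.setHKLI("data.mtz")   # assumed name, mirroring the HKLIN keyword
#       i.setROOT("my_job")
#       i.setMUTE(True)
#
#   r = run_phaser_job(InputMR_DAT, runMR_DAT, configure)
```
<br />
Keeping the configuration in its own function makes it easy to reuse the same settings for several modes, since each mode only differs in the input-object and run-job pair.<br />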
<br />
==Input-Object set- and add-Functions==<br />
The syntax of the set- and add- functions on the "input-objects" mirrors the keyword input. Each "input-object" only has set- or add- functions corresponding to the keywords that are relevant for that mode. Attempting to set a value on an "input-object" that is irrelevant for that mode will result in an error. This differs from the keyword input, where the parser simply ignores any keywords that are not relevant to the current mode. <br />
<br />
Note that setting the space group by name or number does not specify the setting. It is best to set the space group via the Hall symbol, which is unique to the full definition of the space group.<br />
<br />
The python interface uses standard python and cctbx/scitbx variable types. <br />
<br />
str string<br />
float double precision floating point<br />
Miller cctbx::miller::index<int> <br />
dvect3 scitbx::vec3<float> <br />
dmat33 scitbx::mat3<float> <br />
'''type'''_array scitbx::af::shared<'''type'''> arrays<br />
<br />
{| class="wikitable" <br />
|+ Examples of keyword/python equivalences<br />
|-<br />
! Functionality !! Keyword !!Python Set Function<br />
|-<br />
| Set the root filename || ROOT filename ||i.setROOT(filename)<br />
|-<br />
| Silence logfile output to standard output || MUTE ON ||i.setMUTE(True)<br />
|- <br />
| Add a scattering type for llg map completion ||LLGC SCATTERING S || i.addLLGC_SCAT(S) <br />
|}<br />
<br />
==Results-Object get-Functions==<br />
Data are extracted from the "result-objects" with get-functions. The get-functions are mostly specific to the type of "result-object" (described in sections below), but some are common to all "result-objects" (described in table below).<br />
<br />
Ralf Grosse-Kunstleve's scitbx::af::shared<double> array type is heavily used for passing of arrays into the Phaser "input-objects" and extracting arrays from the Phaser "result-objects". This is a reference counted array type that can be used directly in python and in C++. It is part of the Phaser installation, when Phaser is installed from source. The scitbx (SCIentific ToolBoX) is part of the cctbx (Computational Crystallography ToolBoX) which is hosted by sourceforge<br />
<br />
{| class="wikitable" <br />
|+ Functions common to all output objects<br />
|-<br />
!Results Objects !!Python Get Function<br />
|-<br />
| Exit status "success" ||r.Success()<br />
|-<br />
| Exit status "failure" ||r.Failure()<br />
|-<br />
| Type of Error (see error table). SYNTAX errors are not <br>thrown in python as they are generated by keyword input ||r.ErrorName()<br />
|-<br />
| Message associated with error ||r.ErrorMessage()<br />
|-<br />
| Text of Summary ||r.summary()<br />
|-<br />
| Text of Logfile ||r.logfile()<br />
|-<br />
| Text of Verbose Logfile ||r.verbose()<br />
|-<br />
| Text of Warning messages ||r.warnings()<br />
|-<br />
| Text of Loggraph format tables/graphs ||r.loggraph()<br />
|}<br />
<br />
There is no documentation for the functions available from each results object. Please see the file Outputs_bpl.cpp in the boost_python directory of the phaser source code distribution.<br />
<br />
==Error Handling==<br />
<br />
Exit status is indicated by Success() and Failure() functions of the "result-objects". Success indicates successful execution of Phaser, not that it has solved the structure! For molecular replacement jobs, the foundSolutions() function indicates that Phaser has found one or more potential solutions, the numSolutions() function returns how many solutions were found and the uniqueSolution() function returns True if only one solution was found. More detailed error information in the case of Failure is given by ErrorName() and ErrorMessage().<br />
<br />
Advanced Information: All errors are thrown and caught internally by the "run-jobs", and so do not generate "Runtime Errors" in the python script. In particular "INPUT" errors are not thrown by the set- or add-functions of the "input-objects", but are stored in the "input-object" and passed to the "result-object" once the "run-job" is called. Results objects are derived from std::exception, and so can be thrown. Function what() returns ErrorName() (not the ErrorMessage()).<br />
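<br />
Because errors are stored in the "result-object" rather than raised, a script has to test the exit status explicitly. The following is a minimal sketch that uses only the get-functions named on this page; the function name describe_mr_result and the wording of the messages are illustrative assumptions.<br />
<br />
```python
def describe_mr_result(result):
    """Turn a molecular replacement result-object into a one-line status string."""
    if not result.Success():
        # ErrorName() gives the type of error, ErrorMessage() the detail
        return "Phaser failed (%s): %s" % (result.ErrorName(), result.ErrorMessage())
    # Success() only means Phaser ran to completion, not that the structure is solved
    if not result.foundSolutions():
        return "Phaser ran, but no solutions were found"
    if result.uniqueSolution():
        return "1 unique solution found"
    return "%d potential solutions found" % result.numSolutions()
```
<br />
The order of the tests matters: the solution get-functions are only consulted once the exit status has been confirmed as a success.<br />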
<br />
==Logfile Handling==<br />
Writing of the logfile to standard output can be silenced with the i.setMUTE(True) function. The logfile or summary text can then be printed to standard output with the statements print r.logfile() or print r.summary().<br />
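<br />
A minimal sketch of this approach is given below. The helper name run_muted is hypothetical; it relies only on setMUTE, summary() and logfile() as described on this page, and returns the text so that the caller decides whether and where to print it.<br />
<br />
```python
def run_muted(job_input, run_job, full_log=False):
    """Run a Phaser job with standard output silenced and return (result, text).

    text is the full logfile if full_log is True, otherwise the summary.
    """
    job_input.setMUTE(True)          # suppress real-time logfile output
    result = run_job(job_input)
    text = result.logfile() if full_log else result.summary()
    return result, text
```
<br />
Returning the text rather than printing it avoids any dependence on the print syntax of a particular python version.<br />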
<br />
Advanced Information: Setting i.setMUTE(True) prevents real time viewing of the progress of a Phaser job. This may present an inconvenience for users. If you want to view the logfile information but not have it go to standard output, logfile text can be redirected to a python string using an alternative call to the "run-job" function that includes passing an "output-object" (which controls the Phaser logging methods) on which the output stream has been set to a python string. This feature of Phaser was developed thanks to Ralf Grosse-Kunstleve.<br />
<br />
==Example Scripts==<br />
Copy and edit to start using Phaser<br />
* [[Python Example Scripts | Python Example Scripts ]]</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Source_Code&diff=2457Source Code2018-04-26T12:34:42Z<p>Rdo20: </p>
<hr />
<div>===Repository===<br />
<br />
A public [https://git.csx.cam.ac.uk/x/cimr-phaser/phaser.git/summary Phaser git repository] is available for '''git clone''' and '''git pull''' only. This mirrors commits to the Phaser SVN repository in real time.<br />
<br />
The [http://www-structmed.cimr.cam.ac.uk/svn-cgi-bin/viewvc.cgi/ Phaser SVN repository] is located in Cambridge on the CIMR server (password restricted)<br />
<br />
The Berkeley mirror at cci.lbl.gov is updated at midnight Berkeley time<br />
<br />
:/net/cci/auto_build/repositories/phaser<br />
<br />
===Access===<br />
<br />
*You can download nightly builds of Phenix (binaries), which contain the latest version of Phaser that has passed regression tests<br />
*You can compile code with real-time updates from the git repository. This code may not pass regression tests. The git repository is best used for obtaining instant bugfixes, after communication with one of the Phaser developers<br />
*If you are developing a pipeline using Phaser, we are keen to work with you to add features, fix bugs and help you use Phaser optimally<br />
*Note the University of Cambridge's [[ Licences | Licences for Phaser]] with regards to making Phaser part of a pipeline available online<br />
*Source code modifications are allowed under the University of Cambridge's [[ Licences | Licences for Phaser]], provided they are for internal use only. Distribution would require those changes to be incorporated into our SVN repository. <br />
<br />
===Full Access===<br />
<br />
*Requests for permission to commit to the SVN repository via SSH should be emailed to [mailto:cimr-phaser@lists.cam.ac.uk phaser-help]<br />
<br />
<br />
===Building Phaser from source===<br />
Phaser can be built as an executable file for the platforms Linux, MacOS, Windows (using VC++ 9.0) and Windows (using g++ in MinGW-W64). It can also be built as python modules useful for python scripting. There are two ways to achieve this. <br />
<br />
<br />
The quick way is to start from an existing installation of CCTBX (available from http://cci.lbl.gov/cctbx_build/) to allow building the Phaser executable. Install CCTBX for the desired platform on your system first. For MinGW-W64 use the Windows build of CCTBX.<br />
Assuming Python 2.7 is present on your system, Phaser can be built from a CCTBX installation with the following steps from a Bash shell or a Windows command prompt:<br />
#Change directory to the modules/ folder within the CCTBX installation. Then do <pre>git clone git://git.uis.cam.ac.uk/cimr-phaser/phaser.git </pre><br />
#Change directory to the build/ folder within the CCTBX installation<br />
#Delete all files and folders except '''config_modules.sh''' or '''config_modules.cmd'''<br />
#Edit the '''config_modules.sh''' or '''config_modules.cmd''' script to look like:<br />
#:*'''Linux or MacOS''':<br />
#:<pre>#!/bin/sh &#10;python ../modules/cctbx_project/libtbx/configure.py phaser --enable_openmp_if_possible=True</pre><br />
#:*'''Windows using Microsoft VC++ 9.0'''<br />
#:<pre>python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True</pre><br />
#:*'''Windows using MinGW-W64 5.3.0'''<br />
#:<pre>python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True --compiler=mingw --static_exe</pre><br />
#Execute the '''config_modules.sh''' or '''config_modules.cmd''' script.<br />
#On Linux or MacOS source the file '''setpaths.sh''', on Windows execute the file '''setpaths.bat'''<br />
#On Linux or MacOS do '''libtbx.scons -j nproc exe/phaser''', on Windows do '''libtbx.scons -j nproc exe\phaser.exe'''. Here nproc is the number of available CPUs to do the compilation. This will produce the Phaser executable within the build/exe directory. If Phaser python modules are also desired then omit the '''exe/phaser''' or '''exe\phaser.exe''' argument (does not apply to a MinGW-W64 build unless the installed version of your CCTBX was built for MinGW-W64).<br />
<br />
<br />
A simpler but slower way to build Phaser is to run a "bootstrap build". Download the file bootstrap.py (as detailed on https://github.com/cctbx/cctbx_project#installation ) to where you want to build Phaser. Assuming that python 2.7 and the compiler are available from the PATH environment variable, run the command: <pre>python bootstrap.py --builder=phaser --nproc=8</pre> from a command prompt or bash shell. This will build a stripped down version of CCTBX in addition to Phaser and its python modules. The Phaser executable is located in the directory build/exe.<br />
<br />
<br />
The steps to build Phaser change from time to time as the developments of required components like CCTBX are moving targets. The steps outlined here may therefore differ from the actual ones at short or no notice.<br />
<br />
===Using Phaser from Python===<br />
Having compiled Phaser as above Phaser can now be accessed from the python interpreter that is part of CCTBX. To invoke it on Linux or MacOS it is necessary first to execute <pre>source build/setpaths.sh</pre> or on Windows <pre>build\setpaths.bat.</pre> From then on you can invoke python with the command <pre>cctbx.python</pre> and run scripts such as [[Python_Example_Scripts]]</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Source_Code&diff=2448Source Code2018-03-16T15:21:33Z<p>Rdo20: /* Building Phaser from source */</p>
<hr />
<div>===Repository===<br />
<br />
A public [https://git.csx.cam.ac.uk/x/cimr-phaser/phaser.git/summary Phaser git repository] is available for '''git clone''' and '''git pull''' only. This mirrors commits to the Phaser SVN repository in real time.<br />
<br />
The [http://www-structmed.cimr.cam.ac.uk/svn-cgi-bin/viewvc.cgi/ Phaser SVN repository] is located in Cambridge on the CIMR server (password restricted)<br />
<br />
The Berkeley mirror at cci.lbl.gov is updated at midnight Berkeley time<br />
<br />
:/net/cci/auto_build/repositories/phaser<br />
<br />
===Access===<br />
<br />
*You can download nightly builds of Phenix (binaries), which contain the latest version of Phaser that has passed regression tests<br />
*You can compile code with real-time updates from the git repository. This code may not pass regression tests. The git repository is best used for obtaining instant bugfixes, after communication with one of the Phaser developers<br />
*If you are developing a pipeline using Phaser, we are keen to work with you to add features, fix bugs and help you use Phaser optimally<br />
*Note the University of Cambridge's [[ Licences | Licences for Phaser]] with regards to making Phaser part of a pipeline available online<br />
*Source code modifications are allowed under the University of Cambridge's [[ Licences | Licences for Phaser]], provided they are for internal use only. Distribution would require those changes to be incorporated into our SVN repository. <br />
<br />
===Full Access===<br />
<br />
*Requests for permission to commit to the SVN repository via SSH should be emailed to [mailto:cimr-phaser@lists.cam.ac.uk phaser-help]<br />
<br />
<br />
===Building Phaser from source===<br />
Phaser can be built as an executable file for the platforms Linux, MacOS, Windows (using VC++ 9.0) and Windows (using g++ in MinGW-W64). It can also be built as python modules useful for python scripting. There are two ways to achieve this. <br />
<br />
One way is to start from an existing installation of CCTBX (available from http://cci.lbl.gov/cctbx_build/) to allow building the Phaser executable. Install CCTBX for the desired platform on your system first. For MinGW-W64 use the Windows build of CCTBX.<br />
Assuming python2.7 is present on your system Phaser can be built from a CCTBX installation <br />
with the following steps from a Bash shell or a Windows command prompt:<br />
#Change directory to the modules/ folder within the CCTBX installation. Then do <pre>git clone git://git.uis.cam.ac.uk/cimr-phaser/phaser.git </pre><br />
#Change directory to the build/ folder within the CCTBX installation<br />
#Delete all files and folders except '''config_modules.sh''' or '''config_modules.cmd'''<br />
#Edit the '''config_modules.sh''' or '''config_modules.cmd''' script to look like:<br />
#:*'''Linux or MacOS''':<br />
#:<pre>#!/bin/sh &#10;python ../modules/cctbx_project/libtbx/configure.py phaser --enable_openmp_if_possible=True</pre><br />
#:*'''Windows using Microsoft VC++ 9.0'''<br />
#:<pre>python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True</pre><br />
#:*'''Windows using MinGW-W64 5.3.0'''<br />
#:<pre>python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True --compiler=mingw --static_exe</pre><br />
#Execute the '''config_modules.sh''' or '''config_modules.cmd''' script.<br />
#On Linux or MacOS source the file '''setpaths.sh''', on Windows execute the file '''setpaths.bat'''<br />
#On Linux or MacOS do '''libtbx.scons -j nproc exe/phaser''', on Windows do '''libtbx.scons -j nproc exe\phaser.exe'''. Here nproc is the number of available CPUs to do the compilation. This will produce the Phaser executable within the build/exe directory. If Phaser python modules are also desired then omit the '''exe/phaser''' or '''exe\phaser.exe''' argument (does not apply to a MinGW-W64 build unless the installed version of your CCTBX was built for MinGW-W64).<br />
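<br />
The three configure variants listed above differ only in the path separator and in the extra MinGW flags. A minimal sketch that assembles the command for each case is shown here; the helper name <code>configure_command</code> is hypothetical and is not part of CCTBX or Phaser, and the flags are copied from the steps above rather than checked against a current CCTBX release.<br />

```python
# Arguments copied from the config_modules examples in the steps above.
BASE_ARGS = ["phaser", "--enable_openmp_if_possible=True"]
# Extra flags listed above for a MinGW-W64 build.
MINGW_ARGS = ["--compiler=mingw", "--static_exe"]


def configure_command(windows=False, mingw=False):
    """Return the configure.py command as an argument list (hypothetical helper)."""
    sep = "\\" if windows else "/"
    script = sep.join(["..", "modules", "cctbx_project", "libtbx", "configure.py"])
    args = ["python", script] + BASE_ARGS
    if mingw:
        # MinGW-W64 builds add the compiler choice and static linking.
        args += MINGW_ARGS
    return args


if __name__ == "__main__":
    # Print the three variants so they can be pasted into config_modules.sh/.cmd.
    print(" ".join(configure_command()))
    print(" ".join(configure_command(windows=True)))
    print(" ".join(configure_command(windows=True, mingw=True)))
```

The output is only the text of the command; it still has to be placed in the config_modules script and run from the build/ folder as described in the steps.<br />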
<br />
Another way to build Phaser is to run a "bootstrap build". Download the file bootstrap.py (as detailed on https://github.com/cctbx/cctbx_project#installation ) to where you want to build Phaser. Then from a command prompt run the command: <pre>python bootstrap.py --builder=phaser --nproc=8</pre> This will build a stripped down version of CCTBX in addition to Phaser and its Python modules.<br />
<br />
The steps to build Phaser change from time to time because required components such as CCTBX are moving targets. The steps outlined here may therefore differ from the current ones at short or no notice.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Developers&diff=2438Developers2018-02-08T16:47:13Z<p>Rdo20: </p>
<hr />
<div>;Principal Investigator<br />
* [[Randy J. Read | Professor Randy J. Read]]<br />
<br />
<br />
;Group<br />
* [[ Tristan Croll | Dr Tristan Croll ]]<br />
* [[ Airlie J. McCoy | Dr Airlie McCoy ]]<br />
* [[ Robert Oeffner | Dr Robert Oeffner ]]<br />
<br />
<br />
;Alumni<br />
* Dr Gabor Bunkoczi<br />
* Dr Anne Baker<br />
* Dr Laurent Storoni<br />
* Dr Hamsapriye<br />
<br />
<br />
;Converting wiki page into rst for phenix docs<br />
# Copy and paste desired wiki source (not raw html) into text file, say Phenix-dev\modules\phenix_html\rst_files\reference\MRwiki.txt.<br />
# Copy any images referenced by MRwiki.txt to Phenix-dev\modules\phenix_html\rst_files\images\<br />
# Image references in MRwiki.txt need to be amended from Image:mypic.gif to Image:../images/mypic.gif<br />
# Install Pandoc on your PC<br />
# Change directory to Phenix-dev\modules\phenix_html\rst_files\reference and run Pandoc with the command line: <br />
#:<pre>pandoc --columns=150 --toc -f mediawiki MRwiki.txt -t rst -o MRwiki_rst.txt</pre><br />
# Run phenix_html.rebuild_docs<br />
# Inspect new file Phenix\doc\reference\MRwiki_rst.html<br />
# If happy commit files to phenix_html svn server</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Developers&diff=2437Developers2018-02-08T16:45:09Z<p>Rdo20: </p>
<hr />
<div>;Principal Investigator<br />
* [[Randy J. Read | Professor Randy J. Read]]<br />
<br />
<br />
;Group<br />
* [[ Tristan Croll | Dr Tristan Croll ]]<br />
* [[ Airlie J. McCoy | Dr Airlie McCoy ]]<br />
* [[ Robert Oeffner | Dr Robert Oeffner ]]<br />
<br />
<br />
;Alumni<br />
* Dr Gabor Bunkoczi<br />
* Dr Anne Baker<br />
* Dr Laurent Storoni<br />
* Dr Hamsapriye<br />
<br />
<br />
;Converting wiki page into rst for phenix docs<br />
# Copy and paste desired wiki source (not raw html) into text file, say Phenix-dev\modules\phenix_html\rst_files\reference\MRwiki.txt.<br />
# Copy any images referenced by MRwiki.txt to Phenix-dev\modules\phenix_html\rst_files\images\<br />
# Image references in MRwiki.txt need to be amended from Image:mypic.gif to Image:../images/mypic.gif<br />
# Install Pandoc on your PC<br />
# Change directory to Phenix-dev\modules\phenix_html\rst_files\reference and run Pandoc with the command line: <br />
#:<pre>pandoc --columns=150 --toc -f mediawiki MRwiki.txt -t rst -o MRwiki_rst.txt</pre><br />
# Run phenix_html.rebuild_docs<br />
# Inspect new file Phenix\doc\reference\MRwiki_rst.html<br />
# If happy commit files to phenix_html svn server</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2436Molecular Replacement2018-02-08T16:27:39Z<p>Rdo20: </p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' -> [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' -> [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and provide a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto.gif|Flow Diagram for Automated MR]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
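<br />
The two limits described above can be combined in a short sketch. The function names are hypothetical, and the resolution needed to reach the target expected LLG is taken as an input here because its calculation is not described on this page. Resolutions are in Angstroms, so a larger number means lower resolution.<br />

```python
def rms_resolution_limit(rms_error):
    """Resolution limit (Angstrom) implied by the estimated RMS error: 1.8 x RMS."""
    return 1.8 * rms_error


def initial_search_resolution(data_limit, ellg_target_limit, rms_error):
    """Pick the resolution for the initial search (hypothetical helper).

    data_limit        -- high-resolution limit of the measured data
    ellg_target_limit -- resolution at which the expected LLG reaches the
                         target (120 in the text above); supplied by the caller
    rms_error         -- estimated RMS coordinate error of the model
    """
    # The lowest-resolution (numerically largest) of the three limits wins.
    return max(data_limit, ellg_target_limit, rms_resolution_limit(rms_error))


if __name__ == "__main__":
    # Data to 1.5 A, target eLLG reached at 2.2 A, model RMS error of 1.5 A.
    print(round(initial_search_resolution(1.5, 2.2, 1.5), 2))
```

In this example the RMS-based limit dominates, which is the situation the second paragraph above describes for models with a large expected error.<br />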
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
| *''6 for 1st model in monoclinic space groups'' || <br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
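<br />
The Z-score definition above can be written out directly. This is a minimal sketch of the arithmetic only; the function name is hypothetical and the way Phaser draws its random set is not shown.<br />

```python
import math


def z_score(peak_llg, random_llgs):
    """Number of RMS deviations by which a peak LLG lies above the random mean."""
    n = len(random_llgs)
    if n == 0:
        raise ValueError("need at least one random LLG value")
    mean = sum(random_llgs) / n
    # RMS deviation from the mean of the random set.
    rms_dev = math.sqrt(sum((x - mean) ** 2 for x in random_llgs) / n)
    if rms_dev == 0.0:
        raise ValueError("random LLG values have zero spread")
    return (peak_llg - mean) / rms_dev


if __name__ == "__main__":
    # Random set with mean 10 and RMS deviation 2; a peak of 30 is 10 deviations above.
    print(z_score(30.0, [8.0, 12.0]))
```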
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
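<br />
The rough guide in the table can be expressed as a small lookup. How values exactly on a boundary are classified is an assumption made here, since the table does not say; the function name is hypothetical.<br />

```python
def interpret_tfz(tfz, first_model_monoclinic=False):
    """Map a TFZ score to the rough verdict given in the table above."""
    # The footnote lowers the "definitely" threshold to 6 for the first
    # model in monoclinic space groups.
    definite_above = 6.0 if first_model_monoclinic else 8.0
    if tfz > definite_above:
        return "definitely"
    if tfz < 5.0:
        return "no"
    if tfz < 6.0:
        return "unlikely"
    if tfz < 7.0:
        return "possibly"
    return "probably"


if __name__ == "__main__":
    for score in (4.2, 5.5, 6.5, 7.5, 24.3):
        print(score, interpret_tfz(score))
```

As the text notes, this is only a rough guide and there will always be exceptions.<br />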
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
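<br />
The default packing criterion quoted above (up to 5 percent of marker atoms involved in clashes) amounts to a simple ratio test. A minimal sketch follows, with a hypothetical function name; how Phaser decides that a marker atom is clashing is not covered here.<br />

```python
def passes_packing(n_clashing, n_marker_atoms, max_clash_percent=5.0):
    """Return True if the percentage of clashing marker atoms is within the limit."""
    if n_marker_atoms <= 0:
        raise ValueError("model must contain at least one marker atom")
    clash_percent = 100.0 * n_clashing / n_marker_atoms
    return clash_percent <= max_clash_percent


if __name__ == "__main__":
    # 200 C-alpha atoms: 10 clashing is exactly 5 percent, 11 is over the limit.
    print(passes_packing(10, 200), passes_packing(11, 200))
```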
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
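<br />
When many solutions have to be compared, the numeric annotations on a SOLU SET line can be pulled out with a regular expression. This is a minimal sketch that handles only the RFZ=, TFZ=, PAK=, LLG= and TFZ== items with numeric values; the other annotations in the table (for example TFZ=* or +TNCS) are skipped, and the function name is hypothetical.<br />

```python
import re

# Matches e.g. "RFZ=10.7", "PAK=0", "LLG=1006" and "TFZ==24.7".
ANNOTATION = re.compile(r"(RFZ|TFZ|PAK|LLG)(==?)(-?\d+(?:\.\d+)?)")


def parse_solution_set(line):
    """Return (label, value) pairs in the order they appear on a SOLU SET line."""
    items = []
    for key, op, value in ANNOTATION.findall(line):
        # Keep the double equals sign so that TFZ== stays distinct from TFZ=.
        label = (key + "==") if op == "==" else key
        items.append((label, float(value)))
    return items


if __name__ == "__main__":
    line = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
            "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
    for label, value in parse_solution_set(line):
        print(label, value)
```

Reading off the LLG items in order gives a quick check that the LLG increases as each component is added.<br />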
<br />
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
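<br />
For reference, a z-y-z Euler triple can be turned into a rotation matrix as sketched below. The composition Rz(alpha) Ry(beta) Rz(gamma) is the usual way of writing rotations about the z-axis, the new y-axis and the new z-axis; whether Phaser applies the matrix to orthogonal coordinates in exactly this form (axis frame, sign conventions) is an assumption that should be checked before relying on it. All function names are hypothetical.<br />

```python
import math


def rot_z(angle_deg):
    """Rotation matrix for a rotation about the z-axis."""
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(angle_deg):
    """Rotation matrix for a rotation about the y-axis."""
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyz_matrix(alpha, beta, gamma):
    """Rotation about z by alpha, then the new y by beta, then the new z by gamma."""
    return matmul(rot_z(alpha), matmul(rot_y(beta), rot_z(gamma)))


if __name__ == "__main__":
    # Euler angles of the "beta" ensemble in the example above.
    for row in euler_zyz_matrix(200.849, 41.269, 183.909):
        print(["%8.5f" % x for x in row])
```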
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || (m,a,b,c,...) input solution numbers that were amalgamated after refinement : (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file shows these features<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
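<br />
A SOLU HISTORY line can be split into its steps with a regular expression, which makes long RNP lists such as the one above easier to inspect. This is a minimal sketch with hypothetical function names; it assumes each step has the form NAME(...) with no nested brackets, as in the examples shown.<br />

```python
import re

# One step of the history: a step name followed by bracketed arguments.
STEP = re.compile(r"(RF/TF|PAK|RNP|FUSE)\(([^)]*)\)")


def parse_history(line):
    """Return (step_name, argument_text) pairs in the order they appear."""
    return STEP.findall(line)


def rnp_numbers(argument_text):
    """Split RNP arguments 'm,a,b,...:p' into (input numbers, output number)."""
    inputs, output = argument_text.split(":")
    return [int(n) for n in inputs.split(",")], int(output)


if __name__ == "__main__":
    line = "SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)"
    for name, args in parse_history(line):
        print(name, args)
```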
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of hundreds to thousands of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set hundreds to thousands of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TF-function Z-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data and the columns in the mtz file to be used using the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
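<br />
To illustrate how the two inputs named above (RMS coordinate error and fraction of scattering) shape the Sigma(A) curve, the sketch below uses the standard expression for Gaussian-distributed coordinate errors. That Phaser uses exactly this form is an assumption, the Babinet-style solvent term is left out, and the function name is hypothetical.<br />

```python
import math


def sigma_a(resolution, rms_error, fraction_scattering):
    """Approximate Sigma(A) at a given resolution (Angstrom).

    Assumes Gaussian coordinate errors with overall RMS error rms_error and
    omits the low-resolution solvent correction mentioned in the text.
    """
    s_squared = 1.0 / resolution ** 2
    # Fall-off of model accuracy with resolution due to coordinate error.
    error_term = math.exp(-(2.0 * math.pi ** 2 / 3.0) * rms_error ** 2 * s_squared)
    # Incompleteness scales the whole curve by sqrt(fraction of scattering).
    return math.sqrt(fraction_scattering) * error_term


if __name__ == "__main__":
    # Half of the scattering modelled with a 1.0 A RMS error.
    for d in (8.0, 4.0, 3.0, 2.0):
        print(d, round(sigma_a(d, 1.0, 0.5), 3))
```

The curve starts near the square root of the modelled fraction at low resolution and falls off faster for larger RMS errors, which is why a poor model gains little from high-resolution data.<br />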
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined by entering either the molecular weights or the sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble<br />
keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation in Angstrom: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|}<br />
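<br />
The tabulated values can be approximated in closed form, which is convenient for model sizes or identities that fall between the rows and columns. The sketch below is only an illustration: the functional form and the constants <code>A</code>, <code>B</code> and <code>C</code> are assumptions chosen because they reproduce the table to within about 0.01 Angstrom; they are not stated on this page, and the function name is hypothetical.<br />

```python
import math

# Assumed constants (not given on this page); chosen to reproduce the table above.
A = 0.0569
B = 173.0
C = 1.52


def estimated_rms(n_residues, identity):
    """Approximate initial RMS estimate in Angstrom.

    n_residues: number of residues in the model
    identity:   sequence identity as a fraction between 0.0 and 1.0
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be a fraction between 0 and 1")
    size_term = (B + n_residues) ** (1.0 / 3.0)
    identity_term = math.exp(C * (1.0 - identity))
    return A * size_term * identity_term


if __name__ == "__main__":
    for n in (50, 300, 2000):
        print(n, round(estimated_rms(n, 0.30), 3))
```

For a homology model, pass the sequence identity of the template rather than that of the homology model, as described above.<br />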
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, either change the record name to ATOM or use the keyword ENSE modlid HETATOM ON.<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
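<br />
The default coordinate-editing rules described above can be summarised as a small classifier. This is a sketch of the documented behaviour, not Phaser's own code: the function name and the returned labels are hypothetical, and the column positions assume standard fixed-width PDB records (record name in columns 1-6, residue name in columns 18-20).<br />

```python
# Residue names treated as water, as listed above.
WATER_NAMES = {"OW", "WAT", "HOH", "H2O", "OH2", "MOH", "WTR", "TIP"}

# HETATM residues that still contribute to the scattering, as listed above.
SCATTERING_HETATM = {"MSE", "MSO", "CSE", "CSO", "ALY", "MLY", "MLZ"}


def classify_record(line):
    """Return how a PDB line would be treated under the default rules.

    Labels (hypothetical): 'deleted', 'zero_occupancy', 'scattering', 'other'.
    """
    record = line[:6].strip()
    if record not in ("ATOM", "HETATM"):
        return "other"
    residue = line[17:20].strip()
    if residue in WATER_NAMES:
        # Waters are removed on input and not written to output.
        return "deleted"
    if record == "HETATM" and residue not in SCATTERING_HETATM:
        # Carried through to output, but with occupancy set to zero.
        return "zero_occupancy"
    return "scattering"
```

Renaming a water residue to, say, WWW moves it from 'deleted' to 'zero_occupancy'; changing its record name to ATOM as well moves it to 'scattering'.<br />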
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit, not the fraction of the asymmetric unit that you are searching for.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format. You can mix compositions entered by molecular weight with those entered by sequence. Individual atoms can be added to the composition with the COMPOSITION ATOM keyword. This allows the explicit addition of heavy atoms in the structure e.g. Fe atoms.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
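<br />
The four selection criteria can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not Phaser's code: the function name, the mode strings, and the idea of passing in the mean and standard deviation of the search as arguments are all hypothetical.<br />

```python
def select_peaks(values, mean, sigma, mode="percent", cutoff=75.0):
    """Select peak values from a search, returned highest first.

    mean, sigma: mean and standard deviation of the search function
                 (assumed to be supplied by the caller).
    mode:        'percent', 'zscore', 'number' or 'all'.
    cutoff:      percentage, Z-score or count, depending on mode.
    """
    ranked = sorted(values, reverse=True)
    if not ranked:
        return []
    top = ranked[0]
    if mode == "percent":
        span = top - mean
        if span <= 0:
            return [top]  # the top peak always survives
        # Top peak is 100%, the mean is 0%.
        return [v for v in ranked if 100.0 * (v - mean) / span >= cutoff]
    if mode == "zscore":
        # Absolute test: may legitimately return nothing.
        return [v for v in ranked if (v - mean) / sigma >= cutoff]
    if mode == "number":
        return ranked[: int(cutoff)]
    if mode == "all":
        return ranked
    raise ValueError("unknown selection mode: %r" % mode)
```

Note that only the percentage criterion guarantees a non-empty result, which is why it is a sensible default.<br />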
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
*: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
*: Points on the search grid with higher neighbouring points are removed from the selection<br />
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
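<br />
The effect of clustering can be illustrated on a one-dimensional, non-periodic grid. This is a deliberately simplified sketch: real rotation and translation searches are on three-dimensional grids, and the function name is hypothetical.<br />

```python
def cluster_peaks_1d(grid_values):
    """Return indices of grid points that have no higher neighbour.

    A point is removed from the selection if either adjacent grid
    point has a strictly higher value (simplified 1D illustration).
    """
    kept = []
    last = len(grid_values) - 1
    for i, value in enumerate(grid_values):
        left = grid_values[i - 1] if i > 0 else float("-inf")
        right = grid_values[i + 1] if i < last else float("-inf")
        if left <= value and right <= value:
            kept.append(i)
    return kept
```

With clustering off, a point on the shoulder of a larger peak would be retained, which is why turning clustering off can help when the correct orientation is not at a local maximum.<br />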
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there is an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
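<br />
The commensurate-modulation condition (n times the basic translation vector lying very close to integer multiples of the unit cell axes) can be tested with a few lines of code. This is a sketch under stated assumptions, not Phaser's detection algorithm: the function name, the search limit <code>max_n</code> and the tolerance <code>tol</code> are hypothetical, and the vector is assumed to be in fractional coordinates.<br />

```python
def modulation_order(vector, max_n=12, tol=0.05):
    """Return the smallest n >= 2 for which n * vector is close to a
    lattice translation, or None if no such n is found up to max_n.

    vector: tNCS translation in fractional coordinates (x, y, z).
    tol:    allowed deviation from an integer, in fractional units.
    """
    for n in range(2, max_n + 1):
        if all(abs(n * x - round(n * x)) <= tol for x in vector):
            return n
    return None
```

For example, a vector of (0, 0, 0.333) gives n = 3, a pseudo-tripling of the cell along c, whereas a vector in a general position returns None.<br />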
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x>.<br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2435Molecular Replacement2018-02-08T16:25:10Z<p>Rdo20: </p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' -> [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' -> [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and supply a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input, you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto.gif|Flow Diagram for Automated MR]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
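<br />
The RMS-based limit described above amounts to a one-line rule. The sketch below covers only that rule (it does not attempt the expected-LLG calculation), and the function name is hypothetical.<br />

```python
def rms_limited_resolution(data_resolution, model_rms):
    """Return the high-resolution limit (Angstrom) after applying the
    rule that data beyond 1.8 times the model RMS error are not used.

    data_resolution: high-resolution limit of the data, in Angstrom
    model_rms:       estimated RMS error of the model, in Angstrom
    """
    # A larger d-spacing means lower resolution, so take the maximum.
    return max(data_resolution, 1.8 * model_rms)
```

For a model with an estimated RMS error of 1.5 Angstrom, data beyond about 2.7 Angstrom would not be used, even if the crystal diffracts to 1.5 Angstrom.<br />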
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center; margin-left: 30px"<br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
|colspan="2" style="text-align: center;" | *''6 for 1st model in monoclinic space groups''<br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
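<br />
The Z-score definition and the rough TFZ guide in the table can be sketched together. This is an illustration only: the function names are hypothetical, the handling of values that fall exactly on a boundary of the table is an assumption, and the monoclinic exception noted in the table is not modelled.<br />

```python
import statistics


def z_score(llg, random_llgs):
    """Number of standard deviations by which llg exceeds the mean LLG
    of a set of random rotations or translations."""
    mean = statistics.mean(random_llgs)
    rms_deviation = statistics.pstdev(random_llgs)  # RMS deviation from the mean
    return (llg - mean) / rms_deviation


def tfz_verdict(tfz):
    """Very rough interpretation of a TFZ score, following the table."""
    if tfz < 5:
        return "no"
    if tfz < 6:
        return "unlikely"
    if tfz < 7:
        return "possibly"
    if tfz <= 8:
        return "probably"
    return "definitely"
```

As the text stresses, this is a guide rather than a test: always check the logfile and the separation of the top solution from the rest.<br />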
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center; margin-left: 30px"<br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
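When scanning many .sol files it can help to pull the numeric annotations out of the SOLU SET line. The sketch below is a hypothetical helper, not part of Phaser; it handles only the numeric RFZ=, TFZ=, PAK=, LLG= and TFZ== tokens and ignores the other annotations in the table (for example TFZ=* or the amalgamation sets).<br />

```python
import re

# Matches e.g. "RFZ=10.7", "PAK=0", "LLG=472" and "TFZ==24.7".
_TOKEN = re.compile(r"(RFZ|TFZ|PAK|LLG)(==?)(-?\d+(?:\.\d+)?)")


def parse_annotation(line):
    """Return a list of (label, value) pairs in the order they appear.

    The label keeps its '=' or '==' suffix so that the search TFZ
    ('TFZ=') and the TFZ-equivalent ('TFZ==') stay distinguishable.
    """
    return [(name + eq, float(value)) for name, eq, value in _TOKEN.findall(line)]


def llg_never_decreases(tokens):
    """Check that successive LLG values do not go down."""
    llgs = [value for label, value in tokens if label == "LLG="]
    return all(a <= b for a, b in zip(llgs, llgs[1:]))
```

Applied to the beta-blip line above, this yields twelve tokens with LLG values 472, 1006 and 1006.<br />
<br />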
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center; margin-left: 30px"<br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || All input peaks amalgamated after refinement to give output solution number (m and others): (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file shows these features<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
<br />
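A SOLU HISTORY line can be split into its steps with a short regular expression. This is a hypothetical helper, not part of Phaser; it returns the raw text inside each pair of parentheses and leaves the interpretation of the numbers (see the table above) to the reader.<br />

```python
import re

# A step is an upper-case label (possibly containing '/') followed by
# a parenthesised argument string, e.g. "RF/TF(7/1:10)" or "PAK(10:10)".
_STEP = re.compile(r"([A-Z/]+)\(([^)]*)\)")


def parse_history(line):
    """Return a list of (step, arguments) pairs in the order they occur."""
    text = line.split("HISTORY", 1)[-1]
    return _STEP.findall(text)
```

For the simple beta-blip line above this gives four steps: one RF/TF, one PAK and two RNP.<br />
<br />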
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of 100-1000's of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set 100-1000's of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TF-function Z-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data and the columns in the mtz file to be used using the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file when there is more than one pdb file in an ensemble and/or the pdb file(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation in Angstrom: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|}<br />
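<br />
For scripting, it can be handy to reproduce these default values without looking them up. The sketch below is an assumption rather than Phaser's own code: the functional form and the constants (0.0569, 1.52 and 173) were chosen because they reproduce the entries of the table above to within roughly 0.01 Angstrom for the entries checked, and the function name <code>estimate_rms</code> is hypothetical.<br />

```python
import math


def estimate_rms(identity_percent, n_residues, a=0.0569, b=1.52, c=173.0):
    """Approximate the tabulated RMS estimate (Angstrom).

    The constants a, b and c are assumptions fitted to the table,
    not values taken from the Phaser source code.
    """
    non_identical = 1.0 - identity_percent / 100.0
    return a * math.exp(b * non_identical) * (c + n_residues) ** (1.0 / 3.0)


# Compare a few values against the table above.
for identity, n_res, tabulated in [(100, 50, 0.345), (30, 300, 1.286), (0, 2000, 3.375)]:
    print(identity, n_res, round(estimate_rms(identity, n_res), 3), tabulated)
```

For anything that matters, use the value that Phaser itself reports in the log file rather than this approximation.<br />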
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, use the keyword ENSE modlid HETATOM ON<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit, not the fraction of the asymmetric unit that you are searching for.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format. You can mix compositions entered by molecular weight with those entered by sequence. Individual atoms can be added to the composition with the COMPOSITION ATOM keyword. This allows the explicit addition of heavy atoms in the structure e.g. Fe atoms.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current set of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
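<br />
The four selection criteria listed above can be sketched in a few lines of Python. This is an illustration only, not Phaser's implementation: the function name <code>select_peaks</code>, the mode names and the example numbers are hypothetical, the mean and sigma are assumed to have been computed over the whole search, and a peak exactly at the cutoff is assumed to be kept.<br />

```python
def select_peaks(peaks, mean, sigma, mode="percent", cutoff=75.0):
    """Return the selected peak values, highest first.

    mode "percent": top peak is 100%, the mean is 0%; keep peaks >= cutoff %.
    mode "sigma":   keep peaks at least `cutoff` standard deviations above the mean.
    mode "number":  keep the top `cutoff` peaks.
    mode "all":     keep everything.
    """
    ranked = sorted(peaks, reverse=True)
    if mode == "all":
        return ranked
    if mode == "number":
        return ranked[:int(cutoff)]
    if mode == "percent":
        span = ranked[0] - mean
        if span <= 0:
            return ranked[:1]  # degenerate case: keep only the top peak
        return [v for v in ranked if 100.0 * (v - mean) / span >= cutoff]
    if mode == "sigma":
        return [v for v in ranked if (v - mean) / sigma >= cutoff]
    raise ValueError("unknown mode: " + mode)


# Invented peak heights, mean and sigma, purely for illustration.
peaks = [30.0, 50.0, 20.0, 44.0]
print(select_peaks(peaks, mean=10.0, sigma=8.0, mode="percent", cutoff=75.0))
print(select_peaks(peaks, mean=10.0, sigma=8.0, mode="sigma", cutoff=6.0))
print(select_peaks(peaks, mean=10.0, sigma=8.0, mode="number", cutoff=3))
```

In this invented example the percentage criterion keeps two peaks, whereas a 6 sigma cutoff keeps none, which illustrates why the percentage criterion always produces output and the Z-score criterion may not.<br />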
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
*: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
*: Points on the search grid with higher neighbouring points are removed from the selection<br />
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
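<br />
The effect of clustering can be illustrated on a small grid. The sketch below assumes a two-dimensional, non-periodic grid with eight neighbours per point, which is a simplification made for illustration (the real search grids are not like this); the function name <code>cluster_peaks</code> and the grid values are hypothetical.<br />

```python
def cluster_peaks(grid):
    """Keep only grid points that have no strictly higher neighbour.

    Returns a list of (row, column, value) tuples.
    """
    rows, cols = len(grid), len(grid[0])
    kept = []
    for r in range(rows):
        for c in range(cols):
            value = grid[r][c]
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value >= n for n in neighbours):
                kept.append((r, c, value))
    return kept


# Invented search values: a main peak (9) with a shoulder (8), and a second peak (6).
grid = [
    [1, 2, 1, 0, 0],
    [2, 9, 8, 1, 0],
    [1, 3, 2, 1, 1],
    [0, 1, 1, 2, 6],
]
print(cluster_peaks(grid))
```

Note that the point with value 8, sitting on the shoulder of the peak with value 9, is discarded even though it is higher than the second peak. This is the situation in which turning clustering off can rescue a correct orientation.<br />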
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
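<br />
When inspecting Patterson peaks by hand, the definition of commensurate modulation given above can be checked numerically: n times the basic translation vector, in fractional coordinates, should be very close to a whole number of unit cell axes. The following sketch is not Phaser's detection algorithm; the function names, the tolerance of 0.02 and the maximum order of 12 are all assumptions made for illustration.<br />

```python
def is_commensurate(vector, n, tol=0.02):
    """True if n * vector is within tol of an integer in every fractional coordinate."""
    return all(abs(n * x - round(n * x)) <= tol for x in vector)


def modulation_order(vector, max_n=12, tol=0.02):
    """Smallest n (2..max_n) for which the vector is commensurate, else None."""
    for n in range(2, max_n + 1):
        if is_commensurate(vector, n, tol):
            return n
    return None


# Invented fractional Patterson vectors, purely for illustration.
print(modulation_order((0.5, 0.5, 0.0)))      # pseudo-halving of the cell
print(modulation_order((0.0, 0.0, 0.334)))    # approximately one third along c
print(modulation_order((0.13, 0.27, 0.41)))   # general position, no small n found
```

A vector for which no small n is found corresponds to tNCS with the vector in a general position rather than to a pseudo-cell.<br />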
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x><br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2434Molecular Replacement2018-02-08T16:22:31Z<p>Rdo20: </p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' -> [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' -> [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and supply a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto.gif|Flow Diagram for Automated MR]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
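<br />
As a worked example of the rule in the paragraph above, a model with an estimated RMS error of 1.5 Angstroms would be searched with data limited to 1.8 x 1.5 = 2.7 Angstroms, even if the crystal diffracts further. A minimal sketch follows; the function name is hypothetical, and the additional limit derived from the expected LLG of 120 is not modelled here.<br />

```python
def rms_resolution_limit(data_resolution, rms_error, factor=1.8):
    """High-resolution limit (Angstrom) after applying the 1.8 * RMS rule.

    A larger number means lower resolution, so the limit is the larger of
    the resolution of the data and factor * rms_error.
    """
    return max(data_resolution, factor * rms_error)


print(round(rms_resolution_limit(1.5, 1.5), 2))  # data to 1.5 A, model RMS 1.5 A
print(round(rms_resolution_limit(3.0, 1.0), 2))  # data to 3.0 A, model RMS 1.0 A
```

In the second case the data themselves are the limiting factor, so the rule has no effect.<br />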
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
|colspan="2" style="text-align: center;" | *''6 for 1st model in monoclinic space groups''<br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However, if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
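<br />
The Z-score definition above translates directly into code. The sketch below assumes the LLG values of the random sample are available as a plain list; the function name is illustrative and is not part of Phaser.<br />
<br />
```python
import math


def z_score(peak_llg, random_llgs):
    """Number of RMS deviations by which peak_llg lies above the random mean."""
    mean = sum(random_llgs) / len(random_llgs)
    # RMS deviation from the mean of the random set
    variance = sum((v - mean) ** 2 for v in random_llgs) / len(random_llgs)
    rms_dev = math.sqrt(variance)
    if rms_dev == 0.0:
        raise ValueError("random sample has no spread; Z-score is undefined")
    return (peak_llg - mean) / rms_dev
```
<br />
A negative result simply means that the peak lies below the mean of the random sample.<br />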
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
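<br />
The rough guide in the table can be written as a small lookup. The table does not say which band the boundary values (5, 6, 7, 8) belong to, so the half-open intervals used here are an assumption, as is the function name.<br />
<br />
```python
def interpret_tfz(tfz, first_model_monoclinic=False):
    """Map a TFZ score onto the table's rough verdicts.

    Assumption: each band includes its lower bound, and "more than 8"
    (or more than 6 for the first model in a monoclinic space group)
    is a strict inequality.
    """
    definite_above = 6.0 if first_model_monoclinic else 8.0
    if tfz > definite_above:
        return "definitely"
    if tfz < 5.0:
        return "no"
    if tfz < 6.0:
        return "unlikely"
    if tfz < 7.0:
        return "possibly"
    return "probably"
```
<br />
As the text stresses, this is a very rough guide and there will always be exceptions.<br />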
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see that the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features:<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
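<br />
If you want to inspect annotations from a script, the numeric items on a SOLU SET line can be pulled out with a regular expression. This is a minimal sketch: it handles only the RFZ=, TFZ=, PAK=, LLG= and TFZ== items from the table above and silently skips everything else (for example TFZ=* or the amalgamation lists).<br />
<br />
```python
import re

# A label, "=" or "==", then a number (optionally negative or decimal).
_TOKEN = re.compile(r"\b(RFZ|TFZ|PAK|LLG)(==?)(-?\d+(?:\.\d+)?)")


def parse_solu_set(line):
    """Return [(label, value), ...] in the order the items appear.

    Labels keep their "=" or "==" so that TFZ= and TFZ== stay distinct.
    """
    return [(key + op, float(value)) for key, op, value in _TOKEN.findall(line)]


def llg_values(line):
    """Convenience helper: the refined LLG values, in order."""
    return [value for label, value in parse_solu_set(line) if label == "LLG="]
```
<br />
Applied to the beta-blip line above, <code>llg_values</code> returns the LLG after each refinement, which makes it easy to check that the LLG increases as components are added.<br />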
<br />
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
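<br />
A z-y-z rotation about successively new axes can be composed as R = Rz(alpha) Ry(beta) Rz(gamma). The sketch below builds that matrix in plain Python. How the matrix is applied (to the model or to the axes, and in which frame) is an assumption here and should be checked against Phaser's own output before relying on it.<br />
<br />
```python
import math


def rot_z(angle_deg):
    """Right-handed rotation about the z-axis."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(angle_deg):
    """Right-handed rotation about the y-axis."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyz_matrix(alpha, beta, gamma):
    """Rotation about z, then the new y, then the new z (angles in degrees)."""
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))
```
<br />
For instance, <code>euler_zyz_matrix(200.849, 41.269, 183.909)</code> gives the matrix corresponding to the EULER angles of the beta component above, under the stated assumption.<br />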
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || (m,a,b,c,...) input solution numbers that were amalgamated after refinement : (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file is<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
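<br />
A history line is a sequence of NAME(inputs:output) steps, so it can be split into steps with a short parser. This sketch assumes only the four step names in the table above and the layout shown in the examples; the function name is illustrative.<br />
<br />
```python
import re

# A step name made of capital letters and "/", followed by "(...)".
_STEP = re.compile(r"([A-Z/]+)\(([^)]*)\)")


def parse_history(line):
    """Return [(name, inputs, output), ...] for each step on the line.

    inputs is a list of strings (split on commas); output is the text
    after the colon, or None when the step has no colon (as in FUSE).
    """
    steps = []
    for name, body in _STEP.findall(line):
        if ":" in body:
            inputs, output = body.rsplit(":", 1)
        else:
            inputs, output = body, None
        steps.append((name, [item.strip() for item in inputs.split(",")], output))
    return steps
```
<br />
For the more complicated example above, the first RNP step lists 24 input solutions that were amalgamated into output solution 1.<br />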
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of 100-1000's of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set 100-1000's of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data, and which columns in that file to use, with the HKLIn and LABIn keywords respectively. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
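<br />
To give a feel for how the two inputs shape the curve, the sketch below uses the textbook form for independent Gaussian coordinate errors, Sigma(A) = sqrt(fp) exp(-(2 pi^2 / 3) rms^2 / d^2), where fp is the fraction of scattering and d the resolution. This form is an assumption for illustration: it leaves out the Babinet-style solvent term, and Phaser's actual expression may differ in detail.<br />
<br />
```python
import math


def sigma_a(fraction_scattering, rms_error, d_spacing):
    """Approximate Sigma(A) at resolution d_spacing (Angstroms).

    fraction_scattering : fraction of the asymmetric unit's scattering
                          power contributed by the model (0 to 1)
    rms_error           : expected RMS coordinate error (Angstroms)

    Assumed form: Gaussian coordinate errors, no solvent correction.
    """
    inv_d_squared = 1.0 / d_spacing ** 2
    falloff = math.exp(-(2.0 * math.pi ** 2 / 3.0) * rms_error ** 2 * inv_d_squared)
    return math.sqrt(fraction_scattering) * falloff
```
<br />
At very low resolution the value tends to the square root of the fraction scattering, and it falls off more steeply with resolution as the RMS error increases.<br />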
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5 Angstroms, even if the sequence identity is low. The table below can be used as a guide to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation in Angstrom: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|-<br />
|}<br />
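<br />
The formula behind the table is not given on this page. The sketch below assumes the form rms = A (B + Nres)^(1/3) exp(C H), where H is the fraction of non-identical residues, with A = 0.0569, B = 173 and C = 1.52. These constants are an assumption; they reproduce the table entries that were spot-checked to within about 0.01 Angstroms, so treat the table (and Phaser's own output) as authoritative.<br />
<br />
```python
import math

# Assumed constants; not taken from this page.
A, B, C = 0.0569, 173.0, 1.52


def estimated_rms(identity_percent, n_residues):
    """Approximate initial RMS estimate (Angstroms) for a model.

    identity_percent : sequence identity to the target, 0 to 100
    n_residues       : number of residues in the model
    """
    non_identical = 1.0 - identity_percent / 100.0
    return A * (B + n_residues) ** (1.0 / 3.0) * math.exp(C * non_identical)
```
<br />
This can be handy for values that fall between the rows and columns of the table, for example a 35% identical model of 250 residues.<br />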
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, change the record name to ATOM or use the keyword ENSE modlid HETATOM ON<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
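<br />
The rules above for HETATM and water records can be summarised as a small classifier, which is useful for checking how a model file will be treated before a run. This is a sketch of the documented default behaviour only (it ignores the HETATOM ON option), it assumes standard fixed-column PDB records with the residue name in columns 18-20, and the function name and return labels are illustrative.<br />
<br />
```python
WATER_NAMES = {"OW", "WAT", "HOH", "H2O", "OH2", "MOH", "WTR", "TIP"}
SCATTERING_HETATM = {"MSE", "MSO", "CSE", "CSO", "ALY", "MLY", "MLZ"}


def classify_record(line):
    """Describe how a PDB line is treated under the default rules.

    Returns one of:
      "deleted"        water, removed on input
      "scattering"     used in the scattering, original occupancy kept
      "zero_occupancy" carried through to output with occupancy zero
      "other"          not an ATOM or HETATM record
    """
    record = line[:6].strip()
    if record not in ("ATOM", "HETATM"):
        return "other"
    residue = line[17:20].strip()  # residue name, columns 18-20
    if residue in WATER_NAMES:
        return "deleted"
    if record == "ATOM" or residue in SCATTERING_HETATM:
        return "scattering"
    return "zero_occupancy"
```
<br />
Note that a water renamed to, say, WWW but left as a HETATM record is kept in the output yet still does not contribute to the scattering, in line with the description above.<br />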
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit, not the fraction of the asymmetric unit that you are searching for.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format. You can mix compositions entered by molecular weight with those entered by sequence. Individual atoms can be added to the composition with the COMPOSITION ATOM keyword. This allows the explicit addition of heavy atoms in the structure e.g. Fe atoms.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
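<br />
The first three selection criteria can be sketched as follows. For simplicity the mean and RMS deviation are computed from the list of values passed in, which is an assumption; in a real search they would come from the whole search function. The function names are illustrative.<br />
<br />
```python
import math


def select_by_percentage(values, cutoff=75.0):
    """Keep values at or above cutoff percent, where top = 100% and mean = 0%."""
    top = max(values)
    mean = sum(values) / len(values)
    threshold = mean + (cutoff / 100.0) * (top - mean)
    return [v for v in values if v >= threshold]  # the top value always survives


def select_by_zscore(values, cutoff):
    """Keep values whose Z-score is at least cutoff; may return nothing."""
    mean = sum(values) / len(values)
    rms_dev = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if rms_dev == 0.0:
        return []
    return [v for v in values if (v - mean) / rms_dev >= cutoff]


def select_by_number(values, count):
    """Keep the count highest values, best first."""
    return sorted(values, reverse=True)[:count]
```
<br />
With a clear top peak the percentage criterion returns only that peak, whereas with a flat distribution it passes several candidates on to the next step.<br />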
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
: Points on the search grid with higher neighbouring points are removed from the selection<br />
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
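<br />
Clustering amounts to keeping only local maxima on the search grid. The sketch below shows the idea on a one-dimensional, non-periodic grid, which is a simplification chosen for illustration; the real rotation and translation grids are multi-dimensional.<br />
<br />
```python
def cluster_peaks(grid_values):
    """Return (index, value) for grid points with no higher neighbour.

    Simplified 1-D illustration: each point has at most two neighbours,
    and the ends of the grid are not treated as periodic.
    """
    kept = []
    last = len(grid_values) - 1
    for i, value in enumerate(grid_values):
        left = grid_values[i - 1] if i > 0 else float("-inf")
        right = grid_values[i + 1] if i < last else float("-inf")
        if value >= left and value >= right:
            kept.append((i, value))
    return kept
```
<br />
With clustering on, a point on the shoulder of a larger peak is removed, which is why turning clustering off can help when the correct orientation sits on such a shoulder.<br />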
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
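<br />
The peak selection described above can be sketched in a few lines of Python. This is only an illustration of the percentage-of-origin criterion, not Phaser's own code; the function name, the (vector, height) tuple layout and the example heights are hypothetical.<br />
<br />
```python
def significant_peaks(peaks, origin_height, percent=20.0):
    """Return the non-origin Patterson peaks above a percentage of the origin peak.

    peaks: list of (fractional_vector, height) tuples (hypothetical layout).
    percent: cut-off as a percentage of the origin peak height (20% by default).
    """
    cutoff = origin_height * percent / 100.0
    return [(vec, height) for vec, height in peaks if height > cutoff]


# Made-up peak list: only the first peak exceeds 20% of an origin height of 100.
example = [((0.5, 0.0, 0.5), 45.0), ((0.1, 0.2, 0.3), 12.0)]
selected = significant_peaks(example, origin_height=100.0)
```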
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
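<br />
The definition of commensurate modulation can be illustrated with a small Python check: find the smallest n for which n times the basic vector lands (nearly) on a lattice point. This is a sketch of the definition only and is not how Phaser analyses the Patterson; the function name, the tolerance and the maximum order are assumptions.<br />
<br />
```python
def commensurate_order(vector, max_n=12, tol=0.02):
    """Smallest n >= 2 such that n * vector is close to a lattice translation.

    vector: basic tNCS translation in fractional coordinates.
    tol, max_n: illustrative choices, not Phaser defaults.
    Returns None if no such n is found, or if the vector itself is
    (nearly) a lattice translation and so carries no tNCS information.
    """
    def near_integer(x):
        return abs(x - round(x)) <= tol

    if all(near_integer(c) for c in vector):
        return None
    for n in range(2, max_n + 1):
        if all(near_integer(n * c) for c in vector):
            return n
    return None
```
<br />
For instance, a vector of (0, 0, 0.3333) gives n = 3 (a pseudo-tripling of the cell along c), whereas a vector in a general position returns None.<br />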
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. The cut-off can be lowered with the keyword TNCS PATT PERCENT <x>.<br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2433Molecular Replacement2018-02-08T16:09:27Z<p>Rdo20: /* Building an Ensemble from Coordinates */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' -> [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' -> [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and provide a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto.gif|Flow Diagram for Automated MR]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
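<br />
The way these limits combine can be sketched as follows. This is a simplified illustration, not Phaser's implementation: the function and parameter names are hypothetical, and the resolution at which the expected LLG reaches 120 is taken as an input rather than computed.<br />
<br />
```python
def search_resolution(data_dmin, model_rms, ellg_target_dmin=None):
    """Pick the high-resolution limit (in Angstrom) for the initial search.

    data_dmin: resolution limit of the measured data.
    model_rms: estimated RMS error of the model.
    ellg_target_dmin: resolution at which the expected LLG reaches the
        target of 120, if known (assumed to be supplied by the caller).

    A larger number means lower resolution, so the most restrictive
    limit is the maximum of the candidates.
    """
    limits = [data_dmin, 1.8 * model_rms]
    if ellg_target_dmin is not None:
        limits.append(ellg_target_dmin)
    return max(limits)
```
<br />
With data to 1.5 Angstrom and a model RMS error of 1.2 Angstrom, the RMS-based limit of 1.8 x 1.2 = 2.16 Angstrom applies, because the higher-resolution data add almost nothing to the LLG.<br />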
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
|colspan="2" style="text-align: center;" | *''6 for 1st model in monoclinic space groups''<br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
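<br />
The Z-score definition above translates directly into code. The sketch below uses only the Python standard library; the function name and the example numbers are hypothetical, and the RMS deviation is taken over the whole random set (population standard deviation).<br />
<br />
```python
from statistics import fmean, pstdev


def z_score(peak_llg, random_llgs):
    """Number of RMS deviations by which a search peak exceeds the random mean."""
    mean = fmean(random_llgs)
    rms_dev = pstdev(random_llgs, mu=mean)  # RMS deviation from the mean
    if rms_dev == 0:
        raise ValueError("random LLG values show no spread")
    return (peak_llg - mean) / rms_dev


# Made-up LLG values for a set of random translations, and one search peak.
random_set = [10.0, 12.0, 14.0, 16.0, 18.0]
tfz = z_score(40.0, random_set)
```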
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
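<br />
For scripting, the rough guide in the table can be encoded as a lookup. The sketch below is only a convenience: the function name is hypothetical, the treatment of values falling exactly on a boundary is an assumption, and the footnote is read as lowering the "definitely" threshold from 8 to 6 for the first model in a monoclinic space group.<br />
<br />
```python
def interpret_tfz(tfz, first_model_monoclinic=False):
    """Map a TFZ score onto the rough verdicts of the table above."""
    definite_above = 6.0 if first_model_monoclinic else 8.0
    if tfz > definite_above:
        return "definitely"
    if tfz < 5:
        return "no"
    if tfz < 6:
        return "unlikely"
    if tfz < 7:
        return "possibly"
    return "probably"
```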
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
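<br />
When many .sol files have to be compared, the numeric part of the SOLU SET annotation can be pulled out with a regular expression. The following is a minimal sketch, not part of Phaser: the function name is hypothetical, and it handles only the plain RFZ=, TFZ=, PAK=, LLG= and TFZ== tokens, skipping special forms such as TFZ=*, RF++ or LLG+=(...).<br />
<br />
```python
import re

# Label, "=" or "==", then a number (integer or decimal).
_TOKEN = re.compile(r"(RFZ|TFZ|PAK|LLG)(==?)(-?\d+(?:\.\d+)?)")


def parse_annotation(line):
    """Return [(label, value), ...] in the order the tokens appear."""
    return [(name + eq, float(value)) for name, eq, value in _TOKEN.findall(line)]


line = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
        "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
tokens = parse_annotation(line)
llg_values = [value for label, value in tokens if label == "LLG="]
```
<br />
Applied to the beta-blip line, the LLG values extracted in order show the increase expected as each component is added.<br />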
<br />
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
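<br />
A z-y-z rotation of this kind can be written as a product of three elementary rotation matrices. The sketch below builds R = Rz(alpha) Ry(beta) Rz(gamma) in plain Python; the function names are hypothetical, and whether this matrix (rather than its transpose) is the one Phaser applies to model coordinates is an assumption to check against Phaser's output.<br />
<br />
```python
import math


def rot_z(deg):
    """Rotation by deg degrees about the z-axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(deg):
    """Rotation by deg degrees about the y-axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyz(alpha, beta, gamma):
    """Rotation about z, then the new y, then the new z (angles in degrees)."""
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))


# Euler angles of the "beta" ensemble in the example above.
r_beta = euler_zyz(200.849, 41.269, 183.909)
```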
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || All input peaks amalgamated after refinement to give output solution number (m and others): (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file is<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
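<br />
A history string can be split into its steps with a short regular expression. This is a sketch for log analysis only, not Phaser code: the function name is hypothetical, and the contents of each bracket are returned as raw text rather than interpreted.<br />
<br />
```python
import re

# A step name followed by a bracketed argument list without nested brackets.
_STEP = re.compile(r"(RF/TF|PAK|RNP|FUSE)\(([^)]*)\)")


def parse_history(line):
    """Return [(step, arguments), ...] in the order the steps were applied."""
    return _STEP.findall(line)


steps = parse_history("SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)")
```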
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of 100-1000's of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set 100-1000's of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data and the columns in the mtz file to be used using the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
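<br />
As a rough illustration of why the whole asymmetric unit must be described, the fraction of scattering contributed by one model can be approximated by its share of the total molecular weight. This is a crude sketch that assumes average atomic composition throughout; it is not how Phaser computes scattering, and the function name and example weights are hypothetical.<br />
<br />
```python
def scattering_fraction(model_mw, asu_components):
    """Approximate fraction of the scattering explained by one copy of a model.

    model_mw: molecular weight of the part covered by the search model.
    asu_components: list of (molecular_weight, number_of_copies) for
        everything in the asymmetric unit, not only what is searched for.
    """
    total_mw = sum(mw * copies for mw, copies in asu_components)
    return model_mw / total_mw


# Two copies of a 30 kDa protein plus one 20 kDa partner in the asymmetric unit.
fraction = scattering_fraction(30000.0, [(30000.0, 2), (20000.0, 1)])
```
<br />
Leaving the 20 kDa partner out of the composition would raise the apparent fraction for one 30 kDa copy from 0.375 to 0.5, overstating the completeness of the model.<br />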
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble<br />
keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Ångstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation in Angstrom: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|-<br />
|}<br />
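<br />
For model sizes or identities between the tabulated entries, the values can be interpolated with a simple closed form. The expression below, RMS = A (B + N)<sup>1/3</sup> exp(C H) with H the fractional sequence difference, is offered as a sketch: the constants A = 0.0569, B = 173 and C = 1.52 are assumptions that are not given on this page, chosen because they reproduce the table entries checked to within about 0.01 Angstrom. The function name is hypothetical.<br />
<br />
```python
import math


def estimated_rms(identity_percent, n_residues, a=0.0569, b=173.0, c=1.52):
    """Estimate the model RMS error (Angstrom) from identity and model size.

    identity_percent: sequence identity between model and target, 0-100.
    n_residues: number of residues in the model.
    a, b, c: assumed constants, see the accompanying text.
    """
    h = 1.0 - identity_percent / 100.0  # fractional sequence difference
    return a * (b + n_residues) ** (1.0 / 3.0) * math.exp(c * h)
```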
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, either change the record name to ATOM or use the keyword ENSE modlid HETATOM ON.<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
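<br />
The HETATM and water rules can be summarised as a small classifier for PDB coordinate lines, which may be useful for checking a model before submitting it. This is a sketch of the rules as stated in this section, not Phaser's parser: the function name and return strings are hypothetical, and the residue name is read from the standard fixed columns 18-20.<br />
<br />
```python
# Modified residues whose HETATM records are used in the scattering.
SCATTERING_HETATM = {"MSE", "MSO", "CSE", "CSO", "ALY", "MLY", "MLZ"}
# Residue names treated as water and deleted on input.
WATER_NAMES = {"OW", "WAT", "HOH", "H2O", "OH2", "MOH", "WTR", "TIP"}


def classify_record(line, hetatom_on=False):
    """Describe how a PDB line is treated according to the rules above.

    hetatom_on mirrors the effect of the HETATOM ON option of ENSEmble.
    """
    record = line[:6].strip()
    if record not in ("ATOM", "HETATM"):
        return "other"
    resname = line[17:20].strip()
    if resname in WATER_NAMES:
        return "deleted"
    if record == "ATOM" or hetatom_on or resname in SCATTERING_HETATM:
        return "scattering"
    return "carried through, occupancy zero"
```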
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit, not the fraction of the asymmetric unit that you are searching for.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format. You can mix compositions entered by molecular weight with those entered by sequence. Individual atoms can be added to the composition with the COMPOSITION ATOM keyword. This allows the explicit addition of heavy atoms in the structure e.g. Fe atoms.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
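<br />
As a rough sketch, the lines for a fixed partial structure could be generated as below. The layout copies the SOLU SET and SOLU 6DIM lines shown in the Annotation section of this page; the helper name is hypothetical, and the exact keyword syntax (including whether BFAC is required) is an assumption that should be checked against the keyword documentation.<br />
<br />
```python
def fixed_solution_lines(ensemble: str) -> str:
    """Return .sol-style text placing `ensemble` at rotation 0 0 0, position 0 0 0.

    The field layout mirrors the example SOLU 6DIM lines on this page;
    it is an assumption, not a definitive statement of the file format.
    """
    euler = " ".join(f"{0.0:.3f}" for _ in range(3))   # identity rotation
    frac = " ".join(f"{0.0:.5f}" for _ in range(3))    # zero translation
    return "\n".join([
        "SOLU SET",
        f"SOLU 6DIM ENSE {ensemble} EULER {euler} FRAC {frac} BFAC {0.0:.5f}",
    ])
```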
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
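<br />
The selection criteria above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the mean and standard deviation are taken over the list of scores passed in, which is an assumption about how the statistics are gathered.<br />
<br />
```python
from statistics import mean, pstdev


def select_by_percentage(scores, cutoff=75.0):
    """Keep peaks at or above `cutoff` percent, where top = 100% and mean = 0%."""
    top, avg = max(scores), mean(scores)
    threshold = avg + (cutoff / 100.0) * (top - avg)
    return [s for s in scores if s >= threshold]   # the top peak always survives


def select_by_zscore(scores, cutoff):
    """Keep peaks at least `cutoff` standard deviations above the mean."""
    avg, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return []                                  # flat distribution: nothing stands out
    return [s for s in scores if (s - avg) / sigma >= cutoff]


def select_by_number(scores, n):
    """Keep the n highest peaks."""
    return sorted(scores, reverse=True)[:n]
```
<br />
With a clear top peak the percentage criterion returns a single value, while a flat distribution returns many; a high Z-score cutoff can return nothing at all.<br />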
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
: Points on the search grid with higher neighbouring points are removed from the selection<br />
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
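<br />
The effect of clustering can be illustrated on a one-dimensional grid. Real rotation and translation searches use higher-dimensional grids, so this sketch (with a hypothetical function name) only shows the principle of removing points that have a higher neighbour.<br />
<br />
```python
def cluster_peaks(grid):
    """Return indices of grid points that have no strictly higher immediate neighbour.

    `grid` is a list of scores on a 1D search grid (a simplifying assumption).
    """
    kept = []
    for i, value in enumerate(grid):
        neighbours = grid[max(i - 1, 0):i] + grid[i + 1:i + 2]
        if all(value >= n for n in neighbours):
            kept.append(i)
    return kept
```
<br />
A point on the shoulder of a larger peak is removed by this test, which is why turning clustering off can rescue an orientation that sits next to a higher grid point.<br />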
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
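<br />
As a worked illustration of the first two rows of the table, the map coefficients for a single reflection can be computed as below. The function name and the example numbers are hypothetical; ''m'' and ''D'' are taken as given inputs rather than estimated.<br />
<br />
```python
import cmath
import math


def map_coefficients(f_obs, f_calc, m, d, alpha_calc_deg):
    """Return (FWT, DELFWT) as complex numbers for one reflection.

    FWT    = (2m|Fobs| - D|Fcalc|) exp(i alpha_calc)
    DELFWT = ( m|Fobs| - D|Fcalc|) exp(i alpha_calc)
    """
    phase = cmath.exp(1j * math.radians(alpha_calc_deg))
    fwt = (2.0 * m * f_obs - d * f_calc) * phase
    delfwt = (m * f_obs - d * f_calc) * phase
    return fwt, delfwt
```
<br />
For example, with |F<sub>obs</sub>| = 100, |F<sub>calc</sub>| = 80, ''m'' = 0.9 and ''D'' = 0.8, the amplitudes are 2(0.9)(100) - (0.8)(80) = 116 and (0.9)(100) - (0.8)(80) = 26.<br />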
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
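<br />
The peak listing described above amounts to a simple threshold on peak height. In this sketch the data layout (a list of vector and height pairs, with heights as a percentage of the origin peak) and the function name are assumptions made for illustration.<br />
<br />
```python
def significant_patterson_peaks(peaks, percent_of_origin=20.0):
    """Keep native Patterson peaks higher than `percent_of_origin` of the origin peak.

    `peaks` is a list of (fractional_vector, height_percent) pairs (hypothetical layout).
    """
    return [(vec, height) for vec, height in peaks if height > percent_of_origin]
```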
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
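<br />
The definition of commensurate modulation can be written as a short test: n times the basic translation vector, in fractional coordinates, must be close to a lattice vector. The function name and the tolerance value are assumptions for illustration, not Phaser's actual criterion.<br />
<br />
```python
def is_commensurate(vector, n, tolerance=0.05):
    """True if n * vector is within `tolerance` of integer multiples of the cell axes.

    `vector` is the basic tNCS translation in fractional coordinates.
    """
    for component in vector:
        total = n * component
        if abs(total - round(total)) > tolerance:
            return False
    return True
```
<br />
A vector of (0, 0, 1/3) is commensurate for n = 3, whereas a vector in a general position usually is not.<br />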
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x><br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
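<br />
The default behaviour described in this section can be summarised as a small decision function. This is a reading of the text above, not Phaser's implementation: the function name, arguments and returned strings are hypothetical.<br />
<br />
```python
def default_tncs_action(n_peaks, commensurate_order=None):
    """Summarise the default tNCS handling for a given native Patterson.

    n_peaks            -- number of peaks above the cutoff (20% of origin by default)
    commensurate_order -- n, if the peaks are all explained by an n-fold
                          commensurate modulation; otherwise None
    """
    if n_peaks == 0:
        return "no tNCS correction"
    if n_peaks == 1:
        return "search for pairs related by the tNCS vector"
    if commensurate_order:
        return f"search in multiples of NMOL {commensurate_order}"
    return "no tNCS correction"
```
<br />
Because the detection may be wrong in either direction, the result of such a summary is a starting point for inspecting the Patterson, not a substitute for it.<br />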
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2432Molecular Replacement2018-02-08T16:07:56Z<p>Rdo20: </p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' -> [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' -> [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and supply a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto.gif|Flow Diagram for Automated MR]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
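<br />
The two resolution limits described above can be combined in a short sketch. The names are hypothetical, resolutions are d-spacings in Ångström (so a larger number means lower resolution), and taking the coarsest of the three limits is an assumption about how they interact.<br />
<br />
```python
def rms_resolution_limit(rms_error, factor=1.8):
    """Resolution beyond which a model with this RMS error adds little signal."""
    return factor * rms_error


def search_resolution(data_limit, llg_target_limit, rms_error):
    """Pick the d-spacing for the initial search.

    data_limit       -- highest resolution of the data
    llg_target_limit -- resolution at which the expected LLG reaches the target
    rms_error        -- estimated RMS coordinate error of the model
    """
    return max(data_limit, llg_target_limit, rms_resolution_limit(rms_error))
```
<br />
For a model with an estimated RMS error of 1.2 Å, data beyond about 2.16 Å would not be used in the search, even if the crystal diffracts to 1.5 Å.<br />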
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
|colspan="2" style="text-align: center;" | *''6 for 1st model in monoclinic space groups''<br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
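<br />
The Z-score definition above translates directly into code. The function name is hypothetical; the RMS deviation from the mean is computed as the population standard deviation of the random set.<br />
<br />
```python
from statistics import mean, pstdev


def z_score(peak_llg, random_llgs):
    """Number of RMS deviations by which a peak LLG exceeds the random-set mean."""
    avg = mean(random_llgs)
    rms_deviation = pstdev(random_llgs)   # RMS deviation from the mean
    return (peak_llg - avg) / rms_deviation
```
<br />
For example, a peak LLG of 10 against random values of 1 and 3 (mean 2, RMS deviation 1) gives a Z-score of 8.<br />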
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
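<br />
The rough guide in the table can be encoded as a lookup. The treatment of values that fall exactly on a boundary is an assumption, since the table does not specify it, and the function name is hypothetical.<br />
<br />
```python
def interpret_tfz(tfz, first_model_monoclinic=False):
    """Map a TFZ score to the rough verdict given in the table above."""
    # The 'definitely' threshold drops to 6 for the first model in monoclinic space groups.
    definite = 6.0 if first_model_monoclinic else 8.0
    if tfz > definite:
        return "definitely"
    if tfz < 5:
        return "no"
    if tfz < 6:
        return "unlikely"
    if tfz < 7:
        return "possibly"
    return "probably"
```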
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
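<br />
When many .sol files need to be compared, the numeric parts of a SOLU SET annotation can be pulled out with a regular expression. This sketch (hypothetical function name) handles only the numeric RFZ, TFZ, PAK, LLG and TFZ== items; symbolic annotations such as TFZ=* or +TNCS are skipped.<br />
<br />
```python
import re

# Label, then "=" or "==", then a number. "==" distinguishes the TFZ-equivalent.
_ITEM = re.compile(r"(RFZ|TFZ|PAK|LLG)(==?)(-?\d+(?:\.\d+)?)")


def parse_annotation(line):
    """Return [(label, value), ...] in order, e.g. ('TFZ==', 24.7)."""
    return [(key + op, float(value)) for key, op, value in _ITEM.findall(line)]
```
<br />
Applied to the beta-blip line above, this yields twelve items, and the LLG= values (472, 1006, 1006) can then be checked to confirm that the LLG does not decrease as components are added.<br />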
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || (m,a,b,c,...) input solution numbers amalgamated after refinement : (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file shows these features<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
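<br />
A SOLU HISTORY line can be split into its steps in the same way. The function name is hypothetical, and the sketch assumes every step has the form NAME(inputs:output) as in the examples above.<br />
<br />
```python
import re

_STEP = re.compile(r"(RF/TF|PAK|RNP|FUSE)\(([^)]*)\)")


def parse_history(line):
    """Return [(step, inputs, output), ...] with inputs and output kept as strings."""
    steps = []
    for name, body in _STEP.findall(line):
        inputs, _, output = body.partition(":")   # text before and after the colon
        steps.append((name, inputs.strip(), output.strip()))
    return steps
```
<br />
For the more complicated example above, the RNP step has a long comma-separated input list and an output of 1, showing that many input peaks were amalgamated into the top solution.<br />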
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of 100-1000's of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set 100-1000's of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TF-function Z-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data (HKLIn keyword) and which columns in that file to use (LABIn keyword). Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble<br />
keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Ångstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|-<br />
|}<br />
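<br />
The tabulated values can be reproduced closely by a simple closed-form expression. The sketch below is an illustration only: the functional form and the constants (0.0569, 173, 1.52) are assumptions chosen because they reproduce the table entries to within a few thousandths of an Ångstrom, not values stated on this page, and <code>estimate_rms</code> is a hypothetical helper name.<br />
<br />
```python
import math

# Assumed constants, chosen to reproduce the table above (not taken from this page).
A, B, C = 0.0569, 173.0, 1.52


def estimate_rms(n_residues, identity):
    """Approximate default RMS (in Angstroms) for a model.

    n_residues: number of residues in the model
    identity:   sequence identity as a fraction between 0 and 1
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be a fraction between 0 and 1")
    return A * (B + n_residues) ** (1.0 / 3.0) * math.exp(C * (1.0 - identity))


# Compare with the table entries for (50 residues, 30%) and (2000 residues, 100%).
print(round(estimate_rms(50, 0.30), 3))
print(round(estimate_rms(2000, 1.00), 3))
```
<br />
A value obtained this way is only a starting point; as described above, it may be worth trying a run with the RMS increased by, say, 0.5 Ångstroms if a conformational change is suspected.<br />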
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, either change the record name to ATOM or use the keyword ENSE modlid HETATOM ON.<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit, not the fraction of the asymmetric unit that you are searching for.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format. You can mix compositions entered by molecular weight with those entered by sequence. Individual atoms can be added to the composition with the COMPOSITION ATOM keyword. This allows the explicit addition of heavy atoms in the structure e.g. Fe atoms.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
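<br />
As a rough illustration of the ratio described above, the sketch below uses molecular weight as a crude stand-in for scattering power. This is an assumption that is only reasonable when all components are protein of ordinary composition; Phaser's own calculation works from the scattering of the model files. The function name and argument layout are hypothetical.<br />
<br />
```python
def fraction_scattering(model_mw, composition):
    """Approximate fraction of the total scattering contributed by one model.

    model_mw:    molecular weight of the search model (one copy)
    composition: list of (molecular_weight, copies) pairs describing
                 EVERYTHING in the asymmetric unit, not just the search model
    """
    total = sum(mw * copies for mw, copies in composition)
    if total <= 0:
        raise ValueError("composition must have a positive total weight")
    return model_mw / total


# One 20 kDa model in an asymmetric unit holding two 20 kDa and one 10 kDa chain.
print(fraction_scattering(20000.0, [(20000.0, 2), (10000.0, 1)]))
```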
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
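<br />
The first three selection criteria can be sketched as follows. This is a minimal illustration, not Phaser's code: the function names are hypothetical, and the mean and standard deviation are taken over the supplied list of values, which is an assumption about how the search statistics are gathered.<br />
<br />
```python
from statistics import mean, pstdev


def select_by_percentage(values, cutoff=75.0):
    """Keep peaks at or above `cutoff` percent, where top = 100% and mean = 0%."""
    mu, top = mean(values), max(values)
    if top == mu:  # completely flat distribution: nothing to distinguish
        return list(values)
    return [v for v in values if 100.0 * (v - mu) / (top - mu) >= cutoff]


def select_by_zscore(values, cutoff):
    """Keep peaks at least `cutoff` standard deviations above the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if (v - mu) / sigma >= cutoff]


def select_by_number(values, n):
    """Keep the `n` highest peaks."""
    return sorted(values, reverse=True)[:n]


peaks = [100.0, 90.0, 50.0, 20.0, 10.0, 30.0]
print(select_by_percentage(peaks))   # the top peak always survives
print(select_by_zscore(peaks, 5.0))  # may be empty if the cutoff is too high
print(select_by_number(peaks, 3))
```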
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
: Points on the search grid with higher neighbouring points are removed from the selection<br />
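<br />
The effect of clustering can be illustrated on a one-dimensional grid. This is a simplified sketch under stated assumptions: the real search grids are multi-dimensional, and <code>cluster_peaks</code> is a hypothetical name. A point is kept only if neither neighbour is strictly higher, so a point on the shoulder of a larger peak is removed.<br />
<br />
```python
def cluster_peaks(grid):
    """Return (index, value) for grid points that have no higher neighbour."""
    kept = []
    for i, value in enumerate(grid):
        left = grid[i - 1] if i > 0 else float("-inf")
        right = grid[i + 1] if i < len(grid) - 1 else float("-inf")
        if value >= left and value >= right:
            kept.append((i, value))
    return kept


grid = [1.0, 3.0, 2.0, 5.0, 4.0, 4.5]
# The point at index 4 sits on the shoulder of the peak at index 3 and is dropped.
print(cluster_peaks(grid))
```
<br />
This is why turning clustering off can help in difficult cases: a correct orientation lying on the shoulder of a higher peak is otherwise discarded before selection.<br />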
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
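<br />
The amplitudes in the first two rows of the table follow directly from the expressions given there. The sketch below simply evaluates those two expressions for a single reflection; the function and argument names are hypothetical, and any special handling of particular reflection classes is outside its scope.<br />
<br />
```python
def map_coefficients(f_obs, f_calc, m, d):
    """Return (FWT, DELFWT) amplitudes for one reflection.

    f_obs:  observed amplitude |Fobs|
    f_calc: calculated amplitude |Fcalc|
    m:      figure of merit (the FOM column)
    d:      the D weighting factor
    """
    fwt = 2.0 * m * f_obs - d * f_calc   # 2m|Fobs| - D|Fcalc|
    delfwt = m * f_obs - d * f_calc      # m|Fobs| - D|Fcalc|
    return fwt, delfwt


print(map_coefficients(100.0, 80.0, 0.9, 0.8))
```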
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
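<br />
The peak-listing criterion amounts to a simple threshold on peak height relative to the origin. A minimal sketch, assuming peaks are supplied as (fractional vector, height) pairs with the origin peak excluded; the names are hypothetical:<br />
<br />
```python
def significant_patterson_peaks(peaks, origin_height, percent=20.0):
    """Keep non-origin Patterson peaks higher than `percent` of the origin peak.

    peaks: list of ((u, v, w), height) tuples, origin peak not included
    """
    threshold = origin_height * percent / 100.0
    return [(vector, height) for vector, height in peaks if height > threshold]


peaks = [((0.5, 0.0, 0.5), 45.0), ((0.1, 0.2, 0.3), 8.0), ((0.0, 0.33, 0.0), 21.0)]
print(significant_patterson_peaks(peaks, origin_height=100.0))
```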
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
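<br />
The definition of commensurate modulation can be turned into a simple test on the fractional components of the basic translation vector. This is a sketch under stated assumptions: the tolerance of 0.05 and the upper limit on n are arbitrary illustrative choices, and the function names are hypothetical.<br />
<br />
```python
def is_commensurate(vector, n, tolerance=0.05):
    """True if n times the fractional vector is close to a lattice translation."""
    for component in vector:
        scaled = n * component
        if abs(scaled - round(scaled)) > tolerance:
            return False
    return True


def modulation_order(vector, max_n=12, tolerance=0.05):
    """Smallest n (2..max_n) for which n * vector is close to a lattice
    translation, or None if no such n is found."""
    for n in range(2, max_n + 1):
        if is_commensurate(vector, n, tolerance):
            return n
    return None


print(modulation_order((0.0, 0.0, 0.334)))   # roughly one third along c
print(modulation_order((0.5, 0.5, 0.0)))     # pseudo-halving along a diagonal
print(modulation_order((0.13, 0.29, 0.41)))  # general position
```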
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x><br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2431Molecular Replacement2018-02-08T15:14:15Z<p>Rdo20: /* Building an Ensemble from Coordinates */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' → [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' → [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and supply a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto.gif|Flow Diagram for Automated MR]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
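<br />
The RMS-based limit described above is a one-line calculation. A minimal sketch, assuming resolution is expressed as a d-spacing in Ångstroms (so a larger number means lower resolution); the function and argument names are hypothetical, and the separate limit based on the expected LLG is not modelled here.<br />
<br />
```python
def rms_resolution_limit(data_dmin, rms_error, factor=1.8):
    """High-resolution limit (in Angstroms) after applying the RMS-based rule.

    data_dmin: highest resolution of the data (smallest d-spacing)
    rms_error: estimated RMS error of the model
    factor:    multiple of the RMS error used as the limit (1.8 in the text)
    """
    return max(data_dmin, factor * rms_error)


print(rms_resolution_limit(1.5, 1.5))  # poor model: data beyond 1.8 x RMS are not used
print(rms_resolution_limit(3.0, 1.0))  # data do not extend to 1.8 A, so all are used
```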
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
|colspan="2" style="text-align: center;" | *''6 for 1st model in monoclinic space groups''<br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
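<br />
The Z-score definition above can be written out directly. A minimal sketch, assuming the LLG values for the random set are available as a list; <code>z_score</code> is a hypothetical name, and the RMS deviation from the mean is computed as the population standard deviation.<br />
<br />
```python
from statistics import mean, pstdev


def z_score(peak_llg, random_llgs):
    """Number of standard deviations by which a peak LLG exceeds the mean
    LLG of a set of random rotations or translations."""
    mu = mean(random_llgs)
    sigma = pstdev(random_llgs)  # RMS deviation from the mean
    if sigma == 0:
        raise ValueError("random LLG values show no spread")
    return (peak_llg - mu) / sigma


print(z_score(26.0, [8.0, 12.0]))
```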
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
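<br />
The rough guide in the table can be encoded as a small lookup. This is only a sketch of the table: how values falling exactly on a boundary are classified is an assumption, and <code>interpret_tfz</code> is a hypothetical name. The <code>definite_above</code> argument allows for the footnote, where 6 rather than 8 applies to the first model in monoclinic space groups.<br />
<br />
```python
def interpret_tfz(tfz, definite_above=8.0):
    """Map a translation function Z-score to the rough verdict in the table."""
    if tfz > definite_above:
        return "definitely"
    if tfz < 5.0:
        return "no"
    if tfz < 6.0:
        return "unlikely"
    if tfz < 7.0:
        return "possibly"
    return "probably"


for tfz in (4.2, 5.5, 6.5, 7.5, 9.1):
    print(tfz, interpret_tfz(tfz))
```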
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
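If you want to check these features in a script, the annotation line can be pulled apart with a regular expression. The sketch below is a hypothetical helper, not part of Phaser: it only recognises the numeric RFZ=, TFZ=, PAK=, LLG= and TFZ== tokens and silently skips the other annotations in the table (RF++, TFZ=*, +TNCS and so on).<br />

```python
import re

# Numeric tokens only; "==" must be tried before "=" so TFZ== is kept distinct
TOKEN = re.compile(r"(RFZ|TFZ|PAK|LLG)(==?)(-?\d+(?:\.\d+)?)")


def parse_annotation(line):
    """Return the (name, value) pairs of a SOLU SET annotation, in order."""
    tokens = []
    for key, eq, value in TOKEN.findall(line):
        name = key + "==" if eq == "==" else key
        tokens.append((name, float(value)))
    return tokens


def llg_never_decreases(tokens):
    """True if each LLG= value is at least as large as the previous one."""
    llgs = [value for name, value in tokens if name == "LLG"]
    return all(b >= a for a, b in zip(llgs, llgs[1:]))


line = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
        "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
tokens = parse_annotation(line)
print([value for name, value in tokens if name == "TFZ=="])
print(llg_never_decreases(tokens))
```
<br />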
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
<br />
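A z-y-z rotation of this kind can be written as a product of three elementary rotation matrices. The sketch below assumes right-handed rotations, angles in degrees, and a matrix that acts on column vectors; the page does not state these details, so treat them as assumptions and check against Phaser output before relying on them.<br />

```python
import math


def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyz(alpha, beta, gamma):
    """Rotate about z by alpha, then the new y by beta, then the new z by gamma.

    Rotations about successively rotated axes compose by right-multiplication,
    which gives R = Rz(alpha) * Ry(beta) * Rz(gamma).
    """
    return matmul(rot_z(alpha), matmul(rot_y(beta), rot_z(gamma)))


# Euler angles of the "beta" ensemble from the example solution above
rotation = euler_zyz(200.849, 41.269, 183.909)
for row in rotation:
    print(" ".join(f"{x:9.5f}" for x in row))
```
<br />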
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || All input peaks amalgamated after refinement to give output solution number (m and others): (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file shows these features:<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
<br />
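For long histories like the one above it can help to split the line into its steps programmatically. The following is a hypothetical helper written for this page, not a Phaser utility; it assumes the step names listed in the table and the "inputs : output" layout inside each pair of brackets.<br />

```python
import re

STEP = re.compile(r"(RF/TF|PAK|RNP|FUSE)\(([^)]*)\)")


def parse_history(line):
    """Split a SOLU HISTORY line into a list of steps."""
    steps = []
    for name, body in STEP.findall(line):
        # Everything before ":" is input numbering, everything after is output
        inputs, _, output = body.partition(":")
        steps.append({
            "step": name,
            "inputs": [x.strip() for x in re.split(r"[,/]", inputs) if x.strip()],
            "output": output.strip(),
        })
    return steps


history = ("SOLU HISTORY RF/TF(7/1:10)PAK(10:10)"
           "RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)"
           "RNP(1:1)")
for step in parse_history(history):
    print(step["step"], len(step["inputs"]), "->", step["output"])
```

For the RF/TF step the two inputs are the rotation function peak number and the translation function peak number; for FUSE there is no output number, so the output field is left empty.<br />
<br />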
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of hundreds to thousands of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set hundreds to thousands of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TF-function Z-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data and the columns in the mtz file to be used using the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Ångstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|+ '''Initial estimate of RMS deviation: Number of residues in model (upper row) versus sequence identity (left column)'''<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|-<br />
|}<br />
<br />
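If you need a value between the tabulated entries, the table can be approximated by a closed-form expression. The constants in the sketch below are an assumption: they were chosen because they reproduce the tabulated values to within about 0.01 Ångstroms, and the exact formula used by Phaser is not given on this page. The function name is hypothetical.<br />

```python
import math


def estimated_rms(identity_fraction, n_residues):
    """Approximate initial RMS estimate (Angstroms).

    identity_fraction: sequence identity as a fraction between 0 and 1
    n_residues: number of residues in the model
    The three numeric constants are assumptions fitted to the table above.
    """
    return (0.0569
            * math.exp(1.52 * (1.0 - identity_fraction))
            * (173.3 + n_residues) ** (1.0 / 3.0))


# Compare with a few table entries: (identity, residues, tabulated value)
for identity, n_res, tabulated in [(1.0, 50, 0.345), (0.3, 300, 1.286),
                                   (0.5, 1000, 1.284), (0.0, 2000, 3.375)]:
    print(identity, n_res, round(estimated_rms(identity, n_res), 3), tabulated)
```

The estimate grows as the sequence identity falls and as the model gets larger, which matches the trends along the columns and rows of the table.<br />
<br />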
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, either change the record name to ATOM or use the keyword ENSE modlid HETATOM ON.<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit, not the fraction of the asymmetric unit that you are searching for.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format. You can mix compositions entered by molecular weight with those entered by sequence. Individual atoms can be added to the composition with the COMPOSITION ATOM keyword. This allows the explicit addition of heavy atoms in the structure e.g. Fe atoms.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
<br />
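The first three selection criteria can be sketched as follows. This is an illustration of the definitions above rather than Phaser's implementation; the function names are hypothetical, and the mean and sigma of the search are passed in explicitly because they describe the whole search, not just the listed peaks.<br />

```python
def select_by_percent(peaks, mean, cutoff=75.0):
    """Keep peaks at or above cutoff %, where top peak = 100% and mean = 0%."""
    top = max(peaks)
    if top <= mean:
        return [top]  # degenerate flat case: the top peak always survives
    return [p for p in peaks if 100.0 * (p - mean) / (top - mean) >= cutoff]


def select_by_zscore(peaks, mean, sigma, cutoff):
    """Keep peaks at least `cutoff` standard deviations above the mean."""
    return [p for p in peaks if (p - mean) / sigma >= cutoff]


def select_by_number(peaks, number):
    """Keep the `number` highest peaks."""
    return sorted(peaks, reverse=True)[:number]


# Hypothetical peak heights with a search mean of 20 and sigma of 10
peaks = [100.0, 90.0, 60.0, 40.0]
print(select_by_percent(peaks, mean=20.0))
print(select_by_zscore(peaks, mean=20.0, sigma=10.0, cutoff=9.0))
print(select_by_number(peaks, 3))
```

With these numbers the Z-score selection at 9 sigma returns nothing at all, whereas the percentage selection always returns at least the top peak.<br />
<br />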
[[Image:Phaser_selection.gif| Selection criteria]]<br />
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
*: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
*: Points on the search grid with higher neighbouring points are removed from the selection<br />
<br />
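The effect of clustering can be illustrated on a one-dimensional grid, where each point has at most two neighbours. The real rotation and translation searches are sampled on three-dimensional grids, so this is a simplified sketch with a hypothetical function name.<br />

```python
def cluster_peaks(grid):
    """Return (index, value) for grid points that have no higher neighbour."""
    kept = []
    for i, value in enumerate(grid):
        left = grid[i - 1] if i > 0 else float("-inf")
        right = grid[i + 1] if i + 1 < len(grid) else float("-inf")
        # A point with a higher neighbouring point is removed
        if value >= left and value >= right:
            kept.append((i, value))
    return kept


# Hypothetical search values along one grid direction
grid = [1.0, 3.0, 7.0, 6.0, 2.0, 5.0, 4.0]
print(cluster_peaks(grid))
```

Note that the point with value 6 is removed even though it is higher than the second surviving peak, because it sits on the shoulder of the peak with value 7. This is why turning clustering off can help when the correct orientation lies on the shoulder of a peak.<br />
<br />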
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
<br />
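The amplitudes in the table follow directly from the observed and calculated amplitudes and the weights ''m'' and ''D''. The sketch below shows the arithmetic for a single acentric reflection. It is an assumption-laden illustration: the special handling of centric and unmeasured reflections is not described on this page and is ignored here, and the function name is hypothetical.<br />

```python
def map_coefficients(f_obs, f_calc, phase_calc_deg, m, d):
    """Return (FWT, PHWT, DELFWT, PHDELWT) for one reflection.

    f_obs, f_calc: observed and calculated amplitudes
    phase_calc_deg: calculated phase in degrees
    m: figure of merit; d: the D weight
    """
    def amplitude_and_phase(value):
        # A negative coefficient is stored as a positive amplitude
        # with the phase shifted by 180 degrees
        if value < 0:
            return -value, (phase_calc_deg + 180.0) % 360.0
        return value, phase_calc_deg % 360.0

    fwt, phwt = amplitude_and_phase(2.0 * m * f_obs - d * f_calc)
    delfwt, phdelwt = amplitude_and_phase(m * f_obs - d * f_calc)
    return fwt, phwt, delfwt, phdelwt


# Hypothetical values for two reflections
print(map_coefficients(100.0, 80.0, 30.0, m=0.9, d=0.8))
print(map_coefficients(50.0, 80.0, 30.0, m=0.9, d=0.8))
```
<br />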
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
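<br />
The commensurate condition (n times the basic vector lying very close to a whole number of unit cell translations) can be illustrated with a short check in fractional coordinates. This is a sketch under stated assumptions: the function name, the tolerance of 0.02 and the upper limit on n are hypothetical choices for this example and do not describe Phaser's own detection algorithm.<br />

```python
def commensurate_order(vector, max_n=12, tol=0.02):
    """Return the smallest n >= 2 for which n * vector is within tol of
    integers in every fractional coordinate, or None if there is none.

    vector: non-zero translation in fractional coordinates (x, y, z).
    """
    for n in range(2, max_n + 1):
        if all(abs(n * x - round(n * x)) <= tol for x in vector):
            return n
    return None


print(commensurate_order((0.0, 0.0, 0.3333)))  # pseudo-tripling along c
print(commensurate_order((0.13, 0.27, 0.41), max_n=6))  # general position
```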
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This cut-off can be lowered with TNCS PATT PERCENT <x>.<br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2430Molecular Replacement2018-02-07T17:37:20Z<p>Rdo20: Reverted edits by Rdo20 (talk) to last revision by Airlie</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' → [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' → [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and supply a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto.gif|Flow Diagram for Automated MR]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
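<br />
The RMS-based limit can be written as a one-line rule. The sketch below covers only the 1.8 times RMS criterion; it does not model the expected-LLG criterion described above, and the function and argument names are hypothetical.<br />

```python
def search_resolution_limit(data_d_min, model_rms, factor=1.8):
    """Return the high-resolution limit (in Angstroms) after applying the
    rule that data beyond factor * model_rms add little signal.

    A larger d-spacing means lower resolution, so the limit actually used
    is the larger of the data limit and factor * model_rms.
    """
    return max(data_d_min, factor * model_rms)


# Data to 1.5 A with a model of estimated RMS error 1.2 A: limited by the model.
print(search_resolution_limit(1.5, 1.2))
# Data to 2.5 A with a model of estimated RMS error 0.8 A: limited by the data.
print(search_resolution_limit(2.5, 0.8))
```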
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
|colspan="2" style="text-align: center;" | *''6 for 1st model in monoclinic space groups''<br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
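<br />
The Z-score definition above translates directly into a few lines of Python. The LLG values used here are invented for illustration, and the function name is hypothetical; how Phaser chooses the random set is not described in this section and is not modelled.<br />

```python
from math import sqrt


def z_score(peak_llg, random_llgs):
    """Number of RMS deviations by which peak_llg lies above the mean
    of the LLG values from a set of random rotations or translations."""
    n = len(random_llgs)
    mean = sum(random_llgs) / n
    rms_deviation = sqrt(sum((llg - mean) ** 2 for llg in random_llgs) / n)
    return (peak_llg - mean) / rms_deviation


random_llgs = [10.0, 12.0, 8.0, 10.0]  # made-up LLGs of random placements
z = z_score(20.0, random_llgs)
print(round(z, 2))
```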
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
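<br />
For scripted screening of many runs, the table can be encoded as a simple lookup. This is a sketch only: the table does not say which row a value exactly on a boundary belongs to, so the boundary handling below is an assumption, and the footnote about the first model in monoclinic space groups is not implemented.<br />

```python
def interpret_tfz(tfz):
    """Map a TFZ score to the rough verdict given in the table above.

    Assumption: a boundary value is assigned to the higher row, except
    that exactly 8 stays 'probably' because the last row reads 'more than 8'.
    """
    if tfz < 5:
        return "no"
    if tfz < 6:
        return "unlikely"
    if tfz < 7:
        return "possibly"
    if tfz <= 8:
        return "probably"
    return "definitely"


for score in (4.2, 5.5, 6.5, 7.5, 24.3):
    print(score, interpret_tfz(score))
```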
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see that the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features:<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
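<br />
When many .sol files need to be compared, the numeric part of the annotation can be pulled out with a small parser. The sketch below is not part of Phaser; the function name is hypothetical, and it handles only the purely numeric tokens (RFZ=, TFZ=, PAK=, LLG=, TFZ==), silently skipping the other annotations listed in the table, such as RF++, TFZ=* or +TNCS.<br />

```python
import re

# KEY=number or KEY==number, e.g. "TFZ=24.3" or "TFZ==24.7"
TOKEN = re.compile(r"^([A-Z]+)(==|=)(-?\d+(?:\.\d+)?)$")


def parse_annotation(line):
    """Return the numeric annotation tokens of a SOLU SET line, in order,
    as (key, value) tuples; the key keeps its '=' or '==' suffix."""
    entries = []
    for token in line.split():
        match = TOKEN.match(token)
        if match:
            entries.append((match.group(1) + match.group(2), float(match.group(3))))
    return entries


line = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
        "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
entries = parse_annotation(line)
llg_values = [value for key, value in entries if key == "LLG="]
print(llg_values)
```

Checking that the LLG= values never decrease along the line is a quick test of the expectation stated above that the LLG increases as each component is added.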
<br />
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
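<br />
A z-y-z rotation about successively rotated axes can be composed as Rz(alpha) Ry(beta) Rz(gamma). The sketch below builds that matrix in plain Python. It illustrates the convention as described in the sentence above; whether Phaser applies the matrix in exactly this form (for example, the direction of rotation or whether it acts on coordinates or on axes) is an assumption that should be checked before reusing the numbers in other software.<br />

```python
from math import cos, sin, radians


def rot_z(angle_deg):
    """Right-handed rotation matrix about the z-axis."""
    c, s = cos(radians(angle_deg)), sin(radians(angle_deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(angle_deg):
    """Right-handed rotation matrix about the y-axis."""
    c, s = cos(radians(angle_deg)), sin(radians(angle_deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyz_matrix(alpha, beta, gamma):
    """Rotation by alpha about z, then beta about the new y, then gamma
    about the new z, composed as Rz(alpha) * Ry(beta) * Rz(gamma)."""
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))


# Euler angles of the "beta" ensemble from the example solution above.
matrix = euler_zyz_matrix(200.849, 41.269, 183.909)
for row in matrix:
    print(["%8.5f" % value for value in row])
```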
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || All input peaks amalgamated after refinement to give output solution number (m and others): (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file is:<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
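<br />
The history string can be split into its steps with a regular expression. This is an illustrative sketch, not a Phaser utility: the function name is hypothetical, and the interpretation of the text before and after the last colon as "inputs" and "output" simply follows the table above.<br />

```python
import re

# A step is a name such as RF/TF, PAK, RNP or FUSE followed by (...)
STEP = re.compile(r"([A-Z/]+)\(([^)]*)\)")


def parse_history(text):
    """Return a list of (step, inputs, output) tuples from a SOLU HISTORY line.

    inputs is the text before the last ':' inside the brackets and output
    the text after it; a step without ':' has an empty inputs string.
    """
    steps = []
    for name, args in STEP.findall(text):
        inputs, _, output = args.rpartition(":")
        steps.append((name, inputs, output))
    return steps


simple = parse_history("SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)")
print(simple)
```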
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of 100-1000's of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set 100-1000's of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TF-function Z-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data and the columns in the mtz file to be used using the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble keyword, using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Ångstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|colspan="11" style="text-align: center;" | Initial estimate of RMS deviation<br />
|-<br />
|colspan="11" style="text-align: center;" | Number of residues in model versus sequence identity<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|-<br />
|}<br />
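<br />
The tabulated values vary smoothly with the number of residues and the sequence identity, so values between the table entries can be estimated with a simple function. The sketch below agrees with the table entries checked here to within about 0.01. The functional form and the constants a, b and c are assumptions chosen to match the table; they are not taken from Phaser's source code, and the function name is hypothetical.<br />

```python
from math import exp


def estimated_rms(num_residues, identity_percent, a=0.0569, b=173.0, c=1.52):
    """Approximate initial RMS deviation estimate for a model.

    Assumed form: a * (b + N)**(1/3) * exp(c * H), where N is the number
    of residues and H is the fraction of non-identical residues.
    """
    h = 1.0 - identity_percent / 100.0
    return a * (b + num_residues) ** (1.0 / 3.0) * exp(c * h)


# Compare a few values with the corresponding table entries.
for residues, identity, tabulated in [(50, 100, 0.345), (100, 30, 1.070),
                                      (1000, 50, 1.284), (2000, 0, 3.375)]:
    print(residues, identity, tabulated, round(estimated_rms(residues, identity), 3))
```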
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, use the keyword ENSE modlid HETATOM ON.<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit, not the fraction of the asymmetric unit that you are searching for.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format. You can mix compositions entered by molecular weight with those entered by sequence. Individual atoms can be added to the composition with the COMPOSITION ATOM keyword. This allows the explicit addition of heavy atoms in the structure e.g. Fe atoms.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
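<br />
The three selection criteria above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Phaser's own code: the function names are hypothetical, and the mean and standard deviation of the search function are assumed to be supplied by the caller.<br />

```python
def select_by_percentage(peaks, mean, cutoff=75.0):
    """Keep peaks at or above `cutoff` percent, where the top peak is 100%
    and the mean of the search function is 0%."""
    top = max(peaks)
    threshold = mean + (cutoff / 100.0) * (top - mean)
    return [p for p in peaks if p >= threshold]


def select_by_zscore(peaks, mean, sigma, cutoff):
    """Keep peaks at least `cutoff` standard deviations above the mean."""
    return [p for p in peaks if (p - mean) / sigma >= cutoff]


def select_by_number(peaks, n):
    """Keep the `n` highest peaks."""
    return sorted(peaks, reverse=True)[:n]


if __name__ == "__main__":
    peaks = [100.0, 90.0, 60.0, 20.0]
    print(select_by_percentage(peaks, mean=20.0))             # top peak always survives
    print(select_by_zscore(peaks, mean=20.0, sigma=10.0, cutoff=9.0))  # may be empty
    print(select_by_number(peaks, 2))
```

Note that the percentage criterion always returns at least the top peak, whereas the Z-score criterion can return an empty list if the cutoff is too high.<br />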
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
: Points on the search grid with higher neighbouring points are removed from the selection<br />
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
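<br />
As a rough illustration of clustering, the sketch below removes every point on a small two-dimensional grid that has a higher neighbouring point. It is a simplified assumption of how such a filter could work: the function name is hypothetical, the grid is not periodic, and real rotation and translation search grids are three-dimensional.<br />

```python
def cluster_peaks(grid):
    """Return (row, col, value) for grid points that have no higher neighbour.

    `grid` is a list of equal-length rows; the eight surrounding points
    (fewer at the edges) count as neighbours.
    """
    rows, cols = len(grid), len(grid[0])
    kept = []
    for r in range(rows):
        for c in range(cols):
            value = grid[r][c]
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value >= n for n in neighbours):
                kept.append((r, c, value))
    return kept


if __name__ == "__main__":
    grid = [
        [1, 2, 1],
        [2, 9, 8],  # the 8 sits on the shoulder of the 9
        [1, 2, 1],
    ]
    print(cluster_peaks(grid))
```

With clustering on, the shoulder point (value 8) is discarded in favour of its higher neighbour. This is why turning clustering off can help when the correct orientation lies on the shoulder of a peak.<br />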
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
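<br />
The definition of commensurate modulation can be checked numerically: n times the basic translation vector, in fractional coordinates, should be very close to integers. The sketch below assumes a fractional tNCS vector and a hypothetical tolerance of 0.02; neither the function names nor the tolerance come from Phaser.<br />

```python
def is_commensurate(vector, n, tolerance=0.02):
    """True if n times the fractional vector is close to integer cell translations."""
    for component in vector:
        multiple = n * component
        if abs(multiple - round(multiple)) > tolerance:
            return False
    return True


def modulation_order(vector, max_n=12, tolerance=0.02):
    """Smallest n >= 2 for which the vector is commensurate, or None if not found."""
    for n in range(2, max_n + 1):
        if is_commensurate(vector, n, tolerance):
            return n
    return None


if __name__ == "__main__":
    print(modulation_order((0.5, 0.0, 0.0)))    # pseudo-halving of the a axis
    print(modulation_order((0.0, 0.0, 0.334)))  # approximate three-fold modulation
    print(modulation_order((0.21, 0.13, 0.0)))  # general position
```

A vector in a general position returns None here, which corresponds to tNCS that is not a commensurate modulation.<br />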
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x><br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2429Molecular Replacement2018-02-07T17:18:34Z<p>Rdo20: /* Building an Ensemble from Coordinates */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' : [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' : [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and give it a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto.gif|Flow Diagram for Automated MR]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
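<br />
The RMS-based limit described above amounts to a one-line calculation. The sketch below is an illustration only: the function name is hypothetical, and it ignores the separate limit derived from the expected LLG.<br />

```python
def rms_resolution_limit(rms_error, data_resolution):
    """High-resolution limit (in Angstroms) after applying the 1.8 x RMS rule.

    A larger number means lower resolution, so the limit actually used is
    the larger of the data resolution and 1.8 times the RMS error.
    """
    return max(data_resolution, 1.8 * rms_error)


if __name__ == "__main__":
    # Poor model (RMS 1.5 A) with 2.0 A data: the model limits the resolution.
    print(rms_resolution_limit(1.5, 2.0))
    # Good model (RMS 0.8 A) with 2.0 A data: all of the data can contribute.
    print(rms_resolution_limit(0.8, 2.0))
```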
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
|colspan="2" style="text-align: center;" | *''6 for 1st model in monoclinic space groups''<br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
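<br />
The Z-score definition above can be written out directly. This is a minimal sketch using the Python standard library; the function name and the example LLG values are invented for illustration.<br />

```python
from statistics import mean, pstdev


def z_score(llg, random_llgs):
    """Number of standard deviations by which `llg` exceeds the mean of the random set."""
    mu = mean(random_llgs)
    rms_deviation = pstdev(random_llgs, mu)  # RMS deviation from the mean
    return (llg - mu) / rms_deviation


if __name__ == "__main__":
    random_llgs = [8.0, 12.0, 8.0, 12.0]  # LLGs for random placements (made up)
    print(z_score(26.0, random_llgs))
```

Here the random set has a mean of 10 and an RMS deviation of 2, so a search peak with an LLG of 26 lies 8 standard deviations above the mean.<br />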
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
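<br />
For scripted pipelines, the rough guide in the table can be encoded as a simple lookup. The sketch below is an assumption about how one might do this; the function name is hypothetical, and the footnote about the first model in monoclinic space groups is deliberately not modelled.<br />

```python
def interpret_tfz(tfz):
    """Map a TFZ score onto the rough verdicts given in the table."""
    if tfz < 5:
        return "no"
    if tfz < 6:
        return "unlikely"
    if tfz < 7:
        return "possibly"
    if tfz < 8:
        return "probably"
    return "definitely"


if __name__ == "__main__":
    for score in (4.2, 5.5, 6.5, 7.5, 24.3):
        print(score, interpret_tfz(score))
```

As the text stresses, this is only a guide: the verdict should always be checked against the logfile and the resulting maps.<br />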
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
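<br />
When many .sol files have to be compared, the numeric part of the SOLU SET annotation can be extracted with a short script. The sketch below is not part of Phaser; it handles only the RFZ=, TFZ=, PAK=, LLG= and TFZ== tokens and silently skips the other annotations listed in the table (RF++, TFZ=*, +TNCS and so on).<br />

```python
import re

# Matches e.g. "RFZ=10.7", "PAK=0", "LLG=472" and "TFZ==24.7".
TOKEN = re.compile(r"(RFZ|TFZ|PAK|LLG)(==?)(-?\d+(?:\.\d+)?)")


def parse_annotation(line):
    """Return a list of (label, value) pairs in the order they appear."""
    return [(name + equals, float(value)) for name, equals, value in TOKEN.findall(line)]


if __name__ == "__main__":
    line = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
            "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
    tokens = parse_annotation(line)
    llgs = [value for label, value in tokens if label == "LLG="]
    print(llgs)
    # The LLG should not decrease as components are added.
    print(all(a <= b for a, b in zip(llgs, llgs[1:])))
```

For the beta-blip example the LLG values are 472, 1006 and 1006, so the check that the LLG does not decrease as components are added succeeds.<br />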
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || All input peaks amalgamated after refinement to give output solution number (m and others): (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file shows these features<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
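<br />
A SOLU HISTORY line can be split into its steps in the same spirit. The sketch below assumes the layout shown in the examples above, with each step written as NAME(inputs:output); the function name is hypothetical and the parser has not been checked against every form Phaser may write.<br />

```python
import re

# A step is an upper-case name (possibly containing "/") followed by "(...)".
STEP = re.compile(r"([A-Z/]+)\(([^)]*)\)")


def parse_history(line):
    """Return a list of (step, inputs, output) tuples from a SOLU HISTORY line."""
    steps = []
    for name, body in STEP.findall(line):
        inputs, _, output = body.partition(":")
        steps.append((name, inputs.strip(), output.strip()))
    return steps


if __name__ == "__main__":
    line = " SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13:1)RNP(1:1)"
    for step, inputs, output in parse_history(line):
        print(step, inputs.split(","), output)
```

For an RNP step, the number of comma-separated inputs shows how many peaks were amalgamated into the output solution.<br />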
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of 100-1000's of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set 100-1000's of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a solution with a high translation function Z-score (TFZ) is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data and the columns in the mtz file to be used using the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is either given directly with RMS or estimated indirectly from IDENtity in the ENSEmble keyword, using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Sequence identity \ Number of residues !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|-<br />
|}<br />
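
The tabulated defaults appear to follow a simple closed form. The sketch below is an assumption rather than something stated on this page: the functional form and the coefficients 0.0569, 173 and 1.52 are taken to be those of the published Phaser RMS estimate, and they reproduce the table values to within about 0.01 Angstroms. The function name <code>default_rms</code> is hypothetical and is not part of Phaser.

```python
import math


def default_rms(identity_percent, n_residues):
    """Estimate the default RMS error (Angstroms) for a model.

    Assumed form: RMS = A * (B + N)**(1/3) * exp(C * H), where N is the
    number of residues and H is the fraction of non-identical residues.
    The coefficients are assumptions chosen to match the table above.
    """
    a, b, c = 0.0569, 173.0, 1.52
    h = 1.0 - identity_percent / 100.0
    return a * (b + n_residues) ** (1.0 / 3.0) * math.exp(c * h)


# Compare with the table entry for ID=50% and 300 residues (0.948).
print(round(default_rms(50, 300), 3))
```

When in doubt, prefer the value that Phaser itself reports in the logfile over this approximation.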
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include other HETATM records in the scattering, use the keyword ENSE modlid HETATOM ON.<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW, WAT, HOH, H2O, OH2, MOH, WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than these (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records.<br />
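
Renaming the water residues can be scripted. The following is a minimal sketch, assuming a standard fixed-column PDB file in which the residue name occupies columns 18-20 of ATOM and HETATM records; <code>rename_waters</code> is a hypothetical helper, not a Phaser facility.

```python
# Residue names that Phaser treats as water, as listed above.
WATER_NAMES = {"OW", "WAT", "HOH", "H2O", "OH2", "MOH", "WTR", "TIP"}


def rename_waters(pdb_lines, new_name="WWW"):
    """Return PDB lines with water residue names replaced by new_name."""
    renamed = []
    for line in pdb_lines:
        # Columns 18-20 (string indices 17 to 19) hold the residue name.
        is_atom_record = line.startswith(("ATOM", "HETATM"))
        if is_atom_record and line[17:20].strip() in WATER_NAMES:
            line = line[:17] + new_name.ljust(3)[:3] + line[20:]
        renamed.append(line)
    return renamed


example = ["HETATM 1001  O   HOH A 201      10.000  20.000  30.000  1.00 20.00"]
print(rename_waters(example)[0])
```

The renamed records are then retained on output, but they still do not contribute to the scattering while they remain HETATM records.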
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit, not the fraction of the asymmetric unit that you are searching for.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format. You can mix compositions entered by molecular weight with those entered by sequence. Individual atoms can be added to the composition with the COMPOSITION ATOM keyword. This allows the explicit addition of heavy atoms in the structure e.g. Fe atoms.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
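
The selection criteria above can be sketched as follows. This is an illustration only: <code>select_peaks</code> and its arguments are hypothetical names, and the mean and standard deviation of the search function are assumed to be supplied by the caller.

```python
def select_peaks(values, mean, sigma, percent=None, zscore=None, number=None):
    """Select peaks by one criterion; return them from highest to lowest.

    With no criterion given, all peaks are returned (no selection).
    """
    ranked = sorted(values, reverse=True)
    if not ranked:
        return []
    if percent is not None:
        # Rescale so that the top peak is 100% and the mean is 0%.
        span = ranked[0] - mean
        if span <= 0:
            return ranked[:1]
        return [v for v in ranked if 100.0 * (v - mean) / span >= percent]
    if zscore is not None:
        # Number of standard deviations above the mean.
        return [v for v in ranked if (v - mean) / sigma >= zscore]
    if number is not None:
        return ranked[:number]
    return ranked


peaks = [100.0, 90.0, 70.0, 50.0]
print(select_peaks(peaks, mean=20.0, sigma=10.0, percent=75.0))
print(select_peaks(peaks, mean=20.0, sigma=10.0, zscore=9.0))
```

With these numbers the percentage criterion keeps the two highest peaks, whereas a Z-score cutoff of 9 keeps nothing, which illustrates why only the percentage criterion guarantees at least one peak.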
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
*: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
*: Points on the search grid with higher neighbouring points are removed from the selection<br />
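
The effect of clustering can be illustrated on a one-dimensional grid. This is a simplified sketch: real rotation and translation searches use multi-dimensional grids, and <code>cluster_peaks</code> is a hypothetical name.

```python
def cluster_peaks(grid_values):
    """Return indices of grid points that have no higher neighbour (1D)."""
    kept = []
    last = len(grid_values) - 1
    for i, value in enumerate(grid_values):
        left = grid_values[i - 1] if i > 0 else float("-inf")
        right = grid_values[i + 1] if i < last else float("-inf")
        # A point is removed if either neighbour is strictly higher.
        if value >= left and value >= right:
            kept.append(i)
    return kept


print(cluster_peaks([1, 3, 2, 5, 4]))
```

Here only the grid points at indices 1 and 3 survive, because every other point has a higher neighbour.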
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
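
The default decision described in this section can be summarised as a short sketch. The function name and the returned strings are hypothetical; the peak heights are assumed to be given as percentages of the origin peak, and whether the peaks form a commensurate modulation is assumed to have been determined separately.

```python
def default_tncs_action(peak_percents, commensurate=False, cutoff=20.0):
    """Summarise the default tNCS handling from native Patterson peaks."""
    significant = [p for p in peak_percents if p > cutoff]
    if not significant:
        return "no correction"
    if len(significant) == 1:
        return "search in pairs"
    # Several peaks are only used if they all arise from a
    # commensurate modulation.
    return "search in multiples" if commensurate else "no correction"


print(default_tncs_action([35.0]))
print(default_tncs_action([35.0, 28.0], commensurate=False))
```

This only mirrors the defaults; as stressed above, the default may not be correct for a particular crystal.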
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
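
The condition for commensurate modulation can be written down directly. In this sketch the tNCS vector is assumed to be in fractional coordinates, and the tolerance for "very close to" an integer is an arbitrary illustrative value, not a Phaser default; <code>is_commensurate</code> is a hypothetical name.

```python
def is_commensurate(vector, n, tolerance=0.05):
    """True if n times a fractional vector is close to a lattice translation."""
    for component in vector:
        scaled = n * component
        # Distance of the scaled component from the nearest integer.
        if abs(scaled - round(scaled)) > tolerance:
            return False
    return True


print(is_commensurate((0.0, 0.0, 0.334), 3))
print(is_commensurate((0.0, 0.0, 0.4), 3))
```

The first vector is close to one third of the c axis and so represents a three-fold pseudo-cell; the second does not.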
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x>.<br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2428Molecular Replacement2018-02-07T17:16:57Z<p>Rdo20: </p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' : [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' : [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and give a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input, you need to run the modes of Phaser separately. The possibilities are endless (you can even try exhaustive searches, i.e. translations of all orientations, if you want), but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto.gif|Flow Diagram for Automated MR]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
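
The RMS-based limit is simple arithmetic. The sketch below covers only the "1.8 times the RMS error" rule from this paragraph and deliberately ignores the expected-LLG target described above; <code>search_resolution</code> is a hypothetical name.

```python
def search_resolution(data_limit, rms_error, factor=1.8):
    """Return the high-resolution limit (Angstroms) implied by the RMS rule.

    A larger number in Angstroms means lower resolution, so the limit is
    whichever of the two values is larger.
    """
    return max(data_limit, factor * rms_error)


# Data to 1.5 Angstroms with a model expected to have 1.5 Angstroms RMS error.
print(round(search_resolution(1.5, 1.5), 2))
```

For a model with an expected RMS error of 1.5 Angstroms, data beyond about 2.7 Angstroms would therefore not be used in the search, however far the crystal diffracts.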
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
|colspan="2" style="text-align: center;" | *''6 for 1st model in monoclinic space groups''<br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
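
The Z-score definition translates directly into code. This is a minimal sketch using made-up LLG values; <code>z_score</code> is a hypothetical name, and the RMS deviation from the mean is computed as the population standard deviation of the random set.

```python
from statistics import mean, pstdev


def z_score(peak_llg, random_llgs):
    """Number of standard deviations of a peak above the random-set mean."""
    mu = mean(random_llgs)
    # RMS deviation from the mean of the random set.
    sigma = pstdev(random_llgs, mu)
    return (peak_llg - mu) / sigma


# Made-up LLG values for a set of random orientations or positions.
random_set = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_score(15, random_set))
```

The random set here has a mean of 5 and an RMS deviation of 2, so a peak with an LLG of 15 has a Z-score of 5.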
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
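
For scripted pipelines, the rough guide in the table can be encoded as below. The handling of values that fall exactly on a boundary is an assumption, the footnote for monoclinic space groups is not handled, and <code>tfz_verdict</code> is a hypothetical name.

```python
def tfz_verdict(tfz):
    """Map a TF Z-score to the rough verdict given in the table."""
    if tfz < 5:
        return "no"
    if tfz < 6:
        return "unlikely"
    if tfz < 7:
        return "possibly"
    if tfz <= 8:
        return "probably"
    return "definitely"


for score in (4.2, 6.5, 9.1):
    print(score, tfz_verdict(score))
```

Such a verdict is no substitute for reading the logfile, since there will always be exceptions to the guide.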
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see that the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features:<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
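The annotation line above can also be read by a script, which is convenient when many solutions have to be compared. The following is a minimal sketch, not part of Phaser: the function name <code>parse_annotation</code> and the regular expression are illustrative assumptions, and only the numeric RFZ, TFZ, PAK and LLG tags from the table are handled.<br />
<br />
```python
import re

# Numeric tags from the annotation table: RFZ=, TFZ=, TFZ==, PAK=, LLG=.
# One or two "=" signs are accepted so that TFZ= and TFZ== stay distinct.
TAG = re.compile(r"(RFZ|TFZ|PAK|LLG)(={1,2})(-?\d+(?:\.\d+)?)")


def parse_annotation(line):
    """Return (tag, value) pairs in the order they appear in the line."""
    return [(name + equals, float(number))
            for name, equals, number in TAG.findall(line)]


# Annotation copied from the beta-blip example.
annotation = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
              "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")

pairs = parse_annotation(annotation)
llg_values = [value for tag, value in pairs if tag == "LLG="]
tfz_equiv = [value for tag, value in pairs if tag == "TFZ=="]

print(llg_values)  # [472.0, 1006.0, 1006.0]
print(tfz_equiv)   # [24.7, 29.7, 29.7]
```

Here the LLG never decreases as components are added and every TFZ-equivalent is high, which is the pattern described above for a convincing solution.<br />
<br />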
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
<br />
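As an illustration of the z-y-z convention, the sketch below builds a rotation matrix from three Euler angles. It assumes that the first angle is the first rotation about z, that the three rotations are composed as Rz(alpha) Ry(beta) Rz(gamma), and that angles are in degrees; check the sense of rotation against your own files before relying on it. The function names are hypothetical.<br />
<br />
```python
import math


def rot_z(deg):
    """Rotation by deg degrees about the z-axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(deg):
    """Rotation by deg degrees about the y-axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyz(alpha, beta, gamma):
    """z, new y, new z rotations composed as Rz(alpha) Ry(beta) Rz(gamma)."""
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))


# Euler angles of the "beta" ensemble in the example above.
R = euler_zyz(200.849, 41.269, 183.909)
for row in R:
    print(" ".join(f"{x:9.5f}" for x in row))
```

In this convention the bottom-right element of the matrix equals cos(beta), which gives a quick sanity check.<br />
<br />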
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || (m,a,b,c,...) input solution numbers amalgamated after refinement : (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file shows these features:<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
<br />
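A history line can be split into its steps with a short script. This is a sketch only: the function name <code>parse_history</code> and the regular expression are assumptions based on the notation in the table, not on Phaser's own parser.<br />
<br />
```python
import re

# Each step looks like NAME(inputs:output); FUSE(...) has no ":output" part.
STEP = re.compile(r"(RF/TF|PAK|RNP|FUSE)\(([^)]*)\)")


def parse_history(line):
    """Return (step, inputs, output) tuples in the order they appear."""
    steps = []
    for name, body in STEP.findall(line):
        inputs, _, output = body.partition(":")
        steps.append((name, inputs.strip(), output.strip()))
    return steps


# The more complicated history line shown above.
history = ("SOLU HISTORY RF/TF(7/1:10)PAK(10:10)"
           "RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,"
           "390,2,1,19:1)RNP(1:1)")

steps = parse_history(history)
merged = steps[2][1].split(",")
print(steps[0])     # ('RF/TF', '7/1', '10')
print(len(merged))  # 24
```

In this example 24 input solutions were amalgamated by the first refinement into output solution 1, which originated from rotation function peak 7 and translation function peak 1.<br />
<br />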
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of hundreds to thousands of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set hundreds to thousands of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a solution with a high translation function Z-score (TFZ) is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data (HKLIn keyword) and the columns in that mtz file to be used (LABIn keyword). Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
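<br />
To make the dependence on RMS error, fraction scattering and resolution concrete, the sketch below evaluates one commonly used functional form for such a curve. The exact expression and the solvent parameters used inside Phaser are not given on this page, so the formula, the function name <code>sigma_a</code> and the values of <code>fsol</code> and <code>bsol</code> are assumptions for illustration only.<br />
<br />
```python
import math


def sigma_a(d_spacing, rms, fraction_scattering, fsol, bsol):
    """Illustrative Sigma(A) value at resolution d_spacing (Angstroms).

    Assumed form:
      sqrt(fp * (1 - fsol * exp(-bsol / (4 d^2)))) * exp(-(2 pi^2 / 3) rms^2 / d^2)
    The first factor combines model completeness (fp) with a Babinet-style
    solvent term; the second is the fall-off due to coordinate error.
    """
    s2 = 1.0 / d_spacing ** 2
    solvent = 1.0 - fsol * math.exp(-bsol * s2 / 4.0)
    error = math.exp(-(2.0 * math.pi ** 2 / 3.0) * rms ** 2 * s2)
    return math.sqrt(fraction_scattering * solvent) * error


# Hypothetical model: 1.0 A RMS error, half of the scattering in the ASU.
for d in (10.0, 6.0, 4.0, 3.0, 2.5, 2.0):
    print(f"{d:5.1f} A  {sigma_a(d, 1.0, 0.5, fsol=0.95, bsol=300.0):.3f}")
```

With these assumptions a larger RMS error makes the curve fall off faster at high resolution, and a smaller fraction scattering scales the whole curve down.<br />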
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|-<br />
|}<br />
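<br />
The formula itself is not stated on this page. The sketch below uses an expression whose constants (0.0569, 1.52 and 173) are assumptions, chosen because they reproduce the tabulated values to within about 0.01 Angstroms; it is useful for estimating the default RMS for model sizes between the table columns, but the value reported in the Phaser logfile should be treated as definitive.<br />
<br />
```python
import math


def rms_from_identity(identity_percent, n_residues):
    """Estimate the default RMS (Angstroms) from identity and model size.

    Assumed form: 0.0569 * exp(1.52 * (1 - ID)) * (173 + Nres)^(1/3),
    with ID the sequence identity expressed as a fraction.
    """
    non_identical = 1.0 - identity_percent / 100.0
    size_term = (173.0 + n_residues) ** (1.0 / 3.0)
    return 0.0569 * math.exp(1.52 * non_identical) * size_term


# Compare with the table: ID=30% with 50 residues, ID=50% with 300 residues.
print(round(rms_from_identity(30, 50), 2))   # 1.0
print(round(rms_from_identity(50, 300), 2))  # 0.95

# A size that is not tabulated, e.g. 250 residues at 35% identity.
print(round(rms_from_identity(35, 250), 3))
```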
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, change the record name to ATOM or use the keyword ENSE modlid HETATOM ON.<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit, not the fraction of the asymmetric unit that you are searching for.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format. You can mix compositions entered by molecular weight with those entered by sequence. Individual atoms can be added to the composition with the COMPOSITION ATOM keyword. This allows the explicit addition of heavy atoms in the structure e.g. Fe atoms.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
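<br />
The division described above can be sketched as follows. As a simplification, the sketch uses molecular weight as a stand-in for scattering power, which is only reasonable when all components have a similar atomic composition; the function name and the molecular weights are illustrative assumptions.<br />
<br />
```python
def fraction_scattering(model_mw, composition):
    """Fraction of the asymmetric unit contributed by one copy of a model.

    composition lists (molecular_weight, number_of_copies) for EVERYTHING
    in the asymmetric unit, not only the component being searched for.
    """
    total = sum(mw * copies for mw, copies in composition)
    return model_mw / total


# Hypothetical complex: one copy each of a 28.9 kDa and a 17.5 kDa protein.
composition = [(28900.0, 1), (17500.0, 1)]
fractions = [fraction_scattering(mw, composition) for mw, _ in composition]
print([round(f, 3) for f in fractions])
```

Leaving a component out of the composition makes the total too small and so overestimates the fraction scattering of every model.<br />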
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
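<br />
The first three selection criteria can be sketched as below. This is an illustration rather than Phaser's implementation: in particular, the mean and standard deviation are taken here over the supplied list of values, whereas a real search would use the statistics of the whole search. The function names are hypothetical.<br />
<br />
```python
from statistics import mean, pstdev


def select_by_percent(values, cutoff=75.0):
    """Keep values at or above cutoff, where top = 100% and mean = 0%."""
    top, mu = max(values), mean(values)
    if top == mu:  # completely flat distribution: nothing to separate
        return list(values)
    return [v for v in values if 100.0 * (v - mu) / (top - mu) >= cutoff]


def select_by_zscore(values, cutoff):
    """Keep values at least cutoff standard deviations above the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0.0:
        return []
    return [v for v in values if (v - mu) / sigma >= cutoff]


def select_by_number(values, count):
    """Keep the top count values."""
    return sorted(values, reverse=True)[:count]


scores = [10.0, 9.0, 4.0, 3.0, 2.0, 2.0]
print(select_by_percent(scores))      # [10.0, 9.0]
print(select_by_zscore(scores, 5.0))  # []
print(select_by_number(scores, 3))    # [10.0, 9.0, 4.0]
```

The example shows the behaviour described above: the percentage criterion always keeps the top peak, whereas a Z-score cutoff that is too high returns nothing.<br />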
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
: Points on the search grid with higher neighbouring points are removed from the selection<br />
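<br />
The effect of clustering can be sketched on a one-dimensional grid. Real rotation and translation searches use multi-dimensional grids with their own definition of neighbours, so the sketch below, with its hypothetical function name, only illustrates the rule that a point is removed when a neighbouring point is higher.<br />
<br />
```python
def cluster_peaks(grid):
    """Return (index, value) for grid points with no higher neighbour."""
    kept = []
    for i, value in enumerate(grid):
        left = grid[i - 1] if i > 0 else float("-inf")
        right = grid[i + 1] if i < len(grid) - 1 else float("-inf")
        if value >= left and value >= right:
            kept.append((i, value))
    return kept


# Two peaks (5.0 and 6.0), each with a high shoulder next to it.
grid = [1.0, 3.0, 5.0, 4.0, 2.0, 6.0, 5.5]
print(cluster_peaks(grid))  # [(2, 5.0), (5, 6.0)]
```

The shoulder value 5.5 is higher than the peak at 5.0 but is removed because it sits next to 6.0. This is why turning clustering off can help when the correct orientation lies on the shoulder of another peak.<br />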
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
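<br />
The expressions in the table can be evaluated for a single reflection as sketched below. The function names are hypothetical, and the sketch applies the tabulated expressions as written without any special treatment of centric reflections or missing data, which a real program would need.<br />
<br />
```python
def amplitude_and_phase(signed_amplitude, phase_deg):
    """Store a negative coefficient as a positive amplitude with phase + 180."""
    if signed_amplitude < 0.0:
        return -signed_amplitude, (phase_deg + 180.0) % 360.0
    return signed_amplitude, phase_deg % 360.0


def map_coefficients(f_obs, f_calc, phase_calc_deg, m, d):
    """Return (FWT, PHWT, DELFWT, PHDELWT) for one reflection.

    m is the figure of merit (FOM column) and d is the D factor.
    """
    fwt, phwt = amplitude_and_phase(2.0 * m * f_obs - d * f_calc,
                                    phase_calc_deg)
    delfwt, phdelwt = amplitude_and_phase(m * f_obs - d * f_calc,
                                          phase_calc_deg)
    return fwt, phwt, delfwt, phdelwt


# Hypothetical reflection: |Fobs| = 100, |Fcalc| = 80, phase 30 degrees.
print(map_coefficients(100.0, 80.0, 30.0, m=0.9, d=0.8))
```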
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This can be lowered with TNCS PATT PERCENT <x><br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
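<br />
When inspecting the Patterson peaks yourself, a quick test for commensurate modulation is to look for the smallest n for which n times the tNCS vector is close to a lattice translation. The sketch below is not Phaser's detection algorithm; the function name, the tolerance and the maximum order tested are assumptions, and the vector is expected in fractional coordinates and must not be the origin.<br />
<br />
```python
def commensurate_order(vector, max_order=12, tolerance=0.05):
    """Smallest n >= 2 with n * vector within tolerance of integer multiples
    of the cell axes (fractional coordinates), or None if there is none."""
    for n in range(2, max_order + 1):
        if all(abs(n * x - round(n * x)) <= tolerance for x in vector):
            return n
    return None


print(commensurate_order((0.5, 0.0, 0.0)))       # 2
print(commensurate_order((0.0, 0.0, 0.3333)))    # 3
print(commensurate_order((0.13, 0.27, 0.41)))    # None
```

A result of 2 corresponds to the pseudo-halving of the cell described for pairs of molecules, a higher value suggests a candidate for TNCS NMOL, and None indicates a tNCS vector in a general position.<br />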
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2427Molecular Replacement2018-02-07T17:16:36Z<p>Rdo20: /* Building an Ensemble from Coordinates */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' → [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' → [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell Phaser what to search for, and supply a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto.gif|Flow Diagram for Automated MR]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
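<br />
The RMS-based limit described above is simple arithmetic, sketched below. The function name is hypothetical, and the sketch deliberately ignores the separate limit derived from the expected LLG; it only shows how the factor of 1.8 interacts with the resolution of the data (larger values in Angstroms mean lower resolution).<br />

```python
def rms_resolution_limit(data_resolution, rms_error, factor=1.8):
    """Resolution (in Angstroms) used for the search when the only
    restriction is the RMS-based one: data beyond factor * rms_error
    are not used, and data that do not exist cannot be used."""
    return max(data_resolution, factor * rms_error)


# Data to 1.5 A with a model of estimated RMS error 1.5 A:
# the search is limited to about 2.7 A.
print(rms_resolution_limit(1.5, 1.5))
# Data to 2.0 A with an accurate model (RMS 0.5 A):
# the RMS-based limit (0.9 A) is beyond the data, so 2.0 A is used.
print(rms_resolution_limit(2.0, 0.5))
```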
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
|colspan="2" style="text-align: center;" | *''6 for 1st model in monoclinic space groups''<br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
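<br />
The Z-score definition above can be written out as a short calculation. This is a minimal sketch with a hypothetical function name and made-up LLG values, not an extract from Phaser; it takes the RMS deviation over the whole random set (a population standard deviation).<br />

```python
import math


def z_score(peak_llg, random_llgs):
    """Number of RMS deviations by which peak_llg lies above (or below)
    the mean LLG of a set of random rotations or translations."""
    mean = sum(random_llgs) / len(random_llgs)
    rms_dev = math.sqrt(
        sum((x - mean) ** 2 for x in random_llgs) / len(random_llgs)
    )
    return (peak_llg - mean) / rms_dev


# Random set with mean 10 and RMS deviation 2; a peak with LLG 26
# lies (26 - 10) / 2 = 8 RMS deviations above the mean.
print(z_score(26.0, [8.0, 12.0]))
```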
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
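<br />
When scanning many results it can be convenient to encode the rough guide from the table as a small lookup. The sketch below is an assumption-laden convenience, not part of Phaser: the function name is hypothetical, and the treatment of values falling exactly on a boundary (which the table leaves open) is an arbitrary choice.<br />

```python
def interpret_tfz(tfz, first_model_monoclinic=False):
    """Map a TFZ score to the rough verdict given in the table.
    The footnote to the table lowers the 'definitely' threshold to 6
    for the first model in monoclinic space groups."""
    definite = 6.0 if first_model_monoclinic else 8.0
    if tfz >= definite:
        return "definitely"
    if tfz < 5.0:
        return "no"
    if tfz < 6.0:
        return "unlikely"
    if tfz < 7.0:
        return "possibly"
    return "probably"


for value in (4.2, 5.5, 6.5, 7.5, 9.0):
    print(value, interpret_tfz(value))
```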
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
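<br />
For scripted analysis, the numeric annotations on a SOLU SET line can be pulled out with a regular expression. The following is a sketch that assumes the layout of the example above; the function name is hypothetical, and non-numeric annotations such as TFZ=*, RF++ or the amalgamation sets are simply skipped.<br />

```python
import re

# Matches RFZ=, TFZ=, PAK=, LLG= and TFZ== followed by a number.
_TOKEN = re.compile(r"(RFZ|TFZ|PAK|LLG)(==?)(-?\d+(?:\.\d+)?)")


def parse_annotation(line):
    """Return the numeric annotations of a SOLU SET line, in order,
    as (label, value) pairs. 'TFZ==' is kept distinct from 'TFZ'."""
    pairs = []
    for key, equals, value in _TOKEN.findall(line):
        label = key + "==" if equals == "==" else key
        pairs.append((label, float(value)))
    return pairs


line = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
        "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
annotations = parse_annotation(line)
# LLG should increase as components are added.
print([value for label, value in annotations if label == "LLG"])
```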
<br />
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
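<br />
A z-y-z rotation about successively rotated axes can be composed as the matrix product Rz(alpha) Ry(beta) Rz(gamma). The sketch below builds that matrix in plain Python. It assumes that the three EULER values are listed in the order in which the rotations are applied and that angles are in degrees; check this against your own data before relying on it. All function names are hypothetical.<br />

```python
import math


def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def zyz_matrix(alpha, beta, gamma):
    """Rotation about z, then the new y, then the new z, which equals
    Rz(alpha) * Ry(beta) * Rz(gamma) acting on column vectors."""
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))


# Orientation of the "beta" ensemble from the example above.
for row in zyz_matrix(200.849, 41.269, 183.909):
    print(["%8.4f" % x for x in row])
```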
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || All input peaks amalgamated after refinement to give output solution number (m and others): (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
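<br />
A SOLU HISTORY line can be split into its individual steps in the same spirit. This sketch assumes the step names consist of capital letters and "/" followed by a bracketed argument list, as in the examples above; the function name is hypothetical and the arguments are returned as unparsed strings.<br />

```python
import re

# A step is a name such as RF/TF, PAK, RNP or FUSE followed by (...).
_STEP = re.compile(r"([A-Z/]+)\(([^)]*)\)")


def parse_history(line):
    """Return the steps of a SOLU HISTORY line as (name, arguments)."""
    return _STEP.findall(line)


line = ("SOLU HISTORY RF/TF(7/1:10)PAK(10:10)"
        "RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)"
        "RNP(1:1)")
steps = parse_history(line)
for name, arguments in steps:
    print(name, arguments)
```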
<br />
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of hundreds to thousands of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set hundreds to thousands of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TF-function Z-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data and the columns in the mtz file to be used using the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined by entering either the molecular weights or the sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble<br />
keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Angstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|-<br />
|}<br />
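<br />
The tabulated values can be approximated with a closed-form expression, which is convenient for identities and model sizes that fall between the table entries. The constants below are an assumption: they reproduce the table to within about 0.01 Angstroms, but they are not taken from this page and the function name is hypothetical.<br />

```python
import math


def estimated_rms(identity, n_residues):
    """Approximate default RMS error (Angstroms) for a model with the
    given sequence identity (as a fraction, 0 to 1) and number of
    residues. Constants are assumed; compare with the table above."""
    return (0.0569
            * math.exp(1.52 * (1.0 - identity))
            * (173.0 + n_residues) ** (1.0 / 3.0))


# Compare with the table: ID=100% / 50 residues and ID=30% / 300 residues.
print(round(estimated_rms(1.0, 50), 3))
print(round(estimated_rms(0.3, 300), 3))
```

The expression shows the two trends in the table: the estimated error rises as the sequence identity falls, and rises slowly with the size of the model.<br />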
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any HETATM records in the scattering, use the keyword ENSE modlid HETATOM ON.<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit, not the fraction of the asymmetric unit that you are searching for.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format. You can mix compositions entered by molecular weight with those entered by sequence. Individual atoms can be added to the composition with the COMPOSITION ATOM keyword. This allows the explicit addition of heavy atoms in the structure e.g. Fe atoms.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
=== "sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
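The first three selection criteria can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not Phaser's implementation: the function names are hypothetical, and the mean and standard deviation are taken over the list passed in, whereas Phaser derives them from the search itself.<br />

```python
from statistics import mean, pstdev

def select_by_percentage(scores, cutoff=75.0):
    """Keep scores at or above `cutoff` percent, with top = 100% and mean = 0%."""
    top, avg = max(scores), mean(scores)
    if top == avg:  # completely flat distribution: nothing to rescale
        return list(scores)
    threshold = avg + (cutoff / 100.0) * (top - avg)
    return [s for s in scores if s >= threshold]

def select_by_zscore(scores, cutoff):
    """Keep scores at least `cutoff` standard deviations above the mean."""
    avg, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return []
    return [s for s in scores if (s - avg) / sigma >= cutoff]

def select_by_number(scores, n):
    """Keep the n highest scores."""
    return sorted(scores, reverse=True)[:n]

scores = [100, 90, 40, 30, 20, 20]          # made-up peak heights
by_percent = select_by_percentage(scores)   # top peak always survives
by_sigma = select_by_zscore(scores, 5.0)    # may be empty if the cutoff is high
by_number = select_by_number(scores, 3)
```

With these made-up heights the percentage criterion always returns the top peak, while a 5 sigma Z-score cutoff returns nothing, which is the behaviour the list above describes.<br />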
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
: Points on the search grid with higher neighbouring points are removed from the selection<br />
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
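As a minimal sketch of what clustering does, the following Python function keeps only those grid points that have no higher neighbour. It assumes a one-dimensional grid for brevity (the real rotation and translation searches are three-dimensional), and the function name is hypothetical.<br />

```python
def cluster_peaks(grid):
    """Return indices of points on a 1-D grid that have no higher neighbour."""
    keep = []
    for i, value in enumerate(grid):
        # Neighbours are the points immediately left and right, where they exist.
        neighbours = grid[max(i - 1, 0):i] + grid[i + 1:i + 2]
        if all(value >= n for n in neighbours):
            keep.append(i)
    return keep

grid = [1, 3, 7, 5, 2, 6, 4]       # made-up function values on the search grid
peak_indices = cluster_peaks(grid)  # the shoulder points at 3 and 5 are dropped
```

With clustering off, every high point on the grid (including the shoulders of a peak) would remain available for selection.<br />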
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if there is<br />
# no peak in the native Patterson <br />
# more than one peak in the native Patterson over 20% of the origin and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. This cut-off can be lowered with TNCS PATT PERCENT <x>.<br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
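When inspecting Patterson peaks by hand, the definition of commensurate modulation given above (n times the basic translation vector is very close to integer multiples of the unit cell axes) can be checked with a short Python sketch. The function names, the tolerance of 0.02 and the upper limit of 12 are illustrative assumptions, not values taken from Phaser.<br />

```python
def is_commensurate(vector, n, tol=0.02):
    """True if n * vector (fractional coordinates) is within tol of whole cell translations."""
    return all(abs(n * x - round(n * x)) <= tol for x in vector)

def modulation_order(vector, max_n=12, tol=0.02):
    """Smallest n >= 2 for which the vector is commensurate, or None if there is none."""
    for n in range(2, max_n + 1):
        if is_commensurate(vector, n, tol):
            return n
    return None

pseudo_halving = modulation_order((0.5, 0.0, 0.5))       # two-fold pseudo-cell
three_fold = modulation_order((0.0, 0.0, 0.333))         # three-fold modulation along c
general = modulation_order((0.13, 0.27, 0.41))           # general position, no pseudo-cell
```

A vector in a general position returns None here, which corresponds to tNCS that is not a commensurate modulation.<br />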
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Molecular_Replacement&diff=2426Molecular Replacement2018-02-07T17:07:32Z<p>Rdo20: /* Building an Ensemble from Coordinates */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__TOC__</div><br />
<br />
'''Quicklink to example scripts''' → [[MR using keyword input]]<br />
<br />
'''Quicklink to phaser.famos (find_alt_orig_sym_mate) documentation''' → [[Famos]]<br />
<br />
Phaser should be able to solve most structures with the Automated Molecular Replacement mode, and this is the first mode that you should try. Give Phaser your data ([[#How to Define Data|How to Define Data]]) and your models ([[#How to Define Models|How to Define Models]]), tell it what to search for, and supply a list of possible spacegroups (in the same point group).<br />
<br />
If this doesn't work (see [[#Has Phaser Solved It?| Has Phaser Solved It?]]), you can try selecting peaks of lower significance in the rotation function in case the real orientation was not within the selection criteria. By default peaks above 75% of the top peak are selected (see [[#How to Select Peaks| How to Select Peaks]]). See [[#What to do in Difficult Cases| What to do in Difficult Cases]] for more hints and tips. If the automated molecular replacement mode doesn't work even with non-default input you need to run the modes of Phaser separately. The possibilities are endless - you can even try exhaustive searches (translations of all orientations) if you want - but experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies.<br />
<br />
==Automated Molecular Replacement==<br />
Automated Molecular Replacement combines the anisotropy correction, likelihood enhanced fast rotation function, likelihood enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups to automatically solve a structure by molecular replacement. Top solutions are output to the files FILEROOT.sol, FILEROOT.#.mtz and FILEROOT.#.pdb (where "#" refers to the sorted solution number, 1 being the best, and only 1 is output by default). Many structures can be solved by running an automated molecular replacement search with defaults, giving the ensembles that you expect to be easiest to find first.<br />
<br />
At the completion of Molecular Replacement you may wish to place your solutions on a common origin with a previous solution, for which [[Famos | Famos ]] can be used.<br />
<br />
[[Image:Phaser_MR_auto.gif|Flow Diagram for Automated MR]]<br />
<br />
==Should Phaser Solve It?==<br />
The difficulty of a molecular replacement problem depends primarily on two major factors: how well the model will be able to explain the diffraction data (which depends both on the accuracy of the model and on its completeness), and how many reflections can be explained, at least in part. Each reflection provides a piece of information that helps to identify correct MR solutions.<br />
<br />
It is possible to make a reasonable prediction of whether or not a solution will be found. If the quality of the model (its accuracy and completeness) can be estimated, then the expected contribution of each reflection to the total LLG can also be estimated. From a large battery of tests, we know that an LLG of 40 or greater usually indicates a correct solution (at least in the absence of complicating factors such as translational non-crystallographic symmetry, tNCS). Building on this understanding, if it is estimated that the LLG will be 60 or less, then Phaser will assume that the problem is a difficult one, and will implement search procedures optimised for difficult problems.<br />
<br />
==What Resolution of Data Should be Used?==<br />
The signal for a molecular replacement solution should be very clear if the expected value of the LLG is much higher than the minimum required to be fairly certain of a solution. Currently Phaser aims for a minimum LLG of 120 and, if it is possible to achieve an even higher value, given the quality of the model and the quantity of diffraction data, then the resolution for the initial search is limited to the value required to achieve an expected LLG of 120. Data to the full resolution are still used for a final rigid-body refinement, or in a second pass if a clear solution is not found in the first attempt.<br />
<br />
However, if the model is expected to have a large RMS error (based usually on the correlation between sequence identity and RMS error), then data to high resolution will not contribute any significant signal. Regardless of the expected LLG at the highest resolution limit, the resolution used is limited to 1.8 times the estimated RMS error of the model, because this resolution limit gives about 99% of the LLG that could be achieved.<br />
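The RMS-based limit is simple arithmetic, sketched below in Python. Only the 1.8 factor comes from the text; the function names are hypothetical, and the sketch deliberately ignores the separate limit derived from the expected LLG target.<br />

```python
def rms_resolution_limit(rms_error, factor=1.8):
    """Resolution (in Angstroms) beyond which a model with this RMS error adds little signal."""
    return factor * rms_error

def search_resolution(data_limit, rms_error):
    """Resolution actually worth using: the coarser of the data limit and the RMS limit.

    A larger d-spacing means lower resolution, hence max().
    """
    return max(data_limit, rms_resolution_limit(rms_error))

# Data to 1.5 A with a model of estimated RMS error 1.2 A: limited by the model.
limited_by_model = search_resolution(1.5, 1.2)
# Data to 2.5 A with a model of estimated RMS error 1.0 A: limited by the data.
limited_by_data = search_resolution(2.5, 1.0)
```

In the first case the search would use data to about 2.16 Å even though the crystal diffracts further.<br />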
<br />
Because Phaser implements strategies designed to solve structures with as much confidence as possible, as efficiently as possible, it is best to leave the choice of resolution to Phaser, at least in the first instance.<br />
<br />
==Has Phaser Solved It?==<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! TF Z-score !! Have I solved it?<br />
|-<br />
| less than 5 || no<br />
|-<br />
| 5 - 6 || unlikely<br />
|-<br />
| 6 - 7 || possibly<br />
|-<br />
| 7 - 8 || probably<br />
|-<br />
| more than 8* ||definitely<br />
|-<br />
|colspan="2" style="text-align: center;" | *''6 for 1st model in monoclinic space groups''<br />
|} <br />
<br />
Ideally, a unique solution with a strong signal will be found at the end of the search. If you are searching for multiple components, then ideally the search for each component will also give a strong signal. However if the signal-to-noise of your search is low, there will be noise peaks and multiple ambiguous solutions. Signal-to-noise is judged using the '''Z-score''', which is computed by comparing the LLG values from the rotation or translation search with LLG values for a set of random rotations or translations. The mean and the RMS deviation from the mean are computed from the random set, then the Z-score for a search peak is defined as its LLG minus the mean, all divided by the RMS deviation, ''i.e. '' '''the number of standard deviations above (or below) the mean. '''<br />
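The Z-score definition above translates directly into Python. This is a sketch of the formula only: the function name and the sample LLG values are made up, and how Phaser draws its random set is not shown.<br />

```python
from statistics import mean, pstdev

def z_score(peak_llg, random_llgs):
    """Number of RMS deviations by which peak_llg exceeds the mean of the random set."""
    mu = mean(random_llgs)
    rms_deviation = pstdev(random_llgs, mu)  # RMS deviation from the mean
    return (peak_llg - mu) / rms_deviation

random_llgs = [10, 12, 8, 10, 14, 6]  # LLG values for random placements (illustrative)
tfz = z_score(30, random_llgs)        # LLG of the search peak being assessed
```

A negative result means the peak scores below the average random placement.<br />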
<br />
For a rotation function, the correct orientation may be well down the list with a Z-score (number of standard deviations above the mean value, or RFZ) under 4, and it is often not possible to identify the correct orientation until a translation function is performed and yields a clear solution. Note that the signal-to-noise of the rotation function drops with increasing number of primitive symmetry operations (the number of different orientations for symmetry-related molecules), because there is more uncertainty about how the structure factor contributions from symmetry-related copies will add up.<br />
<br />
For a translation function the correct solution will generally have a Z-score (TFZ) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! The table gives a very rough guide to interpreting TFZ scores. This table will be updated, as we learn more from systematic molecular replacement trials.<br />
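For scripted pipelines, the rough guide in the table can be encoded as a lookup. The Python sketch below is an assumption-laden convenience, not part of Phaser: the function name is hypothetical, and because the table does not say which band the boundary values (5, 6, 7, 8) belong to, the choices made here are arbitrary.<br />

```python
def interpret_tfz(tfz, first_in_monoclinic=False):
    """Map a TFZ score onto the rough verdicts from the table."""
    # Footnote to the table: 6 is enough for the first model in monoclinic space groups.
    if first_in_monoclinic and tfz > 6:
        return "definitely"
    if tfz < 5:
        return "no"
    if tfz < 6:
        return "unlikely"
    if tfz < 7:
        return "possibly"
    if tfz <= 8:
        return "probably"
    return "definitely"

verdict = interpret_tfz(24.3)
```

As the text stresses, this is a very rough guide, so the verdict should be read alongside the LLG and the packing results rather than on its own.<br />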
<br />
When you are searching for multiple components, the signal may be low for the first few components but, as the model becomes more complete, the signal should become stronger. Finding a clear solution for a new component is a good sign that the partial solution to which that component was added was indeed correct.<br />
<br />
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 5 percent of marker atoms (C-alpha atoms for protein) are allowed to be involved in clashes. A solution with more clashes may still be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TFZ-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
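The default packing criterion amounts to a percentage test, sketched here in Python. The function name is hypothetical and the sketch assumes the clashing marker atoms have already been counted; how Phaser detects a clash is not covered.<br />

```python
def passes_packing(n_clashing, n_markers, max_percent=5.0):
    """True if the percentage of clashing marker atoms is within the allowed limit."""
    if n_markers <= 0:
        raise ValueError("model has no marker atoms")
    return 100.0 * n_clashing / n_markers <= max_percent

# A 200-residue protein (200 C-alpha markers) with 4 clashing markers: 2%, accepted.
accepted = passes_packing(4, 200)
# The same model with 12 clashing markers: 6%, rejected under the default.
rejected = passes_packing(12, 200)
```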
<br />
Note that, by default, Phaser will produce a single PDB file corresponding to the top solution found (if any), so finding a single PDB file in your output directory is not an indication that the search succeeded! You have to look, at least, at the summary of the logfile, or at the list of possible solutions in the .sol file that is produced if you run Phaser from ccp4i or command-line scripts.<br />
<br />
==Annotation==<br />
<br />
A highly compact summary of the history of the statistics of a solution is given in the SOLUTION SET in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG.<br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! Annotation !! Meaning<br />
|-<br />
| RFZ= || Rotation Function Z-score<br />
|-<br />
| TFZ= || Translation Function Z-score<br />
|-<br />
| PAK= || Number of packing clashes<br />
|-<br />
| LLG= || LLG after refinement. Will be repeated when a low resolution refinement is followed by a high resolution refinement.<br />
|-<br />
| TFZ== || Translation Function Z-score equivalent, only calculated for the top solution after refinement (or for the number of top files specified by TOPFILES)<br />
|-<br />
| RF++ || Rotation angle from previous strong solution has been used in the addition of next solution<br />
|-<br />
| RF*0 || Rotation angle 000 identified by low R-factor of input model<br />
|-<br />
| TFZ=* || First molecule in P1 (arbitrary origin, no Translation Function required)<br />
|-<br />
| TF*0 || Translation vector 000 identified by low R-factor of input model<br />
|-<br />
| (&&nbsp;... & ...) || Set of TFZ PAK and LLG values for placements that were amalgamated (more than one placement from a single Translation Function)<br />
|-<br />
| LLG+=(...&nbsp;&&nbsp;...)&nbsp;|| Set of LLG values calculated during amalgamation, which will always be increasing in value<br />
|-<br />
| +TNCS || Components added by Translational NCS relation<br />
|-<br />
| *T=<i>n</i> || Solution matches template solution <i>n</i><br />
|} <br />
<br />
Two versions of TFZ (the translation function Z-score) now appear for each component. The first ("TFZ=") is the Z-score from the actual translation search, which depends on the accuracy of the orientation used for that search. The second ("TFZ==") is the TFZ-equivalent, which indicates what the TFZ score would have been with the correct (refined) orientation. You should see the TFZ-equivalent is high at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features<br />
<br />
SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
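When many .sol files need to be compared, the annotation can be pulled apart with a regular expression. The Python sketch below handles only the numeric RFZ=, TFZ=, PAK=, LLG= and TFZ== items; the other annotations in the table (such as TFZ=*, RF++ or +TNCS) are ignored, and the function name is hypothetical.<br />

```python
import re

# Label, one or two '=' signs, then a number.
ANNOTATION = re.compile(r"(RFZ|TFZ|PAK|LLG)(==?)(-?\d+(?:\.\d+)?)")

def parse_annotation(line):
    """Return (label, value) pairs in the order they appear on a SOLU SET line."""
    pairs = []
    for name, equals, number in ANNOTATION.findall(line):
        label = name + "==" if equals == "==" else name
        pairs.append((label, float(number)))
    return pairs

example = ("SOLU SET RFZ=10.7 TFZ=24.3 PAK=0 LLG=472 TFZ==24.7 "
           "RFZ=6.4 TFZ=24.4 PAK=0 LLG=1006 TFZ==29.7 LLG=1006 TFZ==29.7")
pairs = parse_annotation(example)
llg_values = [value for label, value in pairs if label == "LLG"]
```

For the beta-blip line, llg_values shows the LLG rising from 472 to 1006 as the second component is added, which is the behaviour to look for.<br />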
Note that the Euler angles in Phaser follow the same convention as those defined for the Crowther fast rotation function, i.e. z-y-z (rotate around the z-axis, followed by the new y-axis, followed by the new z-axis).<br />
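A z-y-z rotation about successively new axes can be written as the matrix product Rz(alpha) Ry(beta) Rz(gamma). The Python sketch below builds that matrix from the three EULER values; it assumes right-handed rotations in degrees acting on column vectors, which the text above does not state explicitly, so check the result against Phaser's own output before relying on it. All function names are hypothetical.<br />

```python
import math

def rot_z(deg):
    """Right-handed rotation about z by deg degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(deg):
    """Right-handed rotation about y by deg degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_zyz(alpha, beta, gamma):
    """Rotate about z, then the new y, then the new z: Rz(alpha) Ry(beta) Rz(gamma)."""
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))

def rotate(matrix, xyz):
    """Apply a 3x3 rotation matrix to a coordinate triple."""
    return [sum(matrix[i][k] * xyz[k] for k in range(3)) for i in range(3)]

beta_rotation = euler_zyz(200.849, 41.269, 183.909)  # EULER values from the beta ensemble
```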
<br />
==History==<br />
<br />
A highly compact summary of the history of the peak positions of a solution is given in the SOLUTION HISTORY in the .sol file. Together with the SOLUTION SET annotation, this is useful in your analysis of the output. <br />
<br />
{| class="wikitable" style="text-align:center" style="margin-left: 30px" <br />
|-<br />
! History !! Meaning<br />
|-<br />
| RF/TF(r/t:n) || (r) Rotation Function peak number/(t) Translation Function peak number for the rotation function : (n) number of peak in final merged and sorted list<br />
|-<br />
| PAK(n:m) || (n) input solution number : (m) output solution number after packing condition applied<br />
|-<br />
| RNP(m,a,b,c,... : p) || All input peaks amalgamated after refinement to give output solution number (m and others): (p) output solution number<br />
|-<br />
| FUSE(A,B,C) || Solution numbers merged in amalgamation<br />
|} <br />
<br />
For example, in the case of beta-blip the history for the single solution output in the .sol file shows these features<br />
<br />
SOLU HISTORY RF/TF(1/1:1)PAK(1:1)RNP(1:1)RNP(1:1)<br />
SOLU 6DIM ENSE beta EULER 200.849 41.269 183.909 FRAC -0.49604 -0.15830 -0.28092 BFAC 0.00000<br />
SOLU 6DIM ENSE blip EULER 43.749 80.793 117.292 FRAC -0.12289 0.29435 -0.09266 BFAC 0.00000<br />
<br />
A more complicated structure solution may have<br />
<br />
SOLU HISTORY RF/TF(7/1:10)PAK(10:10)RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)<br />
<br />
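A history line like the one above is easier to read once it is split into steps. The following Python sketch does this with a regular expression; the function names are hypothetical and only the four step types listed in the table are recognised.<br />

```python
import re

# Step name followed by its bracketed argument.
STEP = re.compile(r"(RF/TF|PAK|RNP|FUSE)\(([^)]*)\)")

def parse_history(line):
    """Return (step, argument) pairs in the order they appear on a SOLU HISTORY line."""
    return STEP.findall(line)

def rnp_inputs(argument):
    """Split an RNP argument 'm,a,b,...:p' into the amalgamated inputs and the output number."""
    inputs, _, output = argument.partition(":")
    return [int(n) for n in inputs.split(",")], int(output)

example = ("SOLU HISTORY RF/TF(7/1:10)PAK(10:10)"
           "RNP(10,12,13,11,17,16,18,25,3,8,22,21,20,7,969,6,5,201,9,4,390,2,1,19:1)RNP(1:1)")
steps = parse_history(example)
merged, output_number = rnp_inputs(steps[2][1])
```

Here the first refinement step amalgamated 24 input peaks into output solution 1, showing that many different rotation and translation peaks converged on the same placement.<br />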
==What to do in Difficult Cases==<br />
<br />
Not every structure can be solved by molecular replacement, but the right strategy can push the limits. What to do when the default jobs fail depends on why your structure is difficult.<br />
*'''Flexible Structure'''<br />
*:The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. If you find a convincing solution for one domain, but fail to find a solution for the next domain, you can take advantage of the knowledge that its orientation is likely to be similar to that of the first domain. The ROTAte&nbsp;AROUnd option of the brute rotation search can be used to restrict the search to orientations within, say, 30 degrees of that of the known domain. Allow for close approach of the domains by increasing the allowed clashes with the PACK keyword by, say, 1 for each domain break that you introduce. Note that it is possible to use the brute rotation search as part of the automated molecular replacement pipeline, by changing the choice of the type of rotation search. Alternatively, you could try generating a series of models perturbed by normal modes, with the NMAPdb keyword. One of these may duplicate the hinge motion and provide a good single model.<br />
*'''Poor or Incomplete Model'''<br />
*:Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. The MR_AUTO FAST search mode automatically reduces the cutoff for accepting peaks from the fast rotation function if the default pass does not find a solution with a high Z-score, but you can manually reduce this further with the PEAKS and PURGE keywords. You can also try turning off the clustering of fast rotation function peaks because the correct orientation may sit on the shoulder of a peak in the rotation function. <br />
*:As shown convincingly by Schwarzenbacher ''et al.'' (Schwarzenbacher, Godzik, Grzechnik &amp; Jaroszewski, ''Acta Cryst.'' D'''60''', 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol. Use [http://www.phenix-online.org/documentation/sculptor.htm phenix.sculptor] to edit your model.<br />
*'''High Degree of Non-crystallographic Symmetry'''<br />
*:If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Methods to automatically use such information will be implemented in a future version of Phaser. In the meantime, you can work out for yourself the orientations that would be consistent with NCS and use the ROTAte&nbsp;AROUnd option to sample similar orientations. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer. If that succeeds, you can again use the ROTAte&nbsp;AROUnd option to force a subsequent monomer to adopt an orientation similar to the one you expect.<br />
*'''What <u>not</u> to do'''<br />
*:The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of hundreds to thousands of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set hundreds to thousands of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs.<br />
*:It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a solution with a high translation function Z-score is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond a few (e.g. 1-5) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.<br />
*'''Other suggestions'''<br />
*:Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information in the manual below to take advantage of these facilities!<br />
<br />
==How to Define Data==<br />
You need to tell Phaser the name of the mtz file containing your data and the columns in the mtz file to be used using the HKLIn and LABIn keywords. Additional keywords (BINS CELL OUTLier RESOlution SPACegroup) define how the data are used.<br />
<br />
==How to Define Models==<br />
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.<br />
<br />
Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes.<br />
<br />
A Babinet-style correction is used to account for the effects of disordered solvent on the completeness of the model at low resolution.<br />
<br />
Molecular replacement models are defined with the ENSEmble keyword and the COMPosition keyword. The ENSEmble keyword gives (amongst other things) the RMS deviation for the Sigma(A) curve. The COMPosition keyword is used to deduce the fraction of the scattering power in the asymmetric unit that each ensemble contributes. The composition of the asymmetric unit is defined either by entering the molecular weights or sequences of the components in the asymmetric unit, and giving the number of copies of each. Expert users can also enter the fraction of the scattering of each component directly, although the composition must still be entered for the absolute scale calculation. Please note that the composition supplied to Phaser has to include everything in the asymmetric unit, not just what is being looked for in the current search!<br />
<br />
===Building an Ensemble from Coordinates===<br />
The RMS deviation is determined directly from RMS or indirectly from IDENtity in the ENSEmble<br />
keyword using a formula that depends on the sequence identity and the number of residues in the model.<br />
<br />
The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Ångstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide as to the default RMS value corresponding to ID.<br />
<br />
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.<br />
<br />
Only the model with the highest sequence identity is reported in the output pdb file. Also, HETATM cards in the input pdb file are ignored in the calculation of the structure factors for the ensemble, but are carried through to the output pdb file. Thus, the phases on the output mtz file (which come from the structure factors of the ensemble) do not correspond to those that would be calculated from the output pdb file, when there is more than one pdb file in an ensemble and/or the pdbfile(s) have HETATM records.<br />
<br />
<br />
{| class="wikitable" style="text-align:center; margin-left: 30px"<br />
|-<br />
! !! #50 !! #100 !! #200 !! #300 !! #400 !! #600 !! #850 !! #1000 !! #1500 !! #2000<br />
|-<br />
|'''ID=0%''' || 1.579 || 1.689 || 1.875 || 2.030 || 2.164 || 2.391 || 2.625 || 2.748 || 3.093 || 3.375<br />
|-<br />
|'''ID=10%''' || 1.356 || 1.451 || 1.610 || 1.743 || 1.858 || 2.053 || 2.255 || 2.360 || 2.657 || 2.899<br />
|-<br />
|'''ID=20%''' || 1.165 || 1.246 || 1.383 || 1.497 || 1.596 || 1.764 || 1.936 || 2.027 || 2.281 || 2.489<br />
|-<br />
|'''ID=30%''' || 1.000 || 1.070 || 1.188 || 1.286 || 1.371 || 1.515 || 1.663 || 1.741 || 1.959 || 2.138<br />
|-<br />
|'''ID=40%''' || 0.859 || 0.919 || 1.020 || 1.104 || 1.177 || 1.301 || 1.428 || 1.495 || 1.683 || 1.836<br />
|-<br />
|'''ID=50%''' || 0.738 || 0.789 || 0.876 || 0.948 || 1.011 || 1.117 || 1.227 || 1.284 || 1.445 || 1.577<br />
|-<br />
|'''ID=60%''' || 0.634 || 0.678 || 0.752 || 0.814 || 0.868 || 0.959 || 1.053 || 1.103 || 1.241 || 1.354<br />
|-<br />
|'''ID=70%''' || 0.544 || 0.582 || 0.646 || 0.699 || 0.746 || 0.824 || 0.905 || 0.947 || 1.066 || 1.163<br />
|-<br />
|'''ID=80%''' || 0.467 || 0.500 || 0.555 || 0.601 || 0.640 || 0.708 || 0.777 || 0.813 || 0.915 || 0.999<br />
|-<br />
|'''ID=90%''' || 0.401 || 0.429 || 0.477 || 0.516 || 0.550 || 0.608 || 0.667 || 0.698 || 0.786 || 0.858<br />
|-<br />
|'''ID=100%''' || 0.345 || 0.369 || 0.409 || 0.443 || 0.472 || 0.522 || 0.573 || 0.600 || 0.675 || 0.737<br />
|-<br />
|}<br />
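<br />
For values between the tabulated points, the table can be reproduced to within about 0.01 Ångstroms by a simple expression in the sequence identity and the number of residues. The form and coefficients below are an assumption, chosen because they match the table; they are not taken from the Phaser source, and the function name is hypothetical.<br />

```python
import math


def estimated_rms(identity, n_residues, a=0.0569, b=173.0, c=1.52):
    """Estimate the default RMS (Angstroms) for a model.

    identity   : sequence identity as a fraction (0.0 to 1.0)
    n_residues : number of residues in the model
    a, b, c    : assumed coefficients that reproduce the table above
    """
    return a * (b + n_residues) ** (1.0 / 3.0) * math.exp(c * (1.0 - identity))


# Compare with the ID=30% row of the table (#50 and #300 columns).
print(round(estimated_rms(0.30, 50), 3))
print(round(estimated_rms(0.30, 300), 3))
```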
<br />
<br />
====Coordinate Editing====<br />
=====HETATM/LIGANDS=====<br />
Phaser ignores the scattering from HETATM records. The HETATM records are carried through to output with occupancy set to zero. Ligands will therefore not contribute to the scattering used for molecular replacement. The exceptions to this rule are the HETATM records for MSE (seleno-methionine), MSO (seleno-methionine selenoxide), CSE (seleno-cysteine), CSO (seleno-cysteine selenoxide), ALY (acetyllysine), MLY (n-dimethyl-lysine) and MLZ (n-methyl-lysine), which are used in the scattering and carried through to output with their original occupancy. If you wish to include any other HETATM records in the scattering, change the record name to ATOM or use the keyword ENSE modlid HETATOM ON.<br />
<br />
=====WATER=====<br />
Water molecules (identified by the residue name OW WAT HOH H2O OH2 MOH WTR or TIP) are deleted from the pdb file on input, are not used in the scattering and are not carried through to file output. If you want to retain water molecules you will need to change the residue name to something other than this (e.g. WWW) so that the atoms are not identified as water. To include the water molecules in the scattering, the HETATM records will also have to be changed to ATOM records as described above.<br />
<br />
===Building an Ensemble from Electron Density===<br />
When using density as a model, it is necessary to specify both the extent (x,y,z limits) of the cut-out region of density, and the centre of this region. With coordinates, Phaser can work this out by itself. This information is needed, for instance, to decide how large rotational steps can be in the rotation search and to carry out the molecular transform interpolation correctly. In the case of electron density, the RMS value does not have the same physical meaning that it has when the model is specified by atomic coordinates, but it is used to judge how the accuracy of the calculated structure factors drops off with resolution. A suitable value for RMS can be obtained, in the case of density from an experimentally-phased map, by choosing a value that makes the SigmaA curve fall off with resolution similarly to the mean figures-of-merit. In the case of density from an EM image reconstruction, the RMS value should make the SigmaA curve fall off similarly to a Fourier correlation curve used to judge the resolution of the EM image.<br />
<br />
For detailed information, including a tutorial with example scripts, see<br />
[[Using Electron Density as a Model| Using density as a model]]<br />
<br />
==How to Define Composition==<br />
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit, not the fraction of the asymmetric unit that you are searching for.<br />
<br />
===Default Composition===<br />
For convenience, the composition defaults to 50% protein scattering by volume (the average for protein crystals). It is better to enter it explicitly, even if only to check that you have correctly deduced the probable content of your crystal. If your crystal has higher or lower solvent content than this, or contains nucleic acid, then the composition should be entered explicitly.<br />
===Composition by Solvent Content===<br />
Scattering is determined from the solvent content of the crystal, assuming that the crystal contains protein only, and the average distribution of amino acids in protein. If your crystal contains nucleic acid or your protein has an unusual amino acid distribution then the composition should be entered explicitly using the MW or sequence options.<br />
===Composition by Number of Residues in ASU===<br />
Scattering is determined from the number of residues in the asymmetric unit, assuming that the crystal contains protein only or nucleic acid only, and assuming an average distribution of residues for either. If your crystal contains a mixture then the composition should be entered explicitly using the MW or sequence options. If your crystal has an unusual residue distribution then the composition should be entered explicitly using the sequence options.<br />
===Composition by Molecular Weight===<br />
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.<br />
===Composition by Sequence===<br />
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format. You can mix compositions entered by molecular weight with those entered by sequence. Individual atoms can be added to the composition with the COMPOSITION ATOM keyword. This allows the explicit addition of heavy atoms in the structure e.g. Fe atoms.<br />
===Composition by Percentage Scattering===<br />
The fraction scattering of each ensemble can be entered directly. The fraction scattering of each ensemble is normally automatically worked out from the average scattering from each ensemble (calculated from the pdb files if entered as coordinates, or from the protein and nucleic acid molecular weights if entered as a map) divided by the total scattering given by the composition, but entering the fraction scattering directly overrides this calculation. This option is for use when the pdb files of the models in the ensemble are unusual e.g. consist only of C-alpha atoms, or only of hydrogen atoms (as in the CLOUDS method for NMR).<br />
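<br />
As a rough check on a fraction entered by hand, the ratio described above can be approximated from molecular weights. The sketch below assumes that scattering is proportional to mass, which is only approximately true and is not how Phaser computes the value; the function name is hypothetical.<br />

```python
def fraction_scattering(ensemble_mw, asu_components):
    """Approximate the fraction of the scattering contributed by one ensemble.

    ensemble_mw    : molecular weight represented by the ensemble (one copy)
    asu_components : list of (molecular_weight, copies) for everything in
                     the asymmetric unit, not just the current search
    Assumes scattering scales with molecular weight.
    """
    total = sum(mw * copies for mw, copies in asu_components)
    if total <= 0:
        raise ValueError("the composition must contain some scattering matter")
    return ensemble_mw / total


# Two copies of a 20 kDa protein plus one 10 kDa partner in the asymmetric unit;
# the ensemble models a single copy of the 20 kDa protein.
print(fraction_scattering(20000.0, [(20000.0, 2), (10000.0, 1)]))
```

Note that the denominator covers the whole asymmetric unit, in line with the requirement that the composition include everything present.<br />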
<br />
==How to Define Solutions==<br />
Phaser writes out files ending in ".sol" and ".rlist" that contain the solution information from the job. The root of the files is given by the ROOT keyword. By default, the root filename is PHASER. These files can be read back into subsequent runs of Phaser to build up solutions containing more than one molecule in the asymmetric unit.<br />
<br />
"PHASER.sol" files are generated by all modes (rotation function modes with VERBOSE output), and contain the current idea of potential molecular replacement solutions.<br />
<br />
"PHASER.rlist" files are generated by the rotation function modes, and are used as input for performing translation functions.<br />
<br />
For simple MR cases you don't really need to know how to define molecular replacement solutions. However, for difficult cases you might need to edit the "PHASER.sol" and "PHASER.rlist" files manually.<br />
<br />
==="sol" Files===<br />
SOLUtion 6DIM keywords describe Ensembles that have been oriented by a rotation search and positioned by a translation search. Each Ensemble in the asymmetric unit has its own SOLUtion keyword. When more than one (potential) molecular replacement solution is present, the solutions are separated with the SOLUTION SET keywords.<br />
<br />
==="rlist" Files===<br />
These files define a rotation function list. The peak list is given with a series of SOLUtion TRIAl keywords.<br />
<br />
If a partial solution is already known, then the information for the currently "known" parts of the asymmetric unit is given in the form used for the PHASER.sol file, followed by the list of trial orientations for which a translation function is to be performed.<br />
<br />
===Fixed partial structure===<br />
If you have the coordinates of a partial solution with the pdb coordinates of the known structure in the correct orientation and position, then you can force Phaser to use these coordinates. Use the SOLUTION keyword to fix a rotation of 0 0 0 and a position of 0 0 0 for these coordinates.<br />
<br />
==How to Select Peaks==<br />
<br />
<br />
<br />
The selection of peaks saved for output in the rotation and translation functions can be done in four different ways.<br />
*'''Select by Percentage'''<br />
*: Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%.<br />
*: Default, cutoff=75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function). <br />
*'''Select by Z-score'''<br />
*: Number of standard deviations (sigmas) over the mean (the Z-score). <br />
*: Absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma). <br />
*'''Select by Number'''<br />
*: Number of top peaks to select. <br />
*: If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.<br />
*'''No selection'''<br />
*: All peaks are selected. <br />
*: Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.<br />
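<br />
The four selection criteria above can be sketched as a single function. This is an illustration of the definitions given in the list, not Phaser's implementation; the function name, the mode strings and the argument order are all hypothetical.<br />

```python
def select_peaks(peaks, mean, sigma, mode="percent", cutoff=75.0):
    """Return the peaks that survive one of the four selection criteria.

    peaks : peak heights from a rotation or translation search
    mean  : mean of the whole search distribution
    sigma : standard deviation of the whole search distribution
    mode  : "percent", "zscore", "number" or "all"
    """
    ranked = sorted(peaks, reverse=True)
    if not ranked or mode == "all":
        return ranked
    if mode == "number":
        return ranked[:int(cutoff)]
    if mode == "zscore":
        # Absolute test: nothing survives if the cutoff is too high.
        return [p for p in ranked if (p - mean) / sigma >= cutoff]
    if mode == "percent":
        top = ranked[0]
        if top <= mean:
            return ranked[:1]
        # Top peak is 100%, the mean is 0%; the top peak always survives.
        return [p for p in ranked if 100.0 * (p - mean) / (top - mean) >= cutoff]
    raise ValueError("unknown selection mode: %s" % mode)


peaks = [50.0, 44.0, 30.0, 20.0]
print(select_peaks(peaks, mean=10.0, sigma=5.0, mode="percent", cutoff=75.0))
print(select_peaks(peaks, mean=10.0, sigma=5.0, mode="zscore", cutoff=9.0))
```

With these numbers the percentage criterion keeps the two highest peaks, while a Z-score cutoff of 9 keeps none, which illustrates why only the percentage criterion is guaranteed to produce output.<br />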
<br />
[[Image:Phaser_selection.gif| Selection criteria]]<br />
<br />
Peaks can also be clustered or not clustered prior to selection in steps 1 and 2.<br />
*'''Clustering Off'''<br />
: All high peaks on the search grid are selected<br />
*'''Clustering On'''<br />
: Points on the search grid with higher neighbouring points are removed from the selection<br />
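<br />
The effect of clustering can be illustrated on a one-dimensional grid. Real rotation and translation searches use multi-dimensional grids, so the sketch below only shows the rule stated above (a point is dropped if any neighbouring point is higher); the function name is hypothetical.<br />

```python
def cluster_peaks(grid):
    """Return the indices of grid points that have no higher neighbour.

    grid : function values sampled on a one-dimensional search grid
    """
    kept = []
    for i, value in enumerate(grid):
        left = grid[i - 1] if i > 0 else float("-inf")
        right = grid[i + 1] if i < len(grid) - 1 else float("-inf")
        if value >= left and value >= right:
            kept.append(i)
    return kept


grid = [1.0, 3.0, 7.0, 6.0, 2.0, 5.0, 4.0]
print(cluster_peaks(grid))  # indices of the local maxima: [2, 5]
```

With clustering off, the point at index 3 (value 6.0) would also be a candidate, even though it sits on the shoulder of the peak at index 2.<br />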
<br />
<br />
[[Image:Phaser_clustering.gif| Clustering]]<br />
<br />
==How to Control Output==<br />
The output of Phaser can be controlled with optional keywords. <br />
<br />
The ROOT keyword is not compulsory (the default root filename is "PHASER"), but should always be given, so that your jobs have separate and meaningful output filenames.<br />
<br />
The TOPFiles keyword controls the number of potential MR solutions for which PDB and (in the appropriate modes) MTZ files are produced.<br />
<br />
For the MR_AUTO, MR_RNP and MR_LLG modes, unless HKLOut OFF is given as an optional keyword, Phaser produces an MTZ file with "SigmaA" type weighted Fourier map coefficients for producing electron density maps for rebuilding.<br />
<br />
{| class="wikitable" style="text-align:left" width=100%<br />
|-<br />
! MTZ Column Labels !! Description<br />
|-<br />
| FWT/PHWT || Amplitude and phase for 2''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| DELFWT/PHDELWT || Amplitude and phase for ''m''&#124;F<sub>obs</sub>&#124;-''D''&#124;F<sub>calc</sub>&#124; exp(''i''&alpha;<sub>calc</sub>) map<br />
|-<br />
| FOM || ''m'', analogous to the "Sim" weight, to estimate the reliability of &alpha;<sub>calc</sub><br />
|-<br />
| HLA/HLB/HLC/HLD || Hendrickson-Lattman coefficients encoding the phase probability distribution<br />
|}<br />
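<br />
The first two rows of the table can be written out for a single reflection as follows. This is a sketch of the formulas in the table only: it assumes the usual convention that a negative coefficient is stored as a positive amplitude with the phase shifted by 180 degrees, and it does not treat centric or missing reflections specially. The function names are hypothetical.<br />

```python
def _amplitude_phase(coefficient, phase_deg):
    """Store a signed coefficient as a positive amplitude and a phase."""
    if coefficient < 0.0:
        return -coefficient, (phase_deg + 180.0) % 360.0
    return coefficient, phase_deg % 360.0


def map_coefficients(f_obs, f_calc, phase_calc_deg, m, d):
    """Return FWT/PHWT and DELFWT/PHDELWT for one reflection.

    f_obs, f_calc  : observed and calculated amplitudes
    phase_calc_deg : calculated phase in degrees
    m              : figure of merit (FOM column)
    d              : the D weighting factor from the table
    """
    fwt, phwt = _amplitude_phase(2.0 * m * f_obs - d * f_calc, phase_calc_deg)
    delfwt, phdelwt = _amplitude_phase(m * f_obs - d * f_calc, phase_calc_deg)
    return {"FWT": fwt, "PHWT": phwt, "DELFWT": delfwt, "PHDELWT": phdelwt}


print(map_coefficients(f_obs=100.0, f_calc=90.0, phase_calc_deg=30.0, m=0.8, d=0.9))
```

In this example the difference coefficient is slightly negative, so its phase comes out 180 degrees away from the calculated phase.<br />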
<br />
==Translational Non-crystallographic Symmetry==<br />
<br />
<span style="color:crimson">'''*Warning*''' Solution by MR in the presence of translational non-crystallographic symmetry is not fully automated.</span><br />
<br />
Phaser calculates correction factors for the expected intensities in the presence of translational non-crystallographic symmetry (tNCS), and is able to solve structures with complex patterns of tNCS. '''However, the use of Phaser in the presence of tNCS requires the nature of the tNCS to be understood by the user.''' In simple cases, solution is no more difficult than solution without tNCS, but in complex cases, separate Phaser runs with tNCS turned on and off, and/or the use of different tNCS vectors, may be necessary.<br />
<br />
The output of Phaser will help the user in detecting and understanding the tNCS, but '''the tNCS is not completely characterised by Phaser'''. The default behaviour may or may not be correct for the particular crystal under study.<br />
<br />
Characterization of the tNCS involves understanding the number of copies of the molecule in the asymmetric unit and the translation vectors between them. Molecules related by a tNCS vector will have an associated peak in the native Patterson. Phaser calculates the native Patterson (MODE TNCS) and lists the peaks that are more than 20% of the origin peak. Any given crystal with tNCS may have one or more peaks meeting this criterion.<br />
<br />
===Default tNCS detection and correction===<br />
<span style="color:crimson">Documentation for Phaser-2.7.16 and above</span><br />
<br />
====No tNCS====<br />
No tNCS correction is applied by default if either<br />
# there is no peak in the native Patterson <br />
# there is more than one peak in the native Patterson over 20% of the origin, and these peaks are not all the result of a commensurate modulation<br />
<br />
====Pairs of molecules====<br />
By default, if Phaser detects a peak in the native Patterson then Phaser will search for molecules in pairs related by the tNCS vector given by the peak in the native Patterson.<br />
<br />
This will be the correct behaviour if and only if there are an even number of copies of the molecule in the asymmetric unit, clustered into two groups related by a single tNCS vector. There will only be one significant peak in the native Patterson. Fortunately, this is a reasonably common scenario.<br />
<br />
Phaser refines the relative orientation of the molecules in the two groups (rotations of up to 10 degrees will still give rise to a significant native Patterson peak) and uses this information to generate expected intensity factors for the reflections. Solution should be straightforward, with the usual caveat for MR that there is a sufficiently good model.<br />
<br />
Where there is a single peak in the native Patterson, it is often located at a position half way along a unit cell axis or diagonal, representing a pseudo-halving of the unit cell dimensions. However, Phaser is by no means restricted to these sorts of pseudo-cells in its handling of two-fold tNCS, and the tNCS vector can be in a general position.<br />
<br />
===Non-default tNCS correction===<br />
====Higher order tNCS====<br />
Frequently, tNCS does not associate 2 clusters of molecules in the asymmetric unit, but rather there are 3 or more (n) clusters of molecules associated by a series of vectors that are multiples of 1, 2, 3 ... (n-1) times a basic translation vector. Where n times the basic translation vector equates to (very close to) integer multiples of unit cell axes, the tNCS represents a pseudo-cell, and this case is known as commensurate modulation. <br />
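<br />
The definition of commensurate modulation above can be turned into a simple check on a candidate translation vector in fractional coordinates. The sketch below is illustrative: the tolerance, the upper limit on n and the function names are assumptions, and this is not the analysis Phaser performs on the native Patterson.<br />

```python
def is_commensurate(vector, n, tolerance=0.02):
    """True if n times the vector is close to integer multiples of the cell axes.

    vector    : basic tNCS translation in fractional coordinates (x, y, z)
    tolerance : assumed allowed deviation from an integer, per coordinate
    """
    return all(abs(n * c - round(n * c)) <= tolerance for c in vector)


def modulation_order(vector, max_n=12, tolerance=0.02):
    """Return the smallest n >= 2 giving a pseudo-cell, or None if there is none."""
    for n in range(2, max_n + 1):
        if is_commensurate(vector, n, tolerance):
            return n
    return None


print(modulation_order((0.0, 0.0, 0.334)))  # close to a pseudo-tripling along c
print(modulation_order((0.5, 0.5, 0.0)))    # pseudo-centring
print(modulation_order((0.13, 0.27, 0.41), max_n=6))
```

A vector in a general position, as in the last call, gives no small n, which corresponds to tNCS without a pseudo-cell.<br />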
<br />
Phaser attempts to automatically detect commensurate modulation. The peaks of the native Patterson are analyzed to find the n-fold relationship. The series will not generally have all peaks the same height. Lower peaks in the series represent relationships where the relative rotations between related molecules are larger. Missing peaks in the series may be below the default 20% of origin cut-off. The cut-off can be lowered with TNCS PATT PERCENT <x>.<br />
<br />
Phaser then sets TNCS NMOL <n> and the vector for the tNCS, and searches for ensembles in multiples of NMOL.<br />
<br />
When there are more than two molecules related by tNCS, Phaser does not refine the orientations between the molecules related by the tNCS.<br />
<br />
However, as for two-fold tNCS, Phaser is not restricted to these sorts of pseudo-cells and the basic tNCS vector can be in a general position, as can the number of copies.<br />
<br />
'''The automatic detection may not give the true tNCS relationship'''. For example, the true commensurate modulation may be a factor of the NMOL automatically detected by Phaser, or there may not be commensurate modulation at all, or commensurate modulation may not be found with the default Patterson peak height cutoff. In difficult cases, please inspect the Patterson for peaks.<br />
<br />
====Complex tNCS====<br />
If there are many molecules in the asymmetric unit but they are not all related by tNCS, or there are sub-groups of molecules related by different tNCS vectors, then the modulations of the expected intensities due to the tNCS will be much less significant than in the cases described above. '''In these cases it is possible that structure solution will be achieved without any tNCS correction factors being applied.''' Indeed, searching for all the copies as tNCS-related multiples when some molecules are not related by tNCS will cause structure solution to fail. To turn off the automatic detection and use of tNCS use the keyword TNCS USE OFF.<br />
<br />
If turning off the TNCS correction factors fails to give a solution, then a good approach is to proceed step-wise. Consider the highest native Patterson peak first and determine the nature of the tNCS associated with it. Use the appropriate correction factors to locate all the molecules with this tNCS. Then take the second independent native Patterson peak and apply the correction factors associated with it to find the second set of molecules, fixing the first, etc. Finally, turn TNCS off to find any orphan molecules.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Source_Code&diff=2425Source Code2018-02-07T12:43:23Z<p>Rdo20: /* Building Phaser from source */</p>
<hr />
<div>===Repository===<br />
<br />
A public [https://git.csx.cam.ac.uk/x/cimr-phaser/phaser.git/summary Phaser git repository] is available for '''git clone''' and '''git pull''' only. This mirrors commits to the Phaser SVN repository in real time.<br />
<br />
The [http://www-structmed.cimr.cam.ac.uk/svn-cgi-bin/viewvc.cgi/ Phaser SVN repository] is located in Cambridge on the CIMR server (password restricted)<br />
<br />
The Berkeley mirror at cci.lbl.gov is updated at midnight Berkeley time<br />
<br />
:/net/cci/auto_build/repositories/phaser<br />
<br />
===Access===<br />
<br />
*You can download nightly builds of Phenix (binaries), which contain the latest version of Phaser that has passed regression tests<br />
*You can compile code with real-time updates from the git repository. This code may not pass regression tests. The git repository is best used for obtaining instant bugfixes, after communication with one of the Phaser developers<br />
*If you are developing a pipeline using Phaser, we are keen to work with you to add features, fix bugs and help you use Phaser optimally<br />
*Note the University of Cambridge's [[ Licences | Licences for Phaser]] with regards to making Phaser part of a pipeline available online<br />
*Source code modifications are allowed under the University of Cambridge's [[ Licences | Licences for Phaser]], provided they are for internal use only. Distribution would require those changes to be incorporated into our SVN repository. <br />
<br />
===Full Access===<br />
<br />
*Requests for permission to commit to the SVN repository via SSH should be emailed to [mailto:cimr-phaser@lists.cam.ac.uk phaser-help]<br />
<br />
<br />
===Building Phaser from source===<br />
Phaser can be built as an executable file for the platforms Linux, MacOS, Windows (using VC++ 9.0) and Windows (using g++ in MinGW-W64). Except for MinGW-W64 it can also easily be built as python modules useful for python scripting. It requires at least an installation of CCTBX (available from http://cci.lbl.gov/cctbx_build/) to allow building the Phaser executable. Install CCTBX for the desired platform on your system first. For MinGW-W64 use the Windows build of CCTBX.<br />
Assuming python2.7 is present on your system Phaser can be built from a CCTBX installation <br />
with the following steps from a Bash shell or a Windows command prompt:<br />
#Change directory to the modules/ folder within the CCTBX installation. Then do <pre>git clone git://git.uis.cam.ac.uk/cimr-phaser/phaser.git </pre><br />
#Change directory to the build/ folder within the CCTBX installation<br />
#Delete all files and folders except '''config_modules.sh''' or '''config_modules.cmd'''<br />
#Edit the '''config_modules.sh''' or '''config_modules.cmd''' script to look like:<br />
#:*'''Linux or MacOS''':<br />
#:<pre>#!/bin/sh &#10;python ../modules/cctbx_project/libtbx/configure.py phaser --enable_openmp_if_possible=True</pre><br />
#:*'''Windows using Microsoft VC++ 9.0'''<br />
#:<pre>python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True</pre><br />
#:*'''Windows using MinGW-W64 5.3.0'''<br />
#:<pre>python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True --compiler=mingw --static_exe</pre><br />
#Execute the '''config_modules.sh''' or '''config_modules.cmd''' script.<br />
#On Linux or MacOS source the file '''setpaths.sh''', on Windows execute the file '''setpaths.bat'''<br />
#On Linux or MacOS do '''libtbx.scons -j nproc exe/phaser''', on Windows do '''libtbx.scons -j nproc exe\phaser.exe'''. Here nproc is the number of available CPUs to do the compilation. This will produce the Phaser executable within the build/exe directory. If Phaser python modules are also desired then omit the '''exe/phaser''' or '''exe\phaser.exe''' argument (does not apply to a MinGW-W64 build unless the installed version of your CCTBX was built for MinGW-W64).<br />
<br />
The steps to build Phaser change from time to time as the developments of required components like CCTBX are moving targets. The steps outlined here may therefore differ from the actual ones at short or no notice.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Source_Code&diff=2424Source Code2018-02-07T12:20:27Z<p>Rdo20: /* Building Phaser from source */</p>
<hr />
<div>===Repository===<br />
<br />
A public [https://git.csx.cam.ac.uk/x/cimr-phaser/phaser.git/summary Phaser git repository] is available for '''git clone''' and '''git pull''' only. This mirrors commits to the Phaser SVN repository in real time.<br />
<br />
The [http://www-structmed.cimr.cam.ac.uk/svn-cgi-bin/viewvc.cgi/ Phaser SVN repository] is located in Cambridge on the CIMR server (password restricted)<br />
<br />
The Berkeley mirror at cci.lbl.gov is updated at midnight Berkeley time<br />
<br />
:/net/cci/auto_build/repositories/phaser<br />
<br />
===Access===<br />
<br />
*You can download nightly builds of Phenix (binaries), which contain the latest version of Phaser that has passed regression tests<br />
*You can compile code with real-time updates from the git repository. This code may not pass regression tests. The git repository is best used for obtaining instant bugfixes, after communication with one of the Phaser developers<br />
*If you are developing a pipeline using Phaser, we are keen to work with you to add features, fix bugs and help you use Phaser optimally<br />
*Note the University of Cambridge's [[ Licences | Licences for Phaser]] with regards to making Phaser part of a pipeline available online<br />
*Source code modifications are allowed under the University of Cambridge's [[ Licences | Licences for Phaser]], provided they are for internal use only. Distribution would require those changes to be incorporated into our SVN repository. <br />
<br />
===Full Access===<br />
<br />
*Requests for permission to commit to the SVN repository via SSH should be emailed to [mailto:cimr-phaser@lists.cam.ac.uk phaser-help]<br />
<br />
<br />
===Building Phaser from source===<br />
Phaser can be built as an executable file for the platforms Linux, MacOS, Windows (using VC++ 9.0) and Windows (using g++ in MinGW-W64). Except for MinGW-W64 it can also easily be built as python modules useful for python scripting. It requires at least an installation of CCTBX (available from http://cci.lbl.gov/cctbx_build/) to allow building the Phaser executable. Install CCTBX for the desired platform on your system first. For MinGW-W64 use the Windows build of CCTBX.<br />
Assuming python2.7 is present on your system Phaser can be built from a CCTBX installation <br />
with the following steps from a Bash shell or a Windows command prompt:<br />
#Change directory to the modules/ folder within the CCTBX installation. Then do <pre>git clone git://git.uis.cam.ac.uk/cimr-phaser/phaser.git </pre><br />
#Change directory to the build/ folder within the CCTBX installation<br />
#Delete all files and folders except '''config_modules.sh''' or '''config_modules.cmd'''<br />
#Edit the '''config_modules.sh''' or '''config_modules.cmd''' script to look like:<br />
#:*'''Linux or MacOS''':<br />
#:<pre>#!/bin/sh &#10;python ../modules/cctbx_project/libtbx/configure.py phaser --enable_openmp_if_possible=True</pre><br />
#:*'''Windows using Microsoft VC++ 9.0'''<br />
#:<pre>python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True</pre><br />
#:*'''Windows using MinGW-W64 5.3.0'''<br />
#:<pre>python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True --compiler=mingw --static_exe</pre><br />
#Execute the '''config_modules.sh''' or '''config_modules.cmd''' script.<br />
#On Linux or MacOS source the file '''setpaths.sh''', on Windows execute the file '''setpaths.bat'''<br />
#On Linux or MacOS do '''libtbx.scons -j nproc exe/phaser''', on Windows do '''libtbx.scons -j nproc exe\phaser.exe'''. Here nproc is the number of available CPUs to do the compilation. This will produce the Phaser executable within the build/exe directory. If Phaser python modules are also desired then omit the '''exe/phaser''' or '''exe\phaser.exe''' argument (does not apply to a MinGW-W64 build unless the installed version of your CCTBX was built for MinGW-W64).<br />
<br />
The steps to build Phaser change from time to time as the developments of required components like CCTBX are moving targets. The steps outlined here may therefore differ from the actual ones at short or no notice.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Source_Code&diff=2423Source Code2018-02-07T11:36:42Z<p>Rdo20: /* Building Phaser from source */</p>
<hr />
<div>===Repository===<br />
<br />
A public [https://git.csx.cam.ac.uk/x/cimr-phaser/phaser.git/summary Phaser git repository] is available for '''git clone''' and '''git pull''' only. This mirrors commits to the Phaser SVN repository in real time.<br />
<br />
The [http://www-structmed.cimr.cam.ac.uk/svn-cgi-bin/viewvc.cgi/ Phaser SVN repository] is located in Cambridge on the CIMR server (password restricted)<br />
<br />
The Berkeley mirror at cci.lbl.gov is updated at midnight Berkeley time<br />
<br />
:/net/cci/auto_build/repositories/phaser<br />
<br />
===Access===<br />
<br />
*You can download nightly builds of Phenix (binaries), which contain the latest version of Phaser that has passed regression tests<br />
*You can compile code with real-time updates from the git repository. This code may not pass regression tests. The git repository is best used for obtaining instant bugfixes, after communication with one of the Phaser developers<br />
*If you are developing a pipeline using Phaser, we are keen to work with you to add features, fix bugs and help you use Phaser optimally<br />
*Note the University of Cambridge's [[ Licences | Licences for Phaser]] with regards to making Phaser part of a pipeline available online<br />
*Source code modifications are allowed under the University of Cambridge's [[ Licences | Licences for Phaser]], provided they are for internal use only. Distribution would require those changes to be incorporated into our SVN repository. <br />
<br />
===Full Access===<br />
<br />
*Requests for permission to commit to the SVN repository via SSH should be emailed to [mailto:cimr-phaser@lists.cam.ac.uk phaser-help]<br />
<br />
<br />
===Building Phaser from source===<br />
Phaser can be built as an executable file for the platforms Linux, MacOS, Windows (using VC++ 9.0) and Windows (using g++ in MinGW-W64). Except for MinGW-W64 it can also easily be built as python modules useful for python scripting. It requires at least an installation of CCTBX (available from http://cci.lbl.gov/cctbx_build/) to allow building the Phaser executable. Install CCTBX for the desired platform on your system first. For MinGW-W64 use the Windows build of CCTBX.<br />
Assuming python2.7 is present on your system Phaser can be built from a CCTBX installation <br />
with the following steps from a Bash shell or a Windows command prompt:<br />
*Change directory to the modules/ folder within the CCTBX installation. Then do <br />
git clone git://git.uis.cam.ac.uk/cimr-phaser/phaser.git<br />
*Change directory to the build/ folder within the CCTBX installation<br />
*Delete all files and folders except '''config_modules.sh''' or '''config_modules.cmd'''<br />
*Edit the '''config_modules.sh''' or '''config_modules.cmd''' script to look like:<br />
**'''Linux or MacOS''':<br />
#!/bin/sh<br />
python ../modules/cctbx_project/libtbx/configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using Microsoft VC++ 9.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using MinGW-W64 5.3.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True --compiler=mingw --static_exe<br />
<br />
*Execute the '''config_modules.sh''' or '''config_modules.cmd''' script.<br />
*On Linux or MacOS source the file '''setpaths.sh''', on Windows execute the file '''setpaths.bat'''<br />
*On Linux or MacOS do '''libtbx.scons -j nproc exe/phaser''', on Windows do '''libtbx.scons -j nproc exe\phaser.exe'''. Here nproc is the number of available CPUs to do the compilation. This will produce the Phaser executable within the build/exe directory. If Phaser python modules are also desired then omit the '''exe/phaser''' or '''exe\phaser.exe''' argument (does not apply to a MinGW-W64 build unless the installed version of your CCTBX was built for MinGW-W64).<br />
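
The Linux or MacOS steps above can be collected into an ordered command list. The following is a minimal sketch in Python and is not part of the Phaser or CCTBX distributions: the function name <code>phaser_build_commands</code> and its arguments are hypothetical, and the command strings are taken from the steps above. It only returns the commands as text so that they can be reviewed before being run by hand. The destructive "delete all files and folders" step and the editing of '''config_modules.sh''' are deliberately left as manual steps.

```python
import multiprocessing


def phaser_build_commands(cctbx_root, nproc=None, python_modules=False):
    """Return the Linux/MacOS build steps as a list of shell command strings.

    cctbx_root     -- top-level directory of the CCTBX installation (assumed layout
                      with modules/ and build/ subdirectories, as described above)
    nproc          -- number of CPUs for the compilation; defaults to all available
    python_modules -- if True, omit the exe/phaser target so that the Phaser
                      python modules are built as well
    """
    if nproc is None:
        nproc = multiprocessing.cpu_count()
    target = "" if python_modules else " exe/phaser"
    return [
        "cd %s/modules" % cctbx_root,
        "git clone git://git.uis.cam.ac.uk/cimr-phaser/phaser.git",
        "cd %s/build" % cctbx_root,
        # Manual step here: delete everything in build/ except config_modules.sh,
        # then edit config_modules.sh as shown above.
        "sh ./config_modules.sh",
        ". ./setpaths.sh",
        "libtbx.scons -j %d%s" % (nproc, target),
    ]


if __name__ == "__main__":
    # "/opt/cctbx" is a placeholder path, not a required location.
    for command in phaser_build_commands("/opt/cctbx", nproc=4):
        print(command)
```

Returning strings rather than executing them keeps the sketch safe to run on a machine without CCTBX, and makes it easy to compare against the current build instructions if they have changed.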
<br />
The steps to build Phaser change from time to time as the developments of required components like CCTBX are moving targets. The steps outlined here may therefore differ from the actual ones at short or no notice.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Source_Code&diff=2422Source Code2018-02-06T17:48:49Z<p>Rdo20: /* Building Phaser from source */</p>
<hr />
<div>===Repository===<br />
<br />
A public [https://git.csx.cam.ac.uk/x/cimr-phaser/phaser.git/summary Phaser git repository] is available for '''git clone''' and '''git pull''' only. This mirrors commits to the Phaser SVN repository in real time.<br />
<br />
The [http://www-structmed.cimr.cam.ac.uk/svn-cgi-bin/viewvc.cgi/ Phaser SVN repository] is located in Cambridge on the CIMR server (password restricted)<br />
<br />
The Berkeley mirror at cci.lbl.gov is updated at midnight Berkeley time<br />
<br />
:/net/cci/auto_build/repositories/phaser<br />
<br />
===Access===<br />
<br />
*You can download nightly builds of Phenix (binaries), which contain the latest version of Phaser that has passed regression tests<br />
*You can compile code with real-time updates from the git repository. This code may not pass regression tests. The git repository is best used for obtaining instant bugfixes, after communication with one of the Phaser developers<br />
*If you are developing a pipeline using Phaser, we are keen to work with you to add features, fix bugs and help you use Phaser optimally<br />
*Note the University of Cambridge's [[ Licences | Licences for Phaser]] with regards to making Phaser part of a pipeline available online<br />
*Source code modifications are allowed under the University of Cambridge's [[ Licences | Licences for Phaser]], provided they are for internal use only. Distribution would require those changes to be incorporated into our SVN repository. <br />
<br />
===Full Access===<br />
<br />
*Requests for permission to commit to the SVN repository via SSH should be emailed to [mailto:cimr-phaser@lists.cam.ac.uk phaser-help]<br />
<br />
<br />
===Building Phaser from source===<br />
Phaser can be built as an executable file for the platforms Linux, MacOS, Windows (using VC++ 9.0) and Windows (using g++ in MinGW-W64). Except for MinGW-W64 it can also easily be built as python modules useful for python scripting. It requires at least an installation of CCTBX (available from http://cci.lbl.gov/cctbx_build/) to allow building the Phaser executable. Install CCTBX for the desired platform on your system first. For MinGW-W64 use the Windows build of CCTBX.<br />
Assuming python2.7 is present on your system Phaser can be built from a CCTBX installation <br />
with the following steps from a Bash shell or a Windows command prompt:<br />
*Change directory to the modules/ folder within the CCTBX installation. Then do <br />
git clone git://git.uis.cam.ac.uk/cimr-phaser/phaser.git<br />
*Change directory to the build/ folder within the CCTBX installation<br />
*Delete all files and folders except '''config_modules.sh''' or '''config_modules.cmd'''<br />
*Edit the '''config_modules.sh''' or '''config_modules.cmd''' script to look like:<br />
**'''Linux or MacOS''':<br />
#!/bin/sh<br />
python ../modules/cctbx_project/libtbx/configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using Microsoft VC++ 9.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using MinGW-W64 5.3.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True --compiler=mingw --static_exe<br />
<br />
*Execute the '''config_modules.sh''' or '''config_modules.cmd''' script.<br />
*On Linux or MacOS source the file '''setpaths.sh''', on Windows execute the file '''setpaths.bat'''<br />
*On Linux or MacOS do '''libtbx.scons -j nproc exe/phaser''', on Windows do '''libtbx.scons -j nproc exe\phaser.exe'''. Here nproc is the number of available CPUs to do the compilation. This will produce the Phaser executable within the build/exe directory. If Phaser python modules are also desired then omit the '''exe/phaser''' or '''exe\phaser.exe''' argument (does not apply to MinGW build).<br />
<br />
The steps to build Phaser change from time to time as the developments of required components like CCTBX are moving targets. The steps outlined here may therefore differ from the actual ones at short or no notice.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Source_Code&diff=2421Source Code2018-02-06T16:54:29Z<p>Rdo20: /* Building Phaser from source */</p>
<hr />
<div>===Repository===<br />
<br />
A public [https://git.csx.cam.ac.uk/x/cimr-phaser/phaser.git/summary Phaser git repository] is available for '''git clone''' and '''git pull''' only. This mirrors commits to the Phaser SVN repository in real time.<br />
<br />
The [http://www-structmed.cimr.cam.ac.uk/svn-cgi-bin/viewvc.cgi/ Phaser SVN repository] is located in Cambridge on the CIMR server (password restricted)<br />
<br />
The Berkeley mirror at cci.lbl.gov is updated at midnight Berkeley time<br />
<br />
:/net/cci/auto_build/repositories/phaser<br />
<br />
===Access===<br />
<br />
*You can download nightly builds of Phenix (binaries), which contain the latest version of Phaser that has passed regression tests<br />
*You can compile code with real-time updates from the git repository. This code may not pass regression tests. The git repository is best used for obtaining instant bugfixes, after communication with one of the Phaser developers<br />
*If you are developing a pipeline using Phaser, we are keen to work with you to add features, fix bugs and help you use Phaser optimally<br />
*Note the University of Cambridge's [[ Licences | Licences for Phaser]] with regards to making Phaser part of a pipeline available online<br />
*Source code modifications are allowed under the University of Cambridge's [[ Licences | Licences for Phaser]], provided they are for internal use only. Distribution would require those changes to be incorporated into our SVN repository. <br />
<br />
===Full Access===<br />
<br />
*Requests for permission to commit to the SVN repository via SSH should be emailed to [mailto:cimr-phaser@lists.cam.ac.uk phaser-help]<br />
<br />
<br />
===Building Phaser from source===<br />
Phaser can be built as an executable file for the platforms Linux, MacOS, Windows (using VC++ 9.0) and Windows (using g++ in MinGW-W64). Except for MinGW-W64 it can also easily be built as python modules useful for python scripting. It requires at least an installation of CCTBX (available from http://cci.lbl.gov/cctbx_build/) to allow building the Phaser executable. Install CCTBX for the desired platform on your system first. For MinGW-W64 use the Windows build of CCTBX.<br />
Assuming python2.7 is present on your system Phaser can be built from a CCTBX installation <br />
with the following steps from a Bash shell or a Windows command prompt:<br />
*Change directory to the modules/ folder within the CCTBX installation. Then do <br />
git clone git://git.uis.cam.ac.uk/cimr-phaser/phaser.git<br />
*Change directory to the build/ folder within the CCTBX installation<br />
*Delete all files and folders except '''config_modules.sh''' or '''config_modules.cmd'''<br />
*Edit the '''config_modules.sh''' or '''config_modules.cmd''' script to look like:<br />
**'''Linux or MacOS''':<br />
#!/bin/sh<br />
python ../modules/cctbx_project/libtbx/configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using Microsoft VC++ 9.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using MinGW-W64 5.3.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True --compiler=mingw --static_exe<br />
<br />
*Execute the '''config_modules.sh''' or '''config_modules.cmd''' script.<br />
*On Linux or MacOS source the file '''setpaths.sh''', on Windows execute the file '''setpaths.bat'''<br />
*On Linux or MacOS do '''libtbx.scons -j nproc exe/phaser''', on Windows do '''libtbx.scons -j nproc exe\phaser.exe'''. Here nproc is the number of available CPUs to do the compilation. This will produce the Phaser executable within the build/exe directory. If Phaser python modules are also desired then omit the '''exe/phaser''' or '''exe\phaser.exe''' argument (does not apply to MinGW build).<br />
<br />
The steps to build Phaser change from time to time as the development of required components like CCTBX is a moving target. The steps outlined here may therefore differ from the actual ones at short or no notice.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Source_Code&diff=2420Source Code2018-02-06T16:53:26Z<p>Rdo20: /* Building Phaser from source */</p>
<hr />
<div>===Repository===<br />
<br />
A public [https://git.csx.cam.ac.uk/x/cimr-phaser/phaser.git/summary Phaser git repository] is available for '''git clone''' and '''git pull''' only. This mirrors commits to the Phaser SVN repository in real time.<br />
<br />
The [http://www-structmed.cimr.cam.ac.uk/svn-cgi-bin/viewvc.cgi/ Phaser SVN repository] is located in Cambridge on the CIMR server (password restricted)<br />
<br />
The Berkeley mirror at cci.lbl.gov is updated at midnight Berkeley time<br />
<br />
:/net/cci/auto_build/repositories/phaser<br />
<br />
===Access===<br />
<br />
*You can download nightly builds of Phenix (binaries), which contain the latest version of Phaser that has passed regression tests<br />
*You can compile code with real-time updates from the git repository. This code may not pass regression tests. The git repository is best used for obtaining instant bugfixes, after communication with one of the Phaser developers<br />
*If you are developing a pipeline using Phaser, we are keen to work with you to add features, fix bugs and help you use Phaser optimally<br />
*Note the University of Cambridge's [[ Licences | Licences for Phaser]] with regards to making Phaser part of a pipeline available online<br />
*Source code modifications are allowed under the University of Cambridge's [[ Licences | Licences for Phaser]], provided they are for internal use only. Distribution would require those changes to be incorporated into our SVN repository. <br />
<br />
===Full Access===<br />
<br />
*Requests for permission to commit to the SVN repository via SSH should be emailed to [mailto:cimr-phaser@lists.cam.ac.uk phaser-help]<br />
<br />
<br />
===Building Phaser from source===<br />
Phaser can be built as an executable file for the platforms Linux, MacOS, Windows (using VC++ 9.0) and Windows (using g++ in MinGW-W64). Except for MinGW-W64 it can also easily be built as python modules. It requires at least an installation of CCTBX (available from http://cci.lbl.gov/cctbx_build/) to allow building the Phaser executable. Install CCTBX for the desired platform on your system first. For MinGW-W64 use the Windows build of CCTBX.<br />
Assuming python2.7 is present on your system Phaser can be built from a CCTBX installation <br />
with the following steps from a Bash shell or a Windows command prompt:<br />
*Change directory to the modules/ folder within the CCTBX installation. Then do <br />
git clone git://git.uis.cam.ac.uk/cimr-phaser/phaser.git<br />
*Change directory to the build/ folder within the CCTBX installation<br />
*Delete all files and folders except '''config_modules.sh''' or '''config_modules.cmd'''<br />
*Edit the '''config_modules.sh''' or '''config_modules.cmd''' script to look like:<br />
**'''Linux or MacOS''':<br />
#!/bin/sh<br />
python ../modules/cctbx_project/libtbx/configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using Microsoft VC++ 9.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using MinGW-W64 5.3.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True --compiler=mingw --static_exe<br />
<br />
*Execute the '''config_modules.sh''' or '''config_modules.cmd''' script.<br />
*On Linux or MacOS source the file '''setpaths.sh''', on Windows execute the file '''setpaths.bat'''<br />
*On Linux or MacOS do '''libtbx.scons -j nproc exe/phaser''', on Windows do '''libtbx.scons -j nproc exe\phaser.exe'''. Here nproc is the number of available CPUs to do the compilation. This will produce the Phaser executable within the build/exe directory. If Phaser python modules useful for python scripting are also desired then omit the '''exe/phaser''' or '''exe\phaser.exe''' argument (does not apply to MinGW build).<br />
<br />
The steps to build Phaser change from time to time as the development of required components like CCTBX is a moving target. The steps outlined here may therefore differ from the actual ones at short or no notice.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Source_Code&diff=2419Source Code2018-02-06T16:04:05Z<p>Rdo20: /* Building Phaser from source */</p>
<hr />
<div>===Repository===<br />
<br />
A public [https://git.csx.cam.ac.uk/x/cimr-phaser/phaser.git/summary Phaser git repository] is available for '''git clone''' and '''git pull''' only. This mirrors commits to the Phaser SVN repository in real time.<br />
<br />
The [http://www-structmed.cimr.cam.ac.uk/svn-cgi-bin/viewvc.cgi/ Phaser SVN repository] is located in Cambridge on the CIMR server (password restricted)<br />
<br />
The Berkeley mirror at cci.lbl.gov is updated at midnight Berkeley time<br />
<br />
:/net/cci/auto_build/repositories/phaser<br />
<br />
===Access===<br />
<br />
*You can download nightly builds of Phenix (binaries), which contain the latest version of Phaser that has passed regression tests<br />
*You can compile code with real-time updates from the git repository. This code may not pass regression tests. The git repository is best used for obtaining instant bugfixes, after communication with one of the Phaser developers<br />
*If you are developing a pipeline using Phaser, we are keen to work with you to add features, fix bugs and help you use Phaser optimally<br />
*Note the University of Cambridge's [[ Licences | Licences for Phaser]] with regards to making Phaser part of a pipeline available online<br />
*Source code modifications are allowed under the University of Cambridge's [[ Licences | Licences for Phaser]], provided they are for internal use only. Distribution would require those changes to be incorporated into our SVN repository. <br />
<br />
===Full Access===<br />
<br />
*Requests for permission to commit to the SVN repository via SSH should be emailed to [mailto:cimr-phaser@lists.cam.ac.uk phaser-help]<br />
<br />
<br />
===Building Phaser from source===<br />
Phaser requires at least an installation of CCTBX (available from http://cci.lbl.gov/cctbx_build/) to allow building the Phaser executable. Install CCTBX for the desired platform on your system first. For MinGW-W64 use the Windows build of CCTBX.<br />
Assuming python2.7 is present on your system Phaser can be built from a CCTBX installation <br />
with the following steps from a Bash shell or a Windows command prompt:<br />
*Change directory to the modules/ folder within the CCTBX installation. Then do <br />
git clone git://git.uis.cam.ac.uk/cimr-phaser/phaser.git<br />
*Change directory to the build/ folder within the CCTBX installation<br />
*Delete all files and folders except '''config_modules.sh''' or '''config_modules.cmd'''<br />
*Edit the '''config_modules.sh''' or '''config_modules.cmd''' script to look like:<br />
**'''Linux or MacOS''':<br />
#!/bin/sh<br />
python ../modules/cctbx_project/libtbx/configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using Microsoft VC++ 9.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using MinGW-W64 5.3.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True --compiler=mingw --static_exe<br />
<br />
*Execute the '''config_modules.sh''' or '''config_modules.cmd''' script.<br />
*On Linux or MacOS source the file '''setpaths.sh''', on Windows execute the file '''setpaths.bat'''<br />
*On Linux or MacOS do '''libtbx.scons -j nproc exe/phaser''', on Windows do '''libtbx.scons -j nproc exe\phaser.exe'''. Here nproc is the number of available CPUs to do the compilation. This will produce the Phaser executable within the build/exe directory. If Phaser python modules useful for python scripting are also desired then omit the '''exe/phaser''' or '''exe\phaser.exe''' argument (does not apply to MinGW build).<br />
<br />
The steps to build Phaser change from time to time as the development of required components like CCTBX is a moving target. The steps outlined here may therefore differ from the actual ones at short or no notice.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Source_Code&diff=2418Source Code2018-02-06T15:48:58Z<p>Rdo20: /* Building Phaser from source */</p>
<hr />
<div>===Repository===<br />
<br />
A public [https://git.csx.cam.ac.uk/x/cimr-phaser/phaser.git/summary Phaser git repository] is available for '''git clone''' and '''git pull''' only. This mirrors commits to the Phaser SVN repository in real time.<br />
<br />
The [http://www-structmed.cimr.cam.ac.uk/svn-cgi-bin/viewvc.cgi/ Phaser SVN repository] is located in Cambridge on the CIMR server (password restricted)<br />
<br />
The Berkeley mirror at cci.lbl.gov is updated at midnight Berkeley time<br />
<br />
:/net/cci/auto_build/repositories/phaser<br />
<br />
===Access===<br />
<br />
*You can download nightly builds of Phenix (binaries), which contain the latest version of Phaser that has passed regression tests<br />
*You can compile code with real-time updates from the git repository. This code may not pass regression tests. The git repository is best used for obtaining instant bugfixes, after communication with one of the Phaser developers<br />
*If you are developing a pipeline using Phaser, we are keen to work with you to add features, fix bugs and help you use Phaser optimally<br />
*Note the University of Cambridge's [[ Licences | Licences for Phaser]] with regards to making Phaser part of a pipeline available online<br />
*Source code modifications are allowed under the University of Cambridge's [[ Licences | Licences for Phaser]], provided they are for internal use only. Distribution would require those changes to be incorporated into our SVN repository. <br />
<br />
===Full Access===<br />
<br />
*Requests for permission to commit to the SVN repository via SSH should be emailed to [mailto:cimr-phaser@lists.cam.ac.uk phaser-help]<br />
<br />
<br />
===Building Phaser from source===<br />
Phaser requires at least an installation of CCTBX (available from http://cci.lbl.gov/cctbx_build/) to allow building the Phaser executable. Assuming python2.7 is present on your system Phaser can be built from a CCTBX installation <br />
with the following steps from a Bash shell or a Windows command prompt:<br />
*Change directory to the modules/ folder within the CCTBX installation. Then do <br />
git clone git://git.uis.cam.ac.uk/cimr-phaser/phaser.git<br />
*Change directory to the build/ folder within the CCTBX installation<br />
*Delete all files and folders except '''config_modules.sh''' or '''config_modules.cmd'''<br />
*Edit the '''config_modules.sh''' or '''config_modules.cmd''' script to look like:<br />
**'''Linux or MacOS''':<br />
#!/bin/sh<br />
python ../modules/cctbx_project/libtbx/configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using Microsoft VC++ 9.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using MinGW-W64 5.3.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True --compiler=mingw --static_exe<br />
<br />
*Execute the '''config_modules.sh''' or '''config_modules.cmd''' script.<br />
*On Linux or MacOS source the file '''setpaths.sh''', on Windows execute the file '''setpaths.bat'''<br />
*On Linux or MacOS do '''libtbx.scons -j nproc exe/phaser''', on Windows do '''libtbx.scons -j nproc exe\phaser.exe'''. Here nproc is the number of available CPUs to do the compilation. This will produce the Phaser executable within the build/exe directory. If Phaser python modules useful for python scripting are also desired then omit the '''exe/phaser''' or '''exe\phaser.exe''' argument.<br />
<br />
The steps to build Phaser change from time to time as the development of required components like CCTBX is a moving target. The steps outlined here may therefore differ from the actual ones at short or no notice.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Source_Code&diff=2417Source Code2018-02-06T15:37:15Z<p>Rdo20: /* Building Phaser from source */</p>
<hr />
<div>===Repository===<br />
<br />
A public [https://git.csx.cam.ac.uk/x/cimr-phaser/phaser.git/summary Phaser git repository] is available for '''git clone''' and '''git pull''' only. This mirrors commits to the Phaser SVN repository in real time.<br />
<br />
The [http://www-structmed.cimr.cam.ac.uk/svn-cgi-bin/viewvc.cgi/ Phaser SVN repository] is located in Cambridge on the CIMR server (password restricted)<br />
<br />
The Berkeley mirror at cci.lbl.gov is updated at midnight Berkeley time<br />
<br />
:/net/cci/auto_build/repositories/phaser<br />
<br />
===Access===<br />
<br />
*You can download nightly builds of Phenix (binaries), which contain the latest version of Phaser that has passed regression tests<br />
*You can compile code with real-time updates from the git repository. This code may not pass regression tests. The git repository is best used for obtaining instant bugfixes, after communication with one of the Phaser developers<br />
*If you are developing a pipeline using Phaser, we are keen to work with you to add features, fix bugs and help you use Phaser optimally<br />
*Note the University of Cambridge's [[ Licences | Licences for Phaser]] with regards to making Phaser part of a pipeline available online<br />
*Source code modifications are allowed under the University of Cambridge's [[ Licences | Licences for Phaser]], provided they are for internal use only. Distribution would require those changes to be incorporated into our SVN repository. <br />
<br />
===Full Access===<br />
<br />
*Requests for permission to commit to the SVN repository via SSH should be emailed to [mailto:cimr-phaser@lists.cam.ac.uk phaser-help]<br />
<br />
<br />
===Building Phaser from source===<br />
Phaser requires at least an installation of CCTBX (available from http://cci.lbl.gov/cctbx_build/) to allow building the Phaser executable. Assuming python2.7 is present on your system Phaser can be built from a CCTBX installation <br />
with the following steps from a Bash shell or a Windows command prompt:<br />
*Change directory to the modules/ folder within the CCTBX installation. Then do <br />
git clone git://git.uis.cam.ac.uk/cimr-phaser/phaser.git<br />
*Change directory to the build/ folder within the CCTBX installation<br />
*Delete all files and folders except '''config_modules.sh''' or '''config_modules.cmd'''<br />
*Edit the '''config_modules.sh''' or '''config_modules.cmd''' script to look like:<br />
**'''Linux or MacOS''':<br />
#!/bin/sh<br />
python ../modules/cctbx_project/libtbx/configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using Microsoft VC++ 9.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using MinGW-W64 5.3.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True --compiler=mingw --static_exe<br />
<br />
*Execute the '''config_modules.sh''' or '''config_modules.cmd''' script.<br />
*On Linux or MacOS source the file '''setpaths.sh''', on Windows execute the file '''setpaths.bat'''<br />
*On Linux or MacOS do '''libtbx.scons -j nproc exe/phaser''', on Windows do '''libtbx.scons -j nproc exe\phaser.exe'''. Here nproc is the number of available CPUs to do the compilation. This will produce the Phaser executable within the build/exe directory.<br />
<br />
The steps to build Phaser change from time to time as the development of required components like CCTBX is a moving target. The steps outlined here may therefore differ from the actual ones at short or no notice.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Source_Code&diff=2416Source Code2018-02-06T15:35:43Z<p>Rdo20: Steps to build Phaser from CCTBX</p>
<hr />
<div>===Repository===<br />
<br />
A public [https://git.csx.cam.ac.uk/x/cimr-phaser/phaser.git/summary Phaser git repository] is available for '''git clone''' and '''git pull''' only. This mirrors commits to the Phaser SVN repository in real time.<br />
<br />
The [http://www-structmed.cimr.cam.ac.uk/svn-cgi-bin/viewvc.cgi/ Phaser SVN repository] is located in Cambridge on the CIMR server (password restricted)<br />
<br />
The Berkeley mirror at cci.lbl.gov is updated at midnight Berkeley time<br />
<br />
:/net/cci/auto_build/repositories/phaser<br />
<br />
===Access===<br />
<br />
*You can download nightly builds of Phenix (binaries), which contain the latest version of Phaser that has passed regression tests<br />
*You can compile code with real-time updates from the git repository. This code may not pass regression tests. The git repository is best used for obtaining instant bugfixes, after communication with one of the Phaser developers<br />
*If you are developing a pipeline using Phaser, we are keen to work with you to add features, fix bugs and help you use Phaser optimally<br />
*Note the University of Cambridge's [[ Licences | Licences for Phaser]] with regards to making Phaser part of a pipeline available online<br />
*Source code modifications are allowed under the University of Cambridge's [[ Licences | Licences for Phaser]], provided they are for internal use only. Distribution would require those changes to be incorporated into our SVN repository. <br />
<br />
===Full Access===<br />
<br />
*Requests for permission to commit to the SVN repository via SSH should be emailed to [mailto:cimr-phaser@lists.cam.ac.uk phaser-help]<br />
<br />
<br />
===Building Phaser from source===<br />
Phaser requires at least an installation of CCTBX (available from http://cci.lbl.gov/cctbx_build/) to allow building the Phaser executable. Assuming python2.7 is present on your system Phaser can be built from a CCTBX installation <br />
with the following steps from a Bash shell or a Windows command prompt:<br />
*Change directory to the modules/ folder within the CCTBX installation. Then do <br />
git clone git://git.uis.cam.ac.uk/cimr-phaser/phaser.git<br />
*Change directory to the build/ folder within the CCTBX installation<br />
*Delete all files and folders except '''config_modules.sh''' or '''config_modules.cmd'''<br />
*Edit the '''config_modules.sh''' or '''config_modules.cmd''' script to look like:<br />
**'''Linux or MacOS''':<br />
#!/bin/sh<br />
python ../modules/cctbx_project/libtbx/configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using Microsoft VC++ 9.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True<br />
:*'''Windows using MinGW-W64 5.3.0'''<br />
python ..\modules\cctbx_project\libtbx\configure.py phaser --enable_openmp_if_possible=True --compiler=mingw --static_exe<br />
<br />
*Execute the '''config_modules.sh''' or '''config_modules.cmd''' script.<br />
*On Linux or MacOS source the file '''setpaths.sh''', on Windows execute the file '''setpaths.bat'''<br />
*On Linux or MacOS do '''libtbx.scons -j nproc exe/phaser''', on Windows do '''libtbx.scons -j nproc exe\phaser.exe'''. Here nproc is the number of available CPUs to do the compilation. This will produce the Phaser executable within the build/exe directory.<br />
<br />
The steps to build Phaser change from time to time as the development of required components like CCTBX is a moving target. The steps outlined here may therefore change at short or no notice.</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Downloads&diff=2389Downloads2016-11-28T17:47:51Z<p>Rdo20: /* Phenix Installation */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__NOTOC__</div><br />
<br />
Note that the use of Phaser is subject to the University of Cambridge's [[Licences | '''Licences''']]<br />
<br />
;Phenix<br />
:Phenix [http://www.phenix-online.org/download/nightly_builds.cgi?admin=1 Nightly Builds] 
:Phenix [http://www.phenix-online.org/download/ Official Releases] (recommended)<br />
;CCP4<br />
:CCP4 [http://www.ccp4.ac.uk/dev/nightly/ Nightly Builds] 
:CCP4 [http://www.ccp4.ac.uk/download.php Release] (recommended)<br />
:'' Official releases will be made available through the CCP4 update mechanism''<br />
<br />
==Phenix Installation==<br />
Installation of Phaser with Phenix allows a choice of installation methods<br />
;Precompiled executables for selected platforms<br />
*Linux<br />
*Windows<br />
*Mac OS<br />
<br />
;Source code installation<br />
:Phaser can be compiled from source on the above platforms from the Phenix distribution<br />
<br />
==CCP4 Installation==<br />
;Precompiled executables for selected platforms<br />
*Linux<br />
*Windows<br />
*Mac OS</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Famos&diff=2323Famos2016-09-08T16:45:25Z<p>Rdo20: /* Usage from the command line */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__NOTOC__</div><br />
<br />
'''phaser.famos''' ('''phaser.find_alt_orig_sym_mate''') is a script for determining the best common origin for different molecular replacement solutions, in real space. The coordinates do not need to be identical. The common origin is found via secondary structure matching.<br />
<br />
==Author==<br />
<br />
Robert D. Oeffner<br />
<br />
==Purpose==<br />
<br />
'''phaser.famos''' attempts to find the best superposition of two different molecular replacement solutions of the same dataset, termed moving.pdb and fixed.pdb, with respect to all symmetry operations and alternate origin shifts permitted by the space group of the crystal. If either PDB file contains more than one chain, each chain is tested for its best match against the chains in the other PDB file. The program does so by calculating a score, MLAD (see Algorithm), for all possible symmetry operations and alternate origin shifts of each chain in moving.pdb compared with the chains in fixed.pdb. The transformation with the smallest MLAD is retained for that particular pair of chains.<br />
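<br />
The search described above can be sketched as a loop over symmetry operations and alternate origin shifts. The following is a toy illustration only, not the code used by '''phaser.famos''': it works in fractional coordinates, assumes the atoms are already paired in list order (standing in for the SSM alignment), uses a sum of squared fractional differences in place of MLAD, and uses the operators and alternate origins of space group P2<sub>1</sub> as an example. All function and variable names are hypothetical.<br />
<br />
```python
import itertools

# Example operators for space group P2_1 (unique axis b), as (rotation, translation).
SYMOPS = [
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0)),    # x, y, z
    (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (0.0, 0.5, 0.0)),  # -x, y+1/2, -z
]
# Alternate origins of P2_1 in the a-c plane (b is the floating, polar axis).
ORIGIN_SHIFTS = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.0, 0.5), (0.5, 0.0, 0.5)]


def transform(xyz, rot, trans, shift):
    """Apply a symmetry operation followed by an origin shift (fractional coordinates)."""
    return tuple(
        sum(rot[i][j] * xyz[j] for j in range(3)) + trans[i] + shift[i]
        for i in range(3)
    )


def toy_score(moved, fixed):
    """Stand-in for MLAD: sum of squared fractional differences of paired atoms."""
    total = 0.0
    for m, f in zip(moved, fixed):
        for mc, fc in zip(m, f):
            d = mc - fc
            d -= round(d)  # map the difference to the nearest lattice image
            total += d * d
    return total


def best_transformation(moving, fixed, symops=SYMOPS, shifts=ORIGIN_SHIFTS):
    """Return (score, symop, origin_shift) with the smallest score."""
    best = None
    for (rot, trans), shift in itertools.product(symops, shifts):
        moved = [transform(xyz, rot, trans, shift) for xyz in moving]
        score = toy_score(moved, fixed)
        if best is None or score < best[0]:
            best = (score, (rot, trans), shift)
    return best
```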
<br />
The script returns the best matches between pairs of chains in the two files. This includes the MLAD values, chain IDs, symmetry transformations and alternate origins for the two structures. If the same origin shift is not applied to all pairs of chains a warning will be printed. If the space group of the crystal has a floating origin this generally results in an origin offset between chains that is not a rational number.<br />
<br />
The superposed structure and the fixed structure can be visually inspected in a molecular viewer such as Coot or PyMOL. Matches with low MLAD scores will have correspondingly good superpositions.<br />
<br />
==Usage from the command line==<br />
<br />
'''phaser.famos''' uses [https://www.phenix-online.org/documentation/file_formats.html#phil-files-eff-def-phil Phenix PHIL input] in either a text file or as keywords like:<br />
<br />
phaser.famos moving.pdb=pdbfile1 fixed.pdb=pdbfile2<br />
<br />
or:<br />
<br />
 phaser.famos my_phil_input.txt<br />
<br />
The PHIL input scopes, moving and fixed, specify the MR solutions. Both scopes are programmatically equivalent and must be non-empty: each scope must specify either the parameter xyzfname, the sub-scope mrsolution, or the sub-scope pickle.solution.<br />
<br />
Examples:<br />
*Use the pdb file from one of the MR solutions in a scope. This is useful for simple cases, such as when the solution comes from the PHENIX MRage GUI or from an MR program other than Phaser.<br />
*Specify a Phaser MR solution by assigning the .sol file name of the solution to the mrsolution.solfname parameter of the sub-scope as well as assigning the IDs of the ensembles and the corresponding MR models to the parameters mrsolution.ensembles.name and mrsolution.ensembles.xyzfname. Multiple search components are specified as multiple ensembles. This way of specifying solutions is useful for solution files produced when Phaser is run from the command line and it produces several solutions of which one is the correct solution.<br />
*Use a solution from the Phaser-MR GUI in PHENIX. In that case the parameter pickle.solution.pklfname in the sub-scope should be assigned to the solution file that is produced by PHENIX after an MR calculation. The pickle.solution.philfname should then be assigned to the input file for the MR calculation. This way of specifying solutions is useful when MR solutions are available from having run Phaser through the PHENIX interface.<br />
*If the space group and the unit cell dimensions are not available from any of the input files, they must be specified by assigning the parameter spacegroupfname either to a PDB file with a CRYST1 record or to an MTZ file with that information. This should be the data file used for the molecular replacement calculation.<br />
<br />
<br />
===Examples of PHIL input===<br />
<br />
Unless the input consists of just two PDB files, a PHIL file is the easiest way to enter input. A few examples of PHIL input for '''phaser.famos''' are given below.<br />
<br />
Testing a command-line Phaser MR solution file against a solution specified as a PDB file:<br />
<br />
AltOrigSymMates.fixed.mrsolution<br />
{<br />
solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_3ECI_A0"<br />
xyzfname = "testdata/sculpt_3ECI_A0.pdb"<br />
}<br />
}<br />
<br />
AltOrigSymMates.moving_pdb="testdata/2z0d.pdb"<br />
<br />
Testing two sets of solution files from the PHENIX Phaser-MR GUI against one another:<br />
<br />
AltOrigSymMates.fixed.pickle_solution<br />
{<br />
philfname = "testdata/phaser_mr_13.eff"<br />
pklfname = "testdata/phaser_mr_13.pkl"<br />
}<br />
AltOrigSymMates.moving.pickle_solution<br />
{<br />
philfname = "testdata/phaser_mr_11.eff"<br />
pklfname = "testdata/phaser_mr_11.pkl"<br />
}<br />
<br />
Testing two solution files from the command-line version of Phaser against one another:<br />
<br />
AltOrigSymMates.fixed.mrsolution<br />
{<br />
solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_3ECI_A0"<br />
xyzfname = "testdata/sculpt_3ECI_A0.pdb"<br />
}<br />
}<br />
AltOrigSymMates.moving.mrsolution<br />
{<br />
solfname = "testdata/MR_2ZPN_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_2ZPN_A0"<br />
xyzfname = "testdata/sculpt_2ZPN_A0.pdb"<br />
}<br />
}<br />
AltOrigSymMates.spacegroupfname = "testdata/2Z0D.mtz"<br />
<br />
For more information on the PHIL input see the bottom of this page.<br />
<br />
===Also move HETATM (hetero atoms)===<br />
<br />
Invoking this flag will move hetero atoms (ligands, waters, metals, etc.) in conjunction with their associated peptide chain. The program first invokes phenix.sort_hetatms to associate hetero atoms sensibly with adjacent peptide chains.<br />
<br />
After having identified the transformation for a chain yielding the smallest MLAD score it then subjects the associated hetero atoms to the same transformation.<br />
<br />
===Debug mode===<br />
<br />
Invoking the debug flag produces individual pdb files of the C-alpha atoms used for each SSM alignment of the moving scope for each permitted symmetry operation and alternative origin. A gold atom is placed at the centroid of the C-alpha atoms. Similar files are produced for the fixed scope. These files are stored in a subfolder named "AltOrigSymMatesFiles".<br />
<br />
If a floating origin is present in the space group a table of MLAD values is produced by sliding a copy of the chains in the moving scope along the polar axis in the fractional interval [-0.5, 0.5] for each permitted symmetry operation and alternative origin.<br />
<br />
===List of all available keywords===<br />
<br />
See [https://www.phenix-online.org/documentation/reference/find_alt_orig_sym_mate.html#list-of-all-available-keywords Phenix documentation for phenix.find_alt_orig_sym_mate]<br />
<br />
==Output==<br />
<br />
The closest match between chains in the moving scope and chains in the fixed scope is saved to a file named after the pdb file in the fixed scope, concatenated with the name of the pdb file in the moving scope and prefixed with "MinMLAD_". A log file containing the standard output is written with the same concatenated name but prefixed with "AltOrigSymMLAD_". All files are saved in the current working directory. If the MR solutions specified in the PHIL input contain multiple solutions, '''phaser.famos''' will output multiple log files corresponding to the individual MR solutions.<br />
<br />
===Do the MR solutions match?===<br />
<br />
A good match between two chains usually has an MLAD value below 1.5, whereas a bad match usually has a value above 2.0. This is a rule of thumb and exceptions do occur. It is advisable to check visually in a molecular graphics program that the structures superpose on one another.<br />
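<br />
The rule of thumb can be written down as a small helper when screening many chain pairs. This is only a sketch: the thresholds 1.5 and 2.0 come from the text above, whereas the function name, the label strings and the "inconclusive" band between the two thresholds are assumptions made for illustration.<br />
<br />
```python
def classify_mlad(mlad, good_below=1.5, bad_above=2.0):
    """Label an MLAD value using the rule-of-thumb thresholds (hypothetical helper)."""
    if mlad < good_below:
        return "likely match"
    if mlad > bad_above:
        return "likely mismatch"
    # Values between the two thresholds are not covered by the rule of thumb.
    return "inconclusive"
```
<br />
Whatever the label, the superposition should still be inspected visually.<br />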
<br />
==Algorithm==<br />
<br />
'''phaser.famos''' computes configurations by looping over all symmetry operations and alternative origin shifts. An alignment between C-alpha atoms from moving.pdb and fixed.pdb is computed using secondary structure matching (SSM). If that fails, or if the MLAD score achieved with SSM is larger than 2.0, an alignment is computed using the MMTBX alignment functions, which are part of CCTBX. To estimate the best match, a distance measure between the aligned C-alpha atoms is computed for each configuration. The mean log absolute deviation (MLAD) is defined as:<br />
<br />
MLAD(dR) = Σ( log(dr·dr/(|dr| + 0.9) + max(0.9, min(dr·dr,1))) - log(0.9)),<br />
<br />
where dr is the difference vector between a pair of aligned C-alpha atoms and the sum is taken over all atom pairs in the alignment. The term log(0.9) is subtracted to ensure that MLAD(0) = 0.0, i.e. that two identical structures produce the value zero.<br />
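<br />
As a minimal sketch, the formula above can be transcribed into a few lines of Python. The function name mlad and the input format (a list of x, y, z difference vectors) are assumptions made for illustration, and the natural logarithm is assumed; this is not the code used by '''phaser.famos'''.<br />
<br />
```python
import math


def mlad(diff_vectors):
    """Sum the per-pair log terms of the MLAD formula over all aligned atom pairs."""
    total = 0.0
    for dx, dy, dz in diff_vectors:
        d2 = dx * dx + dy * dy + dz * dz  # dr·dr
        d = math.sqrt(d2)                 # |dr|
        term = d2 / (d + 0.9) + max(0.9, min(d2, 1.0))
        total += math.log(term) - math.log(0.9)
    return total
```
<br />
For identical structures every difference vector is zero, each term reduces to log(0.9) - log(0.9), and the function returns 0.0, as stated above.<br />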
<br />
MLAD can loosely be interpreted as a distance measure between structures, but it is not a metric in the strict mathematical sense since the triangle inequality is not fulfilled. Unlike a plain root mean square deviation (RMSD), the logarithm in the MLAD formula downplays the contributions of atom pairs whose atoms are spatially distant. If an RMSD were employed, such atom pairs would contribute on an equal footing with those atom pairs that can be superposed perfectly. This in turn may lead to non-optimal superpositions when the structures tested against one another consist of multiple domains, one of which has undergone a domain motion, i.e. where a subset of atoms in one chain is bound to be spatially distant from the atoms in the other chain they have been paired with.<br />
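<br />
The damping effect of the logarithm can be illustrated with invented numbers: nine atom pairs separated by 0.5 and a single pair separated by 20 (the distances, like the function names, are hypothetical and chosen only for illustration). The sketch compares the share of the total that the distant pair contributes to the sum of squared distances, which is the quantity under the root in an RMSD, with its share of the MLAD sum.<br />
<br />
```python
import math


def mlad_term(d):
    """Per-pair MLAD term for a pair of atoms separated by distance d."""
    d2 = d * d
    return math.log(d2 / (d + 0.9) + max(0.9, min(d2, 1.0))) - math.log(0.9)


def outlier_share(distances, per_pair):
    """Fraction of the summed score that comes from the most distant pair."""
    contributions = [per_pair(d) for d in distances]
    return max(contributions) / sum(contributions)


distances = [0.5] * 9 + [20.0]          # invented example data
share_squared = outlier_share(distances, lambda d: d * d)
share_mlad = outlier_share(distances, mlad_term)
print(f"squared distances: {share_squared:.3f}, MLAD: {share_mlad:.3f}")
```
<br />
With these numbers the distant pair accounts for a much smaller share of the MLAD sum than of the sum of squared distances, so the well-superposed pairs retain more influence on the score.<br />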
<br />
For each chain in the fixed scope, '''phaser.famos''' finds the smallest MLAD with a copy of each chain in the moving scope for a given symmetry operation and alternative origin. When all chains in the fixed scope have been tested, these copies are saved to a file. For space groups with a floating origin, the minimum MLAD is found by a golden-section minimization along the polar axis for each copy of the chains in moving.pdb.<br />
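<br />
A golden-section minimization can be sketched as follows. This is the textbook one-dimensional method, not necessarily the implementation in '''phaser.famos'''; it assumes the score has a single minimum inside the search interval. The function name, the tolerance and the quadratic test function are hypothetical, and the interval [-0.5, 0.5] is the fractional range along the polar axis mentioned under Debug mode.<br />
<br />
```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, approximately 0.618


def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Return the approximate location of the minimum of a unimodal f on [lo, hi]."""
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # The minimum lies in [lo, d]; the old c becomes the new d.
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = f(c)
        else:
            # The minimum lies in [c, hi]; the old d becomes the new c.
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0


# Hypothetical score as a function of the fractional shift along the polar axis.
best_shift = golden_section_minimize(lambda t: (t - 0.2) ** 2, -0.5, 0.5)
```
<br />
Only one new function evaluation is needed per iteration, which matters when each evaluation involves rescoring a whole chain.<br />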
<br />
===Caveats===<br />
<br />
The program tests all solutions present in solution files entered as fixed scope against all solutions present in solution files entered as moving scope. Consequently the execution time is proportional to the number of solutions in fixed scope times the number of solutions in moving scope.<br />
<br />
The execution time is proportional to the number of SSM alignments being tested; if SSM identifies 12 alignments the program will take 12 times as long.<br />
<br />
===Changes===<br />
<br />
The command-line syntax mentioned in the Computational Crystallography Newsletter 2012 January has been replaced by PHIL syntax.<br />
<br />
===Literature===<br />
<br />
Algorithms for deriving crystallographic space-group information R.W. Große-Kunstleve Acta Cryst. A55, 383-395 (1999)<br />
<br />
phenix.find_alt_orig_sym_mate Robert D. Oeffner, Gábor Bunkóczi and Randy J. Read Computational Crystallography Newsletter 2012 January, 5-10 (2012)<br />
<br />
Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. E. Krissinel and K. Henrick Acta Cryst. D60, 2256-2268 (2004)</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Famos&diff=2322Famos2016-09-08T16:43:25Z<p>Rdo20: /* Algorithm */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__NOTOC__</div><br />
<br />
'''phaser.famos''' ('''phaser.find_alt_orig_sym_mate''') is a script for determining the best common origin for different molecular replacement solutions, in real space. The coordinates do not need to be identical. The common origin is found via secondary structure matching.<br />
<br />
==Author==<br />
<br />
Robert D. Oeffner<br />
<br />
==Purpose==<br />
<br />
'''phaser.famos''' attempts to find the best superposition of two different molecular replacement solutions of the same dataset, termed moving.pdb and fixed.pdb, with respect to all symmetry operations and alternate origin shifts permitted by the space group of the crystal. If either PDB file contains more than one chain, each chain is tested for its best match against the chains in the other PDB file. The program does so by calculating a score, MLAD (see Algorithm), for all possible symmetry operations and alternate origin shifts of each chain in moving.pdb compared with the chains in fixed.pdb. The transformation with the smallest MLAD is retained for that particular pair of chains.<br />
<br />
The script returns the best matches between pairs of chains in the two files. This includes the MLAD values, chain IDs, symmetry transformations and alternate origins for the two structures. If the same origin shift is not applied to all pairs of chains a warning will be printed. If the space group of the crystal has a floating origin this generally results in an origin offset between chains that is not a rational number.<br />
<br />
The superposed structure and the fixed structure can be visually inspected in a molecular viewer such as Coot or PyMOL. Matches with low MLAD scores will have correspondingly good superpositions.<br />
<br />
==Usage from the command line==<br />
<br />
'''phaser.famos''' uses [https://www.phenix-online.org/documentation/file_formats.html#phil-files-eff-def-phil Phenix PHIL input] in either a text file or as keywords like:<br />
<br />
 phaser.famos moving.pdb=pdbfile1 fixed.pdb=pdbfile2<br />
<br />
or:<br />
<br />
 phaser.famos my_phil_input.txt<br />
<br />
The PHIL input scopes, moving and fixed, specify the MR solutions. Both scopes are programmatically equivalent and must be non-empty: each scope must specify either the parameter xyzfname, the sub-scope mrsolution, or the sub-scope pickle.solution.<br />
<br />
Examples:<br />
*Use the pdb file from one of the MR solutions in a scope. This is useful for simple cases, such as when the solution comes from the PHENIX MRage GUI or from an MR program other than Phaser.<br />
*Specify a Phaser MR solution by assigning the .sol file name of the solution to the mrsolution.solfname parameter of the sub-scope as well as assigning the IDs of the ensembles and the corresponding MR models to the parameters mrsolution.ensembles.name and mrsolution.ensembles.xyzfname. Multiple search components are specified as multiple ensembles. This way of specifying solutions is useful for solution files produced when Phaser is run from the command line and it produces several solutions of which one is the correct solution.<br />
*Use a solution from the Phaser-MR GUI in PHENIX. In that case the parameter pickle.solution.pklfname in the sub-scope should be assigned to the solution file that is produced by PHENIX after an MR calculation. The pickle.solution.philfname should then be assigned to the input file for the MR calculation. This way of specifying solutions is useful when MR solutions are available from having run Phaser through the PHENIX interface.<br />
*If the space group and the unit cell dimensions are not available from any of the input files, they must be specified by assigning the parameter spacegroupfname either to a PDB file with a CRYST1 record or to an MTZ file with that information. This should be the data file used for the molecular replacement calculation.<br />
<br />
<br />
===Examples of PHIL input===<br />
<br />
Unless the input consists of just two PDB files, a PHIL file is the easiest way to enter input. A few examples of PHIL input for '''phaser.famos''' are given below.<br />
<br />
Testing a command-line Phaser MR solution file against a solution specified as a PDB file:<br />
<br />
AltOrigSymMates.fixed.mrsolution<br />
{<br />
solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_3ECI_A0"<br />
xyzfname = "testdata/sculpt_3ECI_A0.pdb"<br />
}<br />
}<br />
<br />
AltOrigSymMates.moving_pdb="testdata/2z0d.pdb"<br />
<br />
Testing two sets of solution files from the PHENIX Phaser-MR GUI against one another:<br />
<br />
AltOrigSymMates.fixed.pickle_solution<br />
{<br />
philfname = "testdata/phaser_mr_13.eff"<br />
pklfname = "testdata/phaser_mr_13.pkl"<br />
}<br />
AltOrigSymMates.moving.pickle_solution<br />
{<br />
philfname = "testdata/phaser_mr_11.eff"<br />
pklfname = "testdata/phaser_mr_11.pkl"<br />
}<br />
<br />
Testing two solution files from the command-line version of Phaser against one another:<br />
<br />
AltOrigSymMates.fixed.mrsolution<br />
{<br />
solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_3ECI_A0"<br />
xyzfname = "testdata/sculpt_3ECI_A0.pdb"<br />
}<br />
}<br />
AltOrigSymMates.moving.mrsolution<br />
{<br />
solfname = "testdata/MR_2ZPN_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_2ZPN_A0"<br />
xyzfname = "testdata/sculpt_2ZPN_A0.pdb"<br />
}<br />
}<br />
AltOrigSymMates.spacegroupfname = "testdata/2Z0D.mtz"<br />
<br />
For more information on the PHIL input see the bottom of this page.<br />
<br />
===Also move HETATM (hetero atoms)===<br />
<br />
Invoking this flag will move hetero atoms (ligands, waters, metals, etc.) in conjunction with their associated peptide chain. The program first invokes phenix.sort_hetatms to associate hetero atoms sensibly with adjacent peptide chains.<br />
<br />
After having identified the transformation for a chain yielding the smallest MLAD score it then subjects the associated hetero atoms to the same transformation.<br />
<br />
===Debug mode===<br />
<br />
Invoking the debug flag produces individual pdb files of the C-alpha atoms used for each SSM alignment of the moving scope for each permitted symmetry operation and alternative origin. A gold atom is placed at the centroid of the C-alpha atoms. Similar files are produced for the fixed scope. These files are stored in a subfolder named "AltOrigSymMatesFiles".<br />
<br />
If a floating origin is present in the space group a table of MLAD values is produced by sliding a copy of the chains in the moving scope along the polar axis in the fractional interval [-0.5, 0.5] for each permitted symmetry operation and alternative origin.<br />
<br />
===List of all available keywords===<br />
<br />
See [https://www.phenix-online.org/documentation/reference/find_alt_orig_sym_mate.html#list-of-all-available-keywords Phenix documentation for phenix.find_alt_orig_sym_mate]<br />
<br />
==Output==<br />
<br />
The closest match between chains in the moving scope and chains in the fixed scope is saved to a file named after the pdb file in the fixed scope, concatenated with the name of the pdb file in the moving scope and prefixed with "MinMLAD_". A log file containing the standard output is written with the same concatenated name but prefixed with "AltOrigSymMLAD_". All files are saved in the current working directory. If the MR solutions specified in the PHIL input contain multiple solutions, '''phaser.famos''' will output multiple log files corresponding to the individual MR solutions.<br />
<br />
===Do the MR solutions match?===<br />
<br />
A good match between two chains usually has an MLAD value below 1.5, whereas a bad match usually has a value above 2.0. This is a rule of thumb and exceptions do occur. It is advisable to check visually in a molecular graphics program that the structures superpose on one another.<br />
<br />
==Algorithm==<br />
<br />
'''phaser.famos''' computes configurations by looping over all symmetry operations and alternative origin shifts. An alignment between C-alpha atoms from moving.pdb and fixed.pdb is computed using secondary structure matching (SSM). If that fails, or if the MLAD score achieved with SSM is larger than 2.0, an alignment is computed using the MMTBX alignment functions, which are part of CCTBX. To estimate the best match, a distance measure between the aligned C-alpha atoms is computed for each configuration. The mean log absolute deviation (MLAD) is defined as:<br />
<br />
MLAD(dR) = Σ( log(dr·dr/(|dr| + 0.9) + max(0.9, min(dr·dr,1))) - log(0.9)),<br />
<br />
where dr is the difference vector between a pair of aligned C-alpha atoms and the sum is taken over all atom pairs in the alignment. The term log(0.9) is subtracted to ensure that MLAD(0) = 0.0, i.e. that two identical structures produce the value zero.<br />
<br />
MLAD can loosely be interpreted as a distance measure between structures, but it is not a metric in the strict mathematical sense since the triangle inequality is not fulfilled. Unlike a plain root mean square deviation (RMSD), the logarithm in the MLAD formula downplays the contributions of atom pairs whose atoms are spatially distant. If an RMSD were employed, such atom pairs would contribute on an equal footing with those atom pairs that can be superposed perfectly. This in turn may lead to non-optimal superpositions when the structures tested against one another consist of multiple domains, one of which has undergone a domain motion, i.e. where a subset of atoms in one chain is bound to be spatially distant from the atoms in the other chain they have been paired with.<br />
<br />
For each chain in the fixed scope, '''phaser.famos''' finds the smallest MLAD with a copy of each chain in the moving scope for a given symmetry operation and alternative origin. When all chains in the fixed scope have been tested, these copies are saved to a file. For space groups with a floating origin, the minimum MLAD is found by a golden-section minimization along the polar axis for each copy of the chains in moving.pdb.<br />
<br />
===Caveats===<br />
<br />
The program tests all solutions present in solution files entered as fixed scope against all solutions present in solution files entered as moving scope. Consequently the execution time is proportional to the number of solutions in fixed scope times the number of solutions in moving scope.<br />
<br />
The execution time is proportional to the number of SSM alignments being tested; if SSM identifies 12 alignments the program will take 12 times as long.<br />
<br />
===Changes===<br />
<br />
The command-line syntax mentioned in the Computational Crystallography Newsletter 2012 January has been replaced by PHIL syntax.<br />
<br />
===Literature===<br />
<br />
Algorithms for deriving crystallographic space-group information R.W. Große-Kunstleve Acta Cryst. A55, 383-395 (1999)<br />
<br />
phenix.find_alt_orig_sym_mate Robert D. Oeffner, Gábor Bunkóczi and Randy J. Read Computational Crystallography Newsletter 2012 January, 5-10 (2012)<br />
<br />
Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. E. Krissinel and K. Henrick Acta Cryst. D60, 2256-2268 (2004)</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Famos&diff=2321Famos2016-09-08T16:42:51Z<p>Rdo20: /* Output */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__NOTOC__</div><br />
<br />
'''phaser.famos''' ('''phaser.find_alt_orig_sym_mate''') is a script for determining the best common origin for different molecular replacement solutions, in real space. The coordinates do not need to be identical. The common origin is found via secondary structure matching.<br />
<br />
==Author==<br />
<br />
Robert D. Oeffner<br />
<br />
==Purpose==<br />
<br />
'''phaser.famos''' attempts to find the best superposition of two different molecular replacement solutions of the same dataset, termed moving.pdb and fixed.pdb, with respect to all symmetry operations and alternate origin shifts permitted by the space group of the crystal. If either PDB file contains more than one chain, each chain is tested for its best match against the chains in the other PDB file. The program does so by calculating a score, MLAD (see Algorithm), for all possible symmetry operations and alternate origin shifts of each chain in moving.pdb compared with the chains in fixed.pdb. The transformation with the smallest MLAD is retained for that particular pair of chains.<br />
<br />
The script returns the best matches between pairs of chains in the two files. This includes the MLAD values, chain IDs, symmetry transformations and alternate origins for the two structures. If the same origin shift is not applied to all pairs of chains a warning will be printed. If the space group of the crystal has a floating origin this generally results in an origin offset between chains that is not a rational number.<br />
<br />
The superposed structure and the fixed structure can be visually inspected in a molecular viewer such as Coot or PyMOL. Matches with low MLAD scores will have correspondingly good superpositions.<br />
<br />
==Usage from the command line==<br />
<br />
'''phaser.famos''' uses [https://www.phenix-online.org/documentation/file_formats.html#phil-files-eff-def-phil Phenix PHIL input] in either a text file or as keywords like:<br />
<br />
 phaser.famos moving.pdb=pdbfile1 fixed.pdb=pdbfile2<br />
<br />
or:<br />
<br />
 phaser.famos my_phil_input.txt<br />
<br />
The PHIL input scopes, moving and fixed, specify the MR solutions. Both scopes are programmatically equivalent and must be non-empty: each scope must specify either the parameter xyzfname, the sub-scope mrsolution, or the sub-scope pickle.solution.<br />
<br />
Examples:<br />
*Use the pdb file from one of the MR solutions in a scope. This is useful for simple cases, such as when the solution comes from the PHENIX MRage GUI or from an MR program other than Phaser.<br />
*Specify a Phaser MR solution by assigning the .sol file name of the solution to the mrsolution.solfname parameter of the sub-scope as well as assigning the IDs of the ensembles and the corresponding MR models to the parameters mrsolution.ensembles.name and mrsolution.ensembles.xyzfname. Multiple search components are specified as multiple ensembles. This way of specifying solutions is useful for solution files produced when Phaser is run from the command line and it produces several solutions of which one is the correct solution.<br />
*Use a solution from the Phaser-MR GUI in PHENIX. In that case the parameter pickle.solution.pklfname in the sub-scope should be assigned to the solution file that is produced by PHENIX after an MR calculation. The pickle.solution.philfname should then be assigned to the input file for the MR calculation. This way of specifying solutions is useful when MR solutions are available from having run Phaser through the PHENIX interface.<br />
*If the space group and the unit cell dimensions are not available from any of the input files, they must be specified by assigning the parameter spacegroupfname either to a PDB file with a CRYST1 record or to an MTZ file with that information. This should be the data file used for the molecular replacement calculation.<br />
<br />
<br />
===Examples of PHIL input===<br />
<br />
Unless the input consists of just two PDB files, a PHIL file is the easiest way to enter input. A few examples of PHIL input for '''phaser.famos''' are given below.<br />
<br />
Testing a command-line Phaser MR solution file against a solution specified as a PDB file:<br />
<br />
AltOrigSymMates.fixed.mrsolution<br />
{<br />
solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_3ECI_A0"<br />
xyzfname = "testdata/sculpt_3ECI_A0.pdb"<br />
}<br />
}<br />
<br />
AltOrigSymMates.moving_pdb="testdata/2z0d.pdb"<br />
<br />
Testing two sets of solution files from the PHENIX Phaser-MR GUI against one another:<br />
<br />
AltOrigSymMates.fixed.pickle_solution<br />
{<br />
philfname = "testdata/phaser_mr_13.eff"<br />
pklfname = "testdata/phaser_mr_13.pkl"<br />
}<br />
AltOrigSymMates.moving.pickle_solution<br />
{<br />
philfname = "testdata/phaser_mr_11.eff"<br />
pklfname = "testdata/phaser_mr_11.pkl"<br />
}<br />
<br />
Testing two solution files from the command-line version of Phaser against one another:<br />
<br />
AltOrigSymMates.fixed.mrsolution<br />
{<br />
solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_3ECI_A0"<br />
xyzfname = "testdata/sculpt_3ECI_A0.pdb"<br />
}<br />
}<br />
AltOrigSymMates.moving.mrsolution<br />
{<br />
solfname = "testdata/MR_2ZPN_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_2ZPN_A0"<br />
xyzfname = "testdata/sculpt_2ZPN_A0.pdb"<br />
}<br />
}<br />
AltOrigSymMates.spacegroupfname = "testdata/2Z0D.mtz"<br />
<br />
For more information on the PHIL input see the bottom of this page.<br />
<br />
===Also move HETATM (hetero atoms)===<br />
<br />
Invoking this flag will move hetero atoms (ligands, waters, metals, etc.) in conjunction with their associated peptide chain. The program first invokes phenix.sort_hetatms to associate hetero atoms sensibly with adjacent peptide chains.<br />
<br />
After having identified the transformation for a chain yielding the smallest MLAD score it then subjects the associated hetero atoms to the same transformation.<br />
<br />
===Debug mode===<br />
<br />
Invoking the debug flag produces individual pdb files of the C-alpha atoms used for each SSM alignment of the moving scope for each permitted symmetry operation and alternative origin. A gold atom is placed at the centroid of the C-alpha atoms. Similar files are produced for the fixed scope. These files are stored in a subfolder named "AltOrigSymMatesFiles".<br />
<br />
If a floating origin is present in the space group a table of MLAD values is produced by sliding a copy of the chains in the moving scope along the polar axis in the fractional interval [-0.5, 0.5] for each permitted symmetry operation and alternative origin.<br />
<br />
===List of all available keywords===<br />
<br />
See [https://www.phenix-online.org/documentation/reference/find_alt_orig_sym_mate.html#list-of-all-available-keywords Phenix documentation for phenix.find_alt_orig_sym_mate]<br />
<br />
==Output==<br />
<br />
The closest match between chains in the moving scope and chains in the fixed scope is saved to a file named after the pdb file in the fixed scope, concatenated with the name of the pdb file in the moving scope and prefixed with "MinMLAD_". A log file containing the standard output is written with the same concatenated name but prefixed with "AltOrigSymMLAD_". All files are saved in the current working directory. If the MR solutions specified in the PHIL input contain multiple solutions, '''phaser.famos''' will output multiple log files corresponding to the individual MR solutions.<br />
<br />
===Do the MR solutions match?===<br />
<br />
A good match between two chains usually has an MLAD value below 1.5, whereas a bad match usually has a value above 2.0. This is a rule of thumb and exceptions do occur. It is advisable to check visually in a molecular graphics program that the structures superpose on one another.<br />
<br />
==Algorithm==<br />
<br />
phaser.famos computes configurations by looping over all symmetry operations and alternative origin shifts. An alignment between C-alpha atoms from moving.pdb and fixed.pdb is computed using secondary structure matching (SSM). If that fails, or if the MLAD score achieved with SSM is larger than 2.0, an alignment is computed using the MMTBX alignment functions, which are part of the CCTBX. To estimate the best match, a distance measure between the aligned C-alpha atoms is computed for each configuration. The mean log absolute deviation (MLAD) is defined as:<br />
<br />
MLAD(dR) = Σ( log(dr·dr/(|dr| + 0.9) + max(0.9, min(dr·dr,1))) - log(0.9)),<br />
<br />
where dr is the difference vector between a pair of aligned C-alpha atoms and the sum is taken over all atom pairs in the alignment. The term log(0.9) is subtracted to ensure that MLAD(0) = 0.0, i.e. that two identical structures produce the value zero.<br />
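The formula above can be sketched in a few lines of Python. This is only an illustration of the expression as it is written on this page, not the code used inside phaser.famos: the function names mlad_term and mlad are hypothetical, the natural logarithm is assumed, and the per-pair terms are summed exactly as in the formula (no division by the number of pairs is applied).

```python
import math


def mlad_term(dr):
    """Contribution of one aligned C-alpha pair; dr is the difference vector (x, y, z)."""
    d2 = sum(c * c for c in dr)   # dr.dr
    d = math.sqrt(d2)             # |dr|
    # log(dr.dr/(|dr| + 0.9) + max(0.9, min(dr.dr, 1))) - log(0.9)
    return math.log(d2 / (d + 0.9) + max(0.9, min(d2, 1.0))) - math.log(0.9)


def mlad(moving, fixed):
    """Sum the per-pair terms over two equally long lists of aligned coordinates."""
    if len(moving) != len(fixed):
        raise ValueError("aligned coordinate lists must have the same length")
    total = 0.0
    for a, b in zip(moving, fixed):
        dr = tuple(x - y for x, y in zip(a, b))
        total += mlad_term(dr)
    return total


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    shifted = [(x + 1.0, y, z) for x, y, z in coords]
    print(mlad(coords, coords))   # identical structures give zero
    print(mlad(shifted, coords))  # a rigid 1 A shift gives a positive score
```

For identical coordinates every term reduces to log(0.9) - log(0.9), which is how the subtraction of log(0.9) makes the score vanish.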
<br />
MLAD can loosely be interpreted as a distance measure between structures, but it is not a metric in the strict mathematical sense since the triangle inequality is not fulfilled. Unlike a plain root mean square deviation (RMSD), the logarithm in the MLAD formula downplays contributions from atom pairs where the atoms are spatially distant. If an RMSD were employed, such atom pairs would contribute on an equal footing with those groups of atom pairs that can be superposed perfectly. This in turn may lead to non-optimal superpositions when the structures tested against one another consist of multiple domains where one domain has undergone a domain motion, i.e. where a subset of atoms in one chain is bound to be spatially distant from the atoms in the other chain they have been paired with.<br />
<br />
phaser.famos will for each chain in the fixed scope find the smallest MLAD with a copy of each chain in the moving scope for a given symmetry operation and alternative origin. When all chains in the fixed scope have been tested these copies will be saved to a file. For spacegroups with floating origin the minimum MLAD is found by doing a Golden Sectioning minimization along the polar axis for each copy of chains in moving.pdb.<br />
<br />
===Caveats===<br />
<br />
The program tests all solutions present in solution files entered as fixed scope against all solutions present in solution files entered as moving scope. Consequently the execution time is proportional to the number of solutions in fixed scope times the number of solutions in moving scope.<br />
<br />
The execution time is proportional to the number of SSM alignments being tested; if SSM identifies 12 alignments the program will take 12 times as long.<br />
<br />
===Changes===<br />
<br />
The command-line syntax mentioned in the Computational Crystallography Newsletter 2012 January has been replaced by PHIL syntax.<br />
<br />
===Literature===<br />
<br />
Algorithms for deriving crystallographic space-group information R.W. Große-Kunstleve Acta Cryst. A55, 383-395 (1999)<br />
<br />
phenix.find_alt_orig_sym_mate Robert D. Oeffner, Gábor Bunkóczi and Randy J. Read Computational Crystallography Newsletter 2012 January, 5-10 (2012)<br />
<br />
Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. E. Krissinel and K. Henrick Acta Cryst. D60, 2256-2268 (2004)</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Famos&diff=2320Famos2016-09-08T16:42:17Z<p>Rdo20: /* Examples of PHIL input */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__NOTOC__</div><br />
<br />
'''phaser.famos''' ('''phaser.find_alt_orig_sym_mate''') is a script for determining the best common origin for different molecular replacement solutions, in real space. The coordinates do not need to be identical. The common origin is found via secondary structure matching.<br />
<br />
==Author==<br />
<br />
Robert D. Oeffner<br />
<br />
==Purpose==<br />
<br />
'''phaser.famos''' attempts to find the best superposition of two different molecular replacement solutions of the same dataset, termed moving.pdb and fixed.pdb, with respect to all symmetry operations and alternate origin shifts permitted by the space group of the crystal. If either pdb file contains more than one chain, each chain is tested for its best match against the chains in the other pdb file. The program does so by calculating a score value, MLAD (see Algorithm), for all possible symmetry operations and alternate origin shifts of each chain in moving.pdb compared with chains in fixed.pdb. The transformation with the smallest MLAD is retained for that particular pair of chains.<br />
<br />
The script returns the best matches between pairs of chains in the two files. This includes the MLAD values, chain IDs, symmetry transformations and alternate origins for the two structures. If the same origin shift is not applied to all pairs of chains a warning will be printed. If the space group of the crystal has a floating origin this generally results in an origin offset between chains that is not a rational number.<br />
<br />
The superposed structure and the fixed structure can be visually inspected in a molecular viewer such as Coot or Pymol. Matches with low MLAD scores will have correspondingly good superpositions.<br />
<br />
==Usage from the command line==<br />
<br />
'''phaser.famos''' uses [https://www.phenix-online.org/documentation/file_formats.html#phil-files-eff-def-phil Phenix PHIL input] in either a text file or as keywords like:<br />
<br />
phaser.famos moving.pdb=pdbfile1 fixed.pdb=pdbfile2<br />
<br />
or:<br />
<br />
phaser.famos my_phil_input.txt<br />
<br />
The PHIL input scopes, moving and fixed, specify the MR solutions. Both scopes are programmatically equivalent and must be non-empty. This means that each scope should specify either the parameter xyzfname, the sub-scope mrsolution or the sub-scope pickle.solution.<br />
<br />
Examples:<br />
*Use the pdb file from one of the MR solutions in a scope. This is useful for simple cases such as when the solution comes from the PHENIX MRage GUI or when it is obtained from another MR program than Phaser.<br />
*Specify a Phaser MR solution by assigning the .sol file name of the solution to the mrsolution.solfname parameter of the sub-scope as well as assigning the IDs of the ensembles and the corresponding MR models to the parameters mrsolution.ensembles.name and mrsolution.ensembles.xyzfname. Multiple search components are specified as multiple ensembles. This way of specifying solutions is useful for solution files produced when Phaser is run from the command line and it produces several solutions of which one is the correct solution.<br />
*Use a solution from the Phaser-MR GUI in PHENIX. In that case the parameter pickle.solution.pklfname in the sub-scope should be assigned to the solution file that is produced by PHENIX after an MR calculation. The pickle.solution.philfname should then be assigned to the input file for the MR calculation. This way of specifying solutions is useful when MR solutions are available from having run Phaser through the PHENIX interface.<br />
*If the space group and the unit cell dimensions are not available from any of the input files then they need to be specified by assigning the parameter spacegroupfname either to a PDB file with a CRYST1 record or to an MTZ file with that information. This should be the data file used for the molecular replacement calculation.<br />
<br />
<br />
===Examples of PHIL input===<br />
<br />
Unless the input consists of just two PDB files, a PHIL file is the easiest way to enter input. A few examples of PHIL for '''phaser.famos''' are given below.<br />
<br />
Testing a command-line Phaser MR solution file against a solution specified as a PDB file:<br />
<br />
AltOrigSymMates.fixed.mrsolution<br />
{<br />
solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_3ECI_A0"<br />
xyzfname = "testdata/sculpt_3ECI_A0.pdb"<br />
}<br />
}<br />
<br />
AltOrigSymMates.moving_pdb="testdata/2z0d.pdb"<br />
<br />
Testing two sets of solution files from the PHENIX Phaser-MR GUI against one another:<br />
<br />
AltOrigSymMates.fixed.pickle_solution<br />
{<br />
philfname = "testdata/phaser_mr_13.eff"<br />
pklfname = "testdata/phaser_mr_13.pkl"<br />
}<br />
AltOrigSymMates.moving.pickle_solution<br />
{<br />
philfname = "testdata/phaser_mr_11.eff"<br />
pklfname = "testdata/phaser_mr_11.pkl"<br />
}<br />
<br />
Testing two solution files from the command-line version of Phaser against one another:<br />
<br />
AltOrigSymMates.fixed.mrsolution<br />
{<br />
solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_3ECI_A0"<br />
xyzfname = "testdata/sculpt_3ECI_A0.pdb"<br />
}<br />
}<br />
AltOrigSymMates.moving.mrsolution<br />
{<br />
solfname = "testdata/MR_2ZPN_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_2ZPN_A0"<br />
xyzfname = "testdata/sculpt_2ZPN_A0.pdb"<br />
}<br />
}<br />
AltOrigSymMates.spacegroupfname = "testdata/2Z0D.mtz"<br />
<br />
For more information on the PHIL input see the bottom of this page.<br />
<br />
===Also move HETATM (hetero atoms)===<br />
<br />
Invoking this flag will move hetero atoms (ligands, waters, metals, etc.) together with their associated peptide chain. The program first invokes phenix.sort_hetatms to associate hetero atoms sensibly with adjacent peptide chains.<br />
<br />
After having identified the transformation for a chain yielding the smallest MLAD score it then subjects the associated hetero atoms to the same transformation.<br />
<br />
===Debug mode===<br />
<br />
Invoking the debug flag produces individual pdb files of the C-alpha atoms used for each SSM alignment of the moving scope for each permitted symmetry operation and alternative origin. A gold atom is placed at the centroid of the C-alpha atoms. Similar files are produced for the fixed scope. These files are stored in a subfolder named "AltOrigSymMatesFiles".<br />
<br />
If a floating origin is present in the space group a table of MLAD values is produced by sliding a copy of the chains in the moving scope along the polar axis in the fractional interval [-0.5, 0.5] for each permitted symmetry operation and alternative origin.<br />
<br />
===List of all available keywords===<br />
<br />
See [https://www.phenix-online.org/documentation/reference/find_alt_orig_sym_mate.html#list-of-all-available-keywords Phenix documentation for phenix.find_alt_orig_sym_mate]<br />
<br />
==Output==<br />
<br />
The closest match between chains in the moving scope and chains in the fixed scope is saved to a file named "MinMLAD_" followed by the name of the pdb file in the fixed scope and the name of the pdb file in the moving scope. A log file containing the standard output is written with the same name pattern but with the prefix "AltOrigSymMLAD_". All files are saved in the current working directory. If the MR solutions specified in the PHIL input contain multiple solutions then phaser.famos will write multiple log files, one for each MR solution.<br />
<br />
===Do the MR solutions match?===<br />
<br />
A good match between two chains usually has an MLAD value below 1.5, whereas a bad match usually has a value above 2.0. This is a rule of thumb and exceptions do occur. It is advisable to check visually in a molecular graphics program that the structures superpose on one another.<br />
<br />
==Algorithm==<br />
<br />
phaser.famos computes configurations by looping over all symmetry operations and alternative origin shifts. An alignment between C-alpha atoms from moving.pdb and fixed.pdb is computed using secondary structure matching (SSM). If that fails, or if the MLAD score achieved with SSM is larger than 2.0, an alignment is computed using the MMTBX alignment functions, which are part of the CCTBX. To estimate the best match, a distance measure between the aligned C-alpha atoms is computed for each configuration. The mean log absolute deviation (MLAD) is defined as:<br />
<br />
MLAD(dR) = Σ( log(dr·dr/(|dr| + 0.9) + max(0.9, min(dr·dr,1))) - log(0.9)),<br />
<br />
where dr is the difference vector between a pair of aligned C-alpha atoms and the sum is taken over all atom pairs in the alignment. The term log(0.9) is subtracted to ensure that MLAD(0) = 0.0, i.e. that two identical structures produce the value zero.<br />
<br />
MLAD can loosely be interpreted as a distance measure between structures, but it is not a metric in the strict mathematical sense since the triangle inequality is not fulfilled. Unlike a plain root mean square deviation (RMSD), the logarithm in the MLAD formula downplays contributions from atom pairs where the atoms are spatially distant. If an RMSD were employed, such atom pairs would contribute on an equal footing with those groups of atom pairs that can be superposed perfectly. This in turn may lead to non-optimal superpositions when the structures tested against one another consist of multiple domains where one domain has undergone a domain motion, i.e. where a subset of atoms in one chain is bound to be spatially distant from the atoms in the other chain they have been paired with.<br />
<br />
phaser.famos will for each chain in the fixed scope find the smallest MLAD with a copy of each chain in the moving scope for a given symmetry operation and alternative origin. When all chains in the fixed scope have been tested these copies will be saved to a file. For spacegroups with floating origin the minimum MLAD is found by doing a Golden Sectioning minimization along the polar axis for each copy of chains in moving.pdb.<br />
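The one-dimensional search along the polar axis can be illustrated with a generic golden-section minimizer. The sketch below is not taken from the phaser.famos source: the function name golden_section_minimise, the tolerance and the toy score function are assumptions made for illustration, and the method only finds the global minimum when the function is unimodal on the interval.

```python
import math


def golden_section_minimise(f, a, b, tol=1e-6):
    """Return an x in [a, b] close to a local minimum of f (assumes f is unimodal there)."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while abs(b - a) > tol:
        if fc < fd:
            # The minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            # The minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


if __name__ == "__main__":
    # Toy stand-in for "MLAD as a function of the fractional shift along the polar axis".
    def toy_score(shift):
        return (shift - 0.2) ** 2

    best = golden_section_minimise(toy_score, -0.5, 0.5)
    print("best fractional shift: %.4f" % best)
```

Each iteration reuses one of the two previous function evaluations, so only one new score has to be computed per step, which matters when every evaluation involves transforming and scoring a whole chain.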
<br />
===Caveats===<br />
<br />
The program tests all solutions present in solution files entered as fixed scope against all solutions present in solution files entered as moving scope. Consequently the execution time is proportional to the number of solutions in fixed scope times the number of solutions in moving scope.<br />
<br />
The execution time is proportional to the number of SSM alignments being tested; if SSM identifies 12 alignments the program will take 12 times as long.<br />
<br />
===Changes===<br />
<br />
The command-line syntax mentioned in the Computational Crystallography Newsletter 2012 January has been replaced by PHIL syntax.<br />
<br />
===Literature===<br />
<br />
Algorithms for deriving crystallographic space-group information R.W. Große-Kunstleve Acta Cryst. A55, 383-395 (1999)<br />
<br />
phenix.find_alt_orig_sym_mate Robert D. Oeffner, Gábor Bunkóczi and Randy J. Read Computational Crystallography Newsletter 2012 January, 5-10 (2012)<br />
<br />
Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. E. Krissinel and K. Henrick Acta Cryst. D60, 2256-2268 (2004)</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Famos&diff=2319Famos2016-09-08T16:40:13Z<p>Rdo20: /* Algorithm */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__NOTOC__</div><br />
<br />
'''phaser.famos''' ('''phaser.find_alt_orig_sym_mate''') is a script for determining the best common origin for different molecular replacement solutions, in real space. The coordinates do not need to be identical. The common origin is found via secondary structure matching.<br />
<br />
==Author==<br />
<br />
Robert D. Oeffner<br />
<br />
==Purpose==<br />
<br />
'''phaser.famos''' attempts to find the best superposition of two different molecular replacement solutions of the same dataset, termed moving.pdb and fixed.pdb, with respect to all symmetry operations and alternate origin shifts permitted by the space group of the crystal. If either pdb file contains more than one chain, each chain is tested for its best match against the chains in the other pdb file. The program does so by calculating a score value, MLAD (see Algorithm), for all possible symmetry operations and alternate origin shifts of each chain in moving.pdb compared with chains in fixed.pdb. The transformation with the smallest MLAD is retained for that particular pair of chains.<br />
<br />
The script returns the best matches between pairs of chains in the two files. This includes the MLAD values, chain IDs, symmetry transformations and alternate origins for the two structures. If the same origin shift is not applied to all pairs of chains a warning will be printed. If the space group of the crystal has a floating origin this generally results in an origin offset between chains that is not a rational number.<br />
<br />
The superposed structure and the fixed structure can be visually inspected in a molecular viewer such as Coot or Pymol. Matches with low MLAD scores will have correspondingly good superpositions.<br />
<br />
==Usage from the command line==<br />
<br />
'''phaser.famos''' uses [https://www.phenix-online.org/documentation/file_formats.html#phil-files-eff-def-phil Phenix PHIL input] in either a text file or as keywords like:<br />
<br />
phaser.famos moving.pdb=pdbfile1 fixed.pdb=pdbfile2<br />
<br />
or:<br />
<br />
phaser.famos my_phil_input.txt<br />
<br />
The PHIL input scopes, moving and fixed, specify the MR solutions. Both scopes are programmatically equivalent and must be non-empty. This means that each scope should specify either the parameter xyzfname, the sub-scope mrsolution or the sub-scope pickle.solution.<br />
<br />
Examples:<br />
*Use the pdb file from one of the MR solutions in a scope. This is useful for simple cases such as when the solution comes from the PHENIX MRage GUI or when it is obtained from another MR program than Phaser.<br />
*Specify a Phaser MR solution by assigning the .sol file name of the solution to the mrsolution.solfname parameter of the sub-scope as well as assigning the IDs of the ensembles and the corresponding MR models to the parameters mrsolution.ensembles.name and mrsolution.ensembles.xyzfname. Multiple search components are specified as multiple ensembles. This way of specifying solutions is useful for solution files produced when Phaser is run from the command line and it produces several solutions of which one is the correct solution.<br />
*Use a solution from the Phaser-MR GUI in PHENIX. In that case the parameter pickle.solution.pklfname in the sub-scope should be assigned to the solution file that is produced by PHENIX after an MR calculation. The pickle.solution.philfname should then be assigned to the input file for the MR calculation. This way of specifying solutions is useful when MR solutions are available from having run Phaser through the PHENIX interface.<br />
*If the space group and the unit cell dimensions are not available from any of the input files then they need to be specified by assigning the parameter spacegroupfname either to a PDB file with a CRYST1 record or to an MTZ file with that information. This should be the data file used for the molecular replacement calculation.<br />
<br />
<br />
===Examples of PHIL input===<br />
<br />
Unless the input consists of just two PDB files, a PHIL file is the easiest way to enter input. A few examples of PHIL for phaser.famos are given below.<br />
<br />
Testing a command-line Phaser MR solution file against a solution specified as a PDB file:<br />
<br />
AltOrigSymMates.fixed.mrsolution<br />
{<br />
solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_3ECI_A0"<br />
xyzfname = "testdata/sculpt_3ECI_A0.pdb"<br />
}<br />
}<br />
<br />
AltOrigSymMates.moving_pdb="testdata/2z0d.pdb"<br />
<br />
Testing two sets of solution files from the PHENIX Phaser-MR GUI against one another:<br />
<br />
AltOrigSymMates.fixed.pickle_solution<br />
{<br />
philfname = "testdata/phaser_mr_13.eff"<br />
pklfname = "testdata/phaser_mr_13.pkl"<br />
}<br />
AltOrigSymMates.moving.pickle_solution<br />
{<br />
philfname = "testdata/phaser_mr_11.eff"<br />
pklfname = "testdata/phaser_mr_11.pkl"<br />
}<br />
<br />
Testing two solution files from the command-line version of Phaser against one another:<br />
<br />
AltOrigSymMates.fixed.mrsolution<br />
{<br />
solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_3ECI_A0"<br />
xyzfname = "testdata/sculpt_3ECI_A0.pdb"<br />
}<br />
}<br />
AltOrigSymMates.moving.mrsolution<br />
{<br />
solfname = "testdata/MR_2ZPN_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_2ZPN_A0"<br />
xyzfname = "testdata/sculpt_2ZPN_A0.pdb"<br />
}<br />
}<br />
AltOrigSymMates.spacegroupfname = "testdata/2Z0D.mtz"<br />
<br />
For more information on the PHIL input see the bottom of this page.<br />
<br />
===Also move HETATM (hetero atoms)===<br />
<br />
Invoking this flag will move hetero atoms (ligands, waters, metals, etc.) together with their associated peptide chain. The program first invokes phenix.sort_hetatms to associate hetero atoms sensibly with adjacent peptide chains.<br />
<br />
After having identified the transformation for a chain yielding the smallest MLAD score it then subjects the associated hetero atoms to the same transformation.<br />
<br />
===Debug mode===<br />
<br />
Invoking the debug flag produces individual pdb files of the C-alpha atoms used for each SSM alignment of the moving scope for each permitted symmetry operation and alternative origin. A gold atom is placed at the centroid of the C-alpha atoms. Similar files are produced for the fixed scope. These files are stored in a subfolder named "AltOrigSymMatesFiles".<br />
<br />
If a floating origin is present in the space group a table of MLAD values is produced by sliding a copy of the chains in the moving scope along the polar axis in the fractional interval [-0.5, 0.5] for each permitted symmetry operation and alternative origin.<br />
<br />
===List of all available keywords===<br />
<br />
See [https://www.phenix-online.org/documentation/reference/find_alt_orig_sym_mate.html#list-of-all-available-keywords Phenix documentation for phenix.find_alt_orig_sym_mate]<br />
<br />
==Output==<br />
<br />
The closest match between chains in the moving scope and chains in the fixed scope is saved to a file named "MinMLAD_" followed by the name of the pdb file in the fixed scope and the name of the pdb file in the moving scope. A log file containing the standard output is written with the same name pattern but with the prefix "AltOrigSymMLAD_". All files are saved in the current working directory. If the MR solutions specified in the PHIL input contain multiple solutions then phaser.famos will write multiple log files, one for each MR solution.<br />
<br />
===Do the MR solutions match?===<br />
<br />
A good match between two chains usually has an MLAD value below 1.5, whereas a bad match usually has a value above 2.0. This is a rule of thumb and exceptions do occur. It is advisable to check visually in a molecular graphics program that the structures superpose on one another.<br />
<br />
==Algorithm==<br />
<br />
phaser.famos computes configurations by looping over all symmetry operations and alternative origin shifts. An alignment between C-alpha atoms from moving.pdb and fixed.pdb is computed using secondary structure matching (SSM). If that fails, or if the MLAD score achieved with SSM is larger than 2.0, an alignment is computed using the MMTBX alignment functions, which are part of the CCTBX. To estimate the best match, a distance measure between the aligned C-alpha atoms is computed for each configuration. The mean log absolute deviation (MLAD) is defined as:<br />
<br />
MLAD(dR) = Σ( log(dr·dr/(|dr| + 0.9) + max(0.9, min(dr·dr,1))) - log(0.9)),<br />
<br />
where dr is the difference vector between a pair of aligned C-alpha atoms and the sum is taken over all atom pairs in the alignment. The term log(0.9) is subtracted to ensure that MLAD(0) = 0.0, i.e. that two identical structures produce the value zero.<br />
<br />
MLAD can loosely be interpreted as a distance measure between structures, but it is not a metric in the strict mathematical sense since the triangle inequality is not fulfilled. Unlike a plain root mean square deviation (RMSD), the logarithm in the MLAD formula downplays contributions from atom pairs where the atoms are spatially distant. If an RMSD were employed, such atom pairs would contribute on an equal footing with those groups of atom pairs that can be superposed perfectly. This in turn may lead to non-optimal superpositions when the structures tested against one another consist of multiple domains where one domain has undergone a domain motion, i.e. where a subset of atoms in one chain is bound to be spatially distant from the atoms in the other chain they have been paired with.<br />
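A small numerical toy case may help to show the difference between the two scores. The sketch below uses made-up displacement vectors, not real structures, and hypothetical helper names (mlad, rmsd); it implements the MLAD expression as written on this page with the natural logarithm assumed. In the first case eight atom pairs superpose perfectly and two pairs are 20 Å apart, as after a domain motion; in the second case all ten pairs are off by the same moderate amount, chosen so that the RMSD is identical.

```python
import math


def mlad(diffs):
    """Sum of the per-pair MLAD terms for a list of difference vectors."""
    total = 0.0
    for dr in diffs:
        d2 = sum(c * c for c in dr)
        d = math.sqrt(d2)
        total += math.log(d2 / (d + 0.9) + max(0.9, min(d2, 1.0))) - math.log(0.9)
    return total


def rmsd(diffs):
    """Root mean square deviation for a list of difference vectors."""
    return math.sqrt(sum(sum(c * c for c in dr) for dr in diffs) / len(diffs))


# Eight perfectly superposed pairs plus two pairs displaced by 20 A.
domain_motion = [(0.0, 0.0, 0.0)] * 8 + [(20.0, 0.0, 0.0)] * 2
# Ten pairs all displaced by sqrt(80) A, which gives the same RMSD.
uniform_error = [(math.sqrt(80.0), 0.0, 0.0)] * 10

if __name__ == "__main__":
    print("RMSD:", rmsd(domain_motion), rmsd(uniform_error))
    print("MLAD:", mlad(domain_motion), mlad(uniform_error))
```

The RMSD cannot tell the two cases apart, whereas the MLAD is clearly lower for the case in which most of the chain superposes exactly, which is the behaviour wanted when one domain has moved.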
<br />
phaser.famos will for each chain in the fixed scope find the smallest MLAD with a copy of each chain in the moving scope for a given symmetry operation and alternative origin. When all chains in the fixed scope have been tested these copies will be saved to a file. For spacegroups with floating origin the minimum MLAD is found by doing a Golden Sectioning minimization along the polar axis for each copy of chains in moving.pdb.<br />
<br />
===Caveats===<br />
<br />
The program tests all solutions present in solution files entered as fixed scope against all solutions present in solution files entered as moving scope. Consequently the execution time is proportional to the number of solutions in fixed scope times the number of solutions in moving scope.<br />
<br />
The execution time is proportional to the number of SSM alignments being tested; if SSM identifies 12 alignments the program will take 12 times as long.<br />
<br />
===Changes===<br />
<br />
The command-line syntax mentioned in the Computational Crystallography Newsletter 2012 January has been replaced by PHIL syntax.<br />
<br />
===Literature===<br />
<br />
Algorithms for deriving crystallographic space-group information R.W. Große-Kunstleve Acta Cryst. A55, 383-395 (1999)<br />
<br />
phenix.find_alt_orig_sym_mate Robert D. Oeffner, Gábor Bunkóczi and Randy J. Read Computational Crystallography Newsletter 2012 January, 5-10 (2012)<br />
<br />
Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. E. Krissinel and K. Henrick Acta Cryst. D60, 2256-2268 (2004)</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Famos&diff=2318Famos2016-09-08T16:31:50Z<p>Rdo20: /* Examples of PHIL input */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__NOTOC__</div><br />
<br />
'''phaser.famos''' ('''phaser.find_alt_orig_sym_mate''') is a script for determining the best common origin for different molecular replacement solutions, in real space. The coordinates do not need to be identical. The common origin is found via secondary structure matching.<br />
<br />
==Author==<br />
<br />
Robert D. Oeffner<br />
<br />
==Purpose==<br />
<br />
'''phaser.famos''' attempts to find the best superposition of two different molecular replacement solutions of the same dataset, termed moving.pdb and fixed.pdb, with respect to all symmetry operations and alternate origin shifts permitted by the space group of the crystal. If either pdb file contains more than one chain, each chain is tested for its best match against the chains in the other pdb file. The program does so by calculating a score value, MLAD (see Algorithm), for all possible symmetry operations and alternate origin shifts of each chain in moving.pdb compared with chains in fixed.pdb. The transformation with the smallest MLAD is retained for that particular pair of chains.<br />
<br />
The script returns the best matches between pairs of chains in the two files. This includes the MLAD values, chain IDs, symmetry transformations and alternate origins for the two structures. If the same origin shift is not applied to all pairs of chains a warning will be printed. If the space group of the crystal has a floating origin this generally results in an origin offset between chains that is not a rational number.<br />
<br />
The superposed structure and the fixed structure can be visually inspected in a molecular viewer such as Coot or Pymol. Matches with low MLAD scores will have correspondingly good superpositions.<br />
<br />
==Usage from the command line==<br />
<br />
'''phaser.famos''' uses [https://www.phenix-online.org/documentation/file_formats.html#phil-files-eff-def-phil Phenix PHIL input] in either a text file or as keywords like:<br />
<br />
phaser.famos moving.pdb=pdbfile1 fixed.pdb=pdbfile2<br />
<br />
or:<br />
<br />
phaser.famos my_phil_input.txt<br />
<br />
The PHIL input scopes, moving and fixed, specify the MR solutions. Both scopes are programmatically equivalent and must be non-empty. This means that each scope should specify either the parameter xyzfname, the sub-scope mrsolution or the sub-scope pickle.solution.<br />
<br />
Examples:<br />
*Use the pdb file from one of the MR solutions in a scope. This is useful for simple cases such as when the solution comes from the PHENIX MRage GUI or when it is obtained from another MR program than Phaser.<br />
*Specify a Phaser MR solution by assigning the .sol file name of the solution to the mrsolution.solfname parameter of the sub-scope as well as assigning the IDs of the ensembles and the corresponding MR models to the parameters mrsolution.ensembles.name and mrsolution.ensembles.xyzfname. Multiple search components are specified as multiple ensembles. This way of specifying solutions is useful for solution files produced when Phaser is run from the command line and it produces several solutions of which one is the correct solution.<br />
*Use a solution from the Phaser-MR GUI in PHENIX. In that case the parameter pickle.solution.pklfname in the sub-scope should be assigned to the solution file that is produced by PHENIX after an MR calculation. The pickle.solution.philfname should then be assigned to the input file for the MR calculation. This way of specifying solutions is useful when MR solutions are available from having run Phaser through the PHENIX interface.<br />
*If the space group and the unit cell dimensions are not available from any of the input files then they need to be specified by assigning the parameter spacegroupfname either to a PDB file with a CRYST1 record or to an MTZ file with that information. This should be the data file used for the molecular replacement calculation.<br />
<br />
<br />
===Examples of PHIL input===<br />
<br />
Unless the input consists of just two PDB files, a PHIL file is the easiest way to enter input. A few examples of PHIL for phaser.famos are given below.<br />
<br />
Testing a command-line Phaser MR solution file against a solution specified as a PDB file:<br />
<br />
AltOrigSymMates.fixed.mrsolution<br />
{<br />
solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_3ECI_A0"<br />
xyzfname = "testdata/sculpt_3ECI_A0.pdb"<br />
}<br />
}<br />
<br />
AltOrigSymMates.moving_pdb="testdata/2z0d.pdb"<br />
<br />
Testing two sets of solution files from the PHENIX Phaser-MR GUI against one another:<br />
<br />
AltOrigSymMates.fixed.pickle_solution<br />
{<br />
philfname = "testdata/phaser_mr_13.eff"<br />
pklfname = "testdata/phaser_mr_13.pkl"<br />
}<br />
AltOrigSymMates.moving.pickle_solution<br />
{<br />
philfname = "testdata/phaser_mr_11.eff"<br />
pklfname = "testdata/phaser_mr_11.pkl"<br />
}<br />
<br />
Testing two solution files from the command-line version of Phaser against one another:<br />
<br />
AltOrigSymMates.fixed.mrsolution<br />
{<br />
solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_3ECI_A0"<br />
xyzfname = "testdata/sculpt_3ECI_A0.pdb"<br />
}<br />
}<br />
AltOrigSymMates.moving.mrsolution<br />
{<br />
solfname = "testdata/MR_2ZPN_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_2ZPN_A0"<br />
xyzfname = "testdata/sculpt_2ZPN_A0.pdb"<br />
}<br />
}<br />
AltOrigSymMates.spacegroupfname = "testdata/2Z0D.mtz"<br />
<br />
For more information on the PHIL input see the bottom of this page.<br />
<br />
===Also move HETATM (hetero atoms)===<br />
<br />
Invoking this flag will move hetero atoms (ligands, waters, metals, etc.) together with their associated peptide chain. The program first invokes phenix.sort_hetatms to associate hetero atoms sensibly with adjacent peptide chains.<br />
<br />
After identifying the transformation that yields the smallest MLAD score for a chain, it applies the same transformation to the associated hetero atoms.<br />
<br />
===Debug mode===<br />
<br />
Invoking the debug flag produces individual pdb files of the C-alpha atoms used for each SSM alignment of the moving scope for each permitted symmetry operation and alternative origin. A gold atom is placed at the centroid of the C-alpha atoms. Similar files are produced for the fixed scope. These files are stored in a subfolder named "AltOrigSymMatesFiles".<br />
<br />
If a floating origin is present in the space group a table of MLAD values is produced by sliding a copy of the chains in the moving scope along the polar axis in the fractional interval [-0.5, 0.5] for each permitted symmetry operation and alternative origin.<br />
<br />
===List of all available keywords===<br />
<br />
See [https://www.phenix-online.org/documentation/reference/find_alt_orig_sym_mate.html#list-of-all-available-keywords Phenix documentation for phenix.find_alt_orig_sym_mate]<br />
<br />
==Output==<br />
<br />
The closest match between chains in the moving scope and chains in the fixed scope is saved to a file named after the pdb file in the fixed scope, concatenated with the name of the pdb file in the moving scope and prefixed with "MinMLAD_". A log file containing the standard output is written with the same concatenated name, prefixed with "AltOrigSymMLAD_". All files are saved in the current working directory. If the MR solution files specified in the PHIL input contain multiple solutions, phaser.famos will output multiple log files, one for each MR solution.<br />
<br />
===Do the MR solutions match?===<br />
<br />
A good match between two chains usually has an MLAD value below 1.5, whereas a bad match usually has a value above 2.0. This is a rule of thumb and exceptions do occur. It is advisable to check visually, in a molecular graphics program, that the structures superpose on one another.<br />
<br />
==Algorithm==<br />
<br />
phaser.famos computes configurations by looping over all symmetry operations and alternative origin shifts. An alignment between C-alpha atoms from moving_pdb and fixed_pdb is computed using secondary structure matching (SSM). If that fails, or if the MLAD score achieved with SSM is larger than 2.0, an alignment is computed using the MMTBX alignment functions, which are part of the CCTBX. To estimate the best match, a distance measure between the aligned C-alpha atoms is computed for each configuration. The mean log absolute deviation (MLAD) is defined as:<br />
<br />
MLAD(dR) = Σ( log(dr·dr/(|dr| + 0.9) + max(0.9, min(dr·dr,1))) - log(0.9)),<br />
<br />
where dr is the difference vector between a pair of aligned C-alpha atoms and the sum is taken over all atom pairs in the alignment. The term log(0.9) is subtracted to ensure that MLAD(0) = 0.0, i.e. that two identical structures produce the value zero.<br />
<br />
MLAD can loosely be interpreted as a distance measure between structures, but it is not a metric in the strict mathematical sense since the triangle inequality is not fulfilled. Unlike a plain root mean square deviation, the logarithm in the MLAD formula downplays contributions from atom pairs where the atoms are spatially distant. If an RMSD were employed, such pairs would be weighted in the same way as atom pairs that can be superposed perfectly. This in turn may lead to incorrect superpositions when the chains tested against one another consist of multiple domains where one domain has undergone a domain motion, i.e. where a subset of atoms in one chain is bound to be spatially distant from the atoms in the other chain they have been paired with.<br />
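<br />
The damping effect of the logarithm can be illustrated with a minimal Python sketch. It evaluates the MLAD formula exactly as written above (a plain sum over atom pairs) and compares it with a sum of squared deviations. The function names and the assumption that dr is given in Ångström are illustrative only and are not taken from the phaser.famos source; any normalisation by the number of atom pairs that the word "mean" might imply is left out.<br />

```python
import math

def mlad(diff_vectors):
    """Sum of the MLAD terms from the formula above over aligned C-alpha pairs.

    diff_vectors: iterable of (dx, dy, dz) difference vectors (assumed in Angstrom).
    """
    total = 0.0
    for dx, dy, dz in diff_vectors:
        d2 = dx * dx + dy * dy + dz * dz        # dr.dr
        d = math.sqrt(d2)                       # |dr|
        term = d2 / (d + 0.9) + max(0.9, min(d2, 1.0))
        total += math.log(term) - math.log(0.9)
    return total

def sum_squared_deviation(diff_vectors):
    """Quantity underlying an RMSD: the sum of squared pair distances."""
    return sum(dx * dx + dy * dy + dz * dz for dx, dy, dz in diff_vectors)

# One well-superposed pair and one pair displaced by a hypothetical domain motion.
near = [(0.1, 0.0, 0.0)]
far = [(20.0, 0.0, 0.0)]
farther = [(40.0, 0.0, 0.0)]

# Doubling the separation quadruples the squared deviation,
# but adds less than one unit to the MLAD term.
print(mlad(near), mlad(far), mlad(farther))
print(sum_squared_deviation(far), sum_squared_deviation(farther))
```

In this sketch identical coordinates give exactly zero, in line with the subtraction of log(0.9) described above.<br />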
<br />
For each chain in the fixed scope, '''phaser.famos''' finds the smallest MLAD with a copy of each chain in the moving scope for a given symmetry operation and alternative origin. When all chains in the fixed scope have been tested, these copies are saved to a file. For space groups with a floating origin, the minimum MLAD is found by a golden-section minimization along the polar axis for each copy of the chains in moving_pdb.<br />
<br />
<br />
===Caveats===<br />
<br />
The program tests all solutions present in solution files entered as fixed scope against all solutions present in solution files entered as moving scope. Consequently the execution time is proportional to the number of solutions in fixed scope times the number of solutions in moving scope.<br />
<br />
The execution time is proportional to the number of SSM alignments being tested; if SSM identifies 12 alignments the program will take 12 times as long.<br />
<br />
===Changes===<br />
<br />
The command-line syntax mentioned in the Computational Crystallography Newsletter 2012 January has been replaced by PHIL syntax.<br />
<br />
===Literature===<br />
<br />
Algorithms for deriving crystallographic space-group information R.W. Große-Kunstleve Acta Cryst. A55, 383-395 (1999)<br />
<br />
phenix.find_alt_orig_sym_mate Robert D. Oeffner, Gábor Bunkóczi and Randy J. Read Computational Crystallography Newsletter 2012 January, 5-10 (2012)<br />
<br />
Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. E. Krissinel and K. Henrick Acta Cryst. D60, 2256-2268 (2004)</div>Rdo20https://www.phaser.cimr.cam.ac.uk/index.php?title=Famos&diff=2317Famos2016-09-08T16:30:58Z<p>Rdo20: /* Examples of PHIL input */</p>
<hr />
<div><div style="margin-left: 25px; float: right;">__NOTOC__</div><br />
<br />
'''phaser.famos''' ('''phaser.find_alt_orig_sym_mate''') is a script for determining the best common origin for different molecular replacement solutions, in real space. The coordinates do not need to be identical. The common origin is found via secondary structure matching.<br />
<br />
==Author==<br />
<br />
Robert D. Oeffner<br />
<br />
==Purpose==<br />
<br />
'''phaser.famos''' attempts to find the best superposition of two different molecular replacement solutions of the same dataset, termed moving.pdb and fixed.pdb, with respect to all symmetry operations and alternate origin shifts permitted by the space group of the crystal. If either of the pdb files contains more than one chain, each chain will be tested for its best match against the chains in the other pdb file. It does so by calculating a score value, MLAD (see Algorithm), for all possible symmetry operations and alternate origin shifts of each chain in moving.pdb compared with chains in fixed.pdb. The transformation with the smallest MLAD is retained for that particular pair of chains.<br />
<br />
The script returns the best matches between pairs of chains in the two files. This includes the MLAD values, chain IDs, symmetry transformations and alternate origins for the two structures. If the same origin shift is not applied to all pairs of chains a warning will be printed. If the space group of the crystal has a floating origin this generally results in an origin offset between chains that is not a rational number.<br />
<br />
The superposed structure and the fixed structure can be visually inspected in a molecular viewer such as Coot or Pymol. Matches with low MLAD scores will have correspondingly good superpositions.<br />
<br />
==Usage from the command line==<br />
<br />
'''phaser.famos''' uses [https://www.phenix-online.org/documentation/file_formats.html#phil-files-eff-def-phil Phenix PHIL input] either in a text file or as keywords, for example:<br />
<br />
phaser.famos moving.pdb=pdbfile1 fixed.pdb=pdbfile2<br />
<br />
or:<br />
<br />
phaser.famos my_phil_input.txt<br />
<br />
The PHIL input scopes, moving and fixed, specify the MR solutions. Both scopes are programmatically equivalent and must be non-empty. This means that a scope should specify either the parameter xyzfname, the sub-scope mrsolution or the sub-scope pickle.solution.<br />
<br />
Examples:<br />
*Use the pdb file from one of the MR solutions in a scope. This is useful for simple cases, such as when the solution comes from the PHENIX MRage GUI or when it is obtained from an MR program other than Phaser.<br />
*Specify a Phaser MR solution by assigning the .sol file name of the solution to the mrsolution.solfname parameter of the sub-scope as well as assigning the IDs of the ensembles and the corresponding MR models to the parameters mrsolution.ensembles.name and mrsolution.ensembles.xyzfname. Multiple search components are specified as multiple ensembles. This way of specifying solutions is useful for solution files produced when Phaser is run from the command line and it produces several solutions of which one is the correct solution.<br />
*Use a solution from the Phaser-MR GUI in PHENIX. In that case the parameter pickle.solution.pklfname in the sub-scope should be assigned to the solution file that is produced by PHENIX after an MR calculation. The pickle.solution.philfname should then be assigned to the input file for the MR calculation. This way of specifying solutions is useful when MR solutions are available from having run Phaser through the PHENIX interface.<br />
*If the space group and unit cell dimensions are not available from any of the input files, they must be specified by assigning the parameter spacegroupfname to either a PDB file with a CRYST1 record or an MTZ file containing that information. This should be the data file used for the molecular replacement calculation.<br />
<br />
<br />
===Examples of PHIL input===<br />
<br />
Unless the input consists of just two PDB files, a PHIL file is the easiest way to enter input. A few examples of PHIL input for phaser.famos are given below.<br />
<br />
Testing a command-line Phaser MR solution file against a solution specified as a PDB file:<br />
<br />
AltOrigSymMates.fixed.mrsolution<br />
{<br />
solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_3ECI_A0"<br />
xyzfname = "testdata/sculpt_3ECI_A0.pdb"<br />
}<br />
}<br />
<br />
AltOrigSymMates.moving_pdb="testdata/2z0d.pdb"<br />
<br />
Testing two sets of solution files from the PHENIX Phaser-MR GUI against one another:<br />
<br />
AltOrigSymMates.fixed.pickle_solution<br />
{<br />
philfname = "testdata/phaser_mr_13.eff"<br />
pklfname = "testdata/phaser_mr_13.pkl"<br />
}<br />
AltOrigSymMates.moving.pickle_solution<br />
{<br />
philfname = "testdata/phaser_mr_11.eff"<br />
pklfname = "testdata/phaser_mr_11.pkl"<br />
}<br />
<br />
Testing two solution files from the command-line version of Phaser against one another:<br />
<br />
AltOrigSymMates.fixed.mrsolution<br />
{<br />
solfname = "testdata/MR_3ECI_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_3ECI_A0"<br />
xyzfname = "testdata/sculpt_3ECI_A0.pdb"<br />
}<br />
}<br />
<br />
AltOrigSymMates.moving.mrsolution<br />
{<br />
solfname = "testdata/MR_2ZPN_A0_2P82_A0.sol"<br />
ensembles<br />
{<br />
name = "MR_2P82_A0"<br />
xyzfname = "testdata/sculpt_2P82_A0.pdb"<br />
}<br />
ensembles<br />
{<br />
name = "MR_2ZPN_A0"<br />
xyzfname = "testdata/sculpt_2ZPN_A0.pdb"<br />
}<br />
}<br />
AltOrigSymMates.spacegroupfname = "testdata/2Z0D.mtz"<br />
<br />
For more information on the PHIL input see the bottom of this page.<br />
<br />
===Also move HETATM (hetero atoms)===<br />
<br />
Invoking this flag will move hetero atoms (ligands, waters, metals, etc.) together with their associated peptide chain. The program first invokes phenix.sort_hetatms to associate hetero atoms sensibly with adjacent peptide chains.<br />
<br />
After identifying the transformation that yields the smallest MLAD score for a chain, it applies the same transformation to the associated hetero atoms.<br />
<br />
===Debug mode===<br />
<br />
Invoking the debug flag produces individual pdb files of the C-alpha atoms used for each SSM alignment of the moving scope for each permitted symmetry operation and alternative origin. A gold atom is placed at the centroid of the C-alpha atoms. Similar files are produced for the fixed scope. These files are stored in a subfolder named "AltOrigSymMatesFiles".<br />
<br />
If a floating origin is present in the space group a table of MLAD values is produced by sliding a copy of the chains in the moving scope along the polar axis in the fractional interval [-0.5, 0.5] for each permitted symmetry operation and alternative origin.<br />
<br />
===List of all available keywords===<br />
<br />
See [https://www.phenix-online.org/documentation/reference/find_alt_orig_sym_mate.html#list-of-all-available-keywords Phenix documentation for phenix.find_alt_orig_sym_mate]<br />
<br />
==Output==<br />
<br />
The closest match between chains in the moving scope and chains in the fixed scope is saved to a file named after the pdb file in the fixed scope, concatenated with the name of the pdb file in the moving scope and prefixed with "MinMLAD_". A log file containing the standard output is written with the same concatenated name, prefixed with "AltOrigSymMLAD_". All files are saved in the current working directory. If the MR solution files specified in the PHIL input contain multiple solutions, phaser.famos will output multiple log files, one for each MR solution.<br />
<br />
===Do the MR solutions match?===<br />
<br />
A good match between two chains usually has an MLAD value below 1.5, whereas a bad match usually has a value above 2.0. This is a rule of thumb and exceptions do occur. It is advisable to check visually, in a molecular graphics program, that the structures superpose on one another.<br />
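<br />
When many chain pairs are screened, the rule of thumb can be written as a small helper. This is only a sketch: the function name and the three labels are hypothetical, the thresholds are the 1.5 and 2.0 quoted above, and values in between are deliberately reported as ambiguous rather than forced into either class.<br />

```python
def classify_mlad(mlad_value, good_below=1.5, bad_above=2.0):
    """Apply the rule-of-thumb MLAD thresholds to one chain pair.

    Returns a label only; visual inspection is still recommended.
    """
    if mlad_value < good_below:
        return "likely match"
    if mlad_value > bad_above:
        return "likely mismatch"
    return "ambiguous"

# Hypothetical MLAD values for three chain pairs.
for value in (0.4, 1.7, 3.2):
    print(value, classify_mlad(value))
```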
<br />
==Algorithm==<br />
<br />
phaser.famos computes configurations by looping over all symmetry operations and alternative origin shifts. An alignment between C-alpha atoms from moving_pdb and fixed_pdb is computed using secondary structure matching (SSM). If that fails, or if the MLAD score achieved with SSM is larger than 2.0, an alignment is computed using the MMTBX alignment functions, which are part of the CCTBX. To estimate the best match, a distance measure between the aligned C-alpha atoms is computed for each configuration. The mean log absolute deviation (MLAD) is defined as:<br />
<br />
MLAD(dR) = Σ( log(dr·dr/(|dr| + 0.9) + max(0.9, min(dr·dr,1))) - log(0.9)),<br />
<br />
where dr is the difference vector between a pair of aligned C-alpha atoms and the sum is taken over all atom pairs in the alignment. The term log(0.9) is subtracted to ensure that MLAD(0) = 0.0, i.e. that two identical structures produce the value zero.<br />
<br />
MLAD can loosely be interpreted as a distance measure between structures, but it is not a metric in the strict mathematical sense since the triangle inequality is not fulfilled. Unlike a plain root mean square deviation, the logarithm in the MLAD formula downplays contributions from atom pairs where the atoms are spatially distant. If an RMSD were employed, such pairs would be weighted in the same way as atom pairs that can be superposed perfectly. This in turn may lead to incorrect superpositions when the chains tested against one another consist of multiple domains where one domain has undergone a domain motion, i.e. where a subset of atoms in one chain is bound to be spatially distant from the atoms in the other chain they have been paired with.<br />
<br />
For each chain in the fixed scope, '''phaser.famos''' finds the smallest MLAD with a copy of each chain in the moving scope for a given symmetry operation and alternative origin. When all chains in the fixed scope have been tested, these copies are saved to a file. For space groups with a floating origin, the minimum MLAD is found by a golden-section minimization along the polar axis for each copy of the chains in moving_pdb.<br />
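<br />
The floating-origin step can be sketched as a generic golden-section search over the fractional interval [-0.5, 0.5]. This is an illustration under stated assumptions, not the phaser.famos implementation: the score function is a toy single-pair MLAD term, the 50 Å axis length and the true offset of 0.137 are made up, and the search assumes the score has a single minimum in the interval.<br />

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0   # 1/golden ratio, about 0.618

def golden_section_minimise(f, lo=-0.5, hi=0.5, tol=1e-6):
    """Locate the minimum of a unimodal function f on [lo, hi]."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    x = 0.5 * (a + b)
    return x, f(x)

AXIS_LENGTH = 50.0    # hypothetical polar-axis cell length in Angstrom
TRUE_OFFSET = 0.137   # hypothetical fractional offset between the solutions

def toy_score(shift):
    """Single-pair MLAD term as a function of the fractional shift."""
    dist = abs(shift - TRUE_OFFSET) * AXIS_LENGTH
    d2 = dist * dist
    return math.log(d2 / (dist + 0.9) + max(0.9, min(d2, 1.0))) - math.log(0.9)

best_shift, best_score = golden_section_minimise(toy_score)
print(best_shift, best_score)
```

Each iteration reuses one of the two interior evaluations, so only one new score calculation is needed per step.<br />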
<br />
<br />
===Caveats===<br />
<br />
The program tests all solutions present in solution files entered as fixed scope against all solutions present in solution files entered as moving scope. Consequently the execution time is proportional to the number of solutions in fixed scope times the number of solutions in moving scope.<br />
<br />
The execution time is proportional to the number of SSM alignments being tested; if SSM identifies 12 alignments the program will take 12 times as long.<br />
<br />
===Changes===<br />
<br />
The command-line syntax mentioned in the Computational Crystallography Newsletter 2012 January has been replaced by PHIL syntax.<br />
<br />
===Literature===<br />
<br />
Algorithms for deriving crystallographic space-group information R.W. Große-Kunstleve Acta Cryst. A55, 383-395 (1999)<br />
<br />
phenix.find_alt_orig_sym_mate Robert D. Oeffner, Gábor Bunkóczi and Randy J. Read Computational Crystallography Newsletter 2012 January, 5-10 (2012)<br />
<br />
Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. E. Krissinel and K. Henrick Acta Cryst. D60, 2256-2268 (2004)</div>Rdo20