Difference between revisions of "MR using keyword input"

From Phaserwiki
 
==Running individual steps==
 
 
In special circumstances, you may need to run the steps of a structure solution separately, to gain more control over the progress of the run or to use specialized features. This can be illustrated by breaking up the solution of the beta-lactamase:BLIP complex.
 
 
Here is a job to automatically find the beta-lactamase component, which we would expect to be easier to find than BLIP (AUTO_beta.com in the tutorial directory).
 
 
  phaser << eof
  MODE MR_AUTO
  HKLIN beta_blip_P3221.mtz
  LABIN F = Fobs SIGF = Sigma
  ENSEMBLE beta PDBFILE beta.pdb IDENTITY 1.0
  COMPOSITION PROTEIN SEQUENCE beta.seq NUM 1
  COMPOSITION PROTEIN SEQUENCE blip.seq NUM 1
  SEARCH ENSEMBLE beta NUM 1
  ROOT AUTO_beta
  eof
 
 
Compared to the fully automated job searching for both components, the only important difference is the removal of the second SEARCH command. We could have defined the ENSEMBLE for blip, but we aren't using it in this job so it isn't necessary. Note that both COMPOSITION commands are still needed so that Phaser knows the fraction of the structure specified by beta!
 
 
Now we can use the information from the beta-lactamase solution in carrying out a rotation search for the BLIP component.
 
 
  phaser << eof
  MODE MR_FRF
  HKLIN beta_blip_P3221.mtz
  LABIN F = Fobs SIGF = Sigma
  ENSEMBLE beta PDBFILE beta.pdb IDENTITY 1.0
  ENSEMBLE blip PDBFILE blip.pdb IDENTITY 1.0
  COMPOSITION PROTEIN SEQUENCE beta.seq NUM 1
  COMPOSITION PROTEIN SEQUENCE blip.seq NUM 1
  SOLUTION 6DIM ENSEMBLE beta EULER 199.95 41.50 184.08 FRAC -0.4974 -0.1588 -0.2808
  SEARCH ENSEMBLE blip
  ROOT ROT_blip_fixbeta
  eof
 
 
Note that the MODE is now MR_FRF (Fast Rotation Function). The SOLUTION 6DIM command gives information about the solution for beta that is contained in the output file AUTO_beta.sol from running AUTO_beta.com. Take a look at AUTO_beta.sol, if you ran that job. Notice that it specifies the space group (important if we had tested both possibilities, P3121 and P3221). The SOLU SET command can be used to separate different potential solutions, each of which can be used as the start of searches for further molecules, but in this case there is only one.
 
 
Instead of copying the information from AUTO_beta.sol, it is easier to just include it using the @ command. @ is a Phaser preprocessor command that allows you to read in external files and use the contents as if they were explicitly included in the script file. The script is ROT_blip_fixbeta.com in the tutorial directory.
 
 
  phaser << eof
  MODE MR_FRF
  HKLIN beta_blip_P3221.mtz
  LABIN F = Fobs SIGF = Sigma
  ENSEMBLE beta PDBFILE beta.pdb IDENTITY 1.0
  ENSEMBLE blip PDBFILE blip.pdb IDENTITY 1.0
  COMPOSITION PROTEIN SEQUENCE beta.seq NUM 1
  COMPOSITION PROTEIN SEQUENCE blip.seq NUM 1
  @AUTO_beta.sol
  SEARCH ENSEMBLE blip
  ROOT ROT_blip_fixbeta
  eof
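
To make the idea of "use the contents as if they were explicitly included" concrete, here is a rough shell sketch that expands @ lines by pasting in the named file. It only illustrates the behaviour described above and is not how Phaser implements it; the function name expand_includes is made up for this example.

```shell
# expand_includes FILE: print FILE, replacing each line of the form "@name"
# (optionally indented) with the contents of the file "name".
# Illustration only; Phaser does this itself when it reads the script.
expand_includes() {
    awk '
        {
            trimmed = $0
            sub(/^[ \t]+/, "", trimmed)      # ignore leading indentation
            sub(/[ \t]+$/, "", trimmed)      # ignore trailing blanks
            if (substr(trimmed, 1, 1) == "@") {
                included = substr(trimmed, 2) # file name after the @
                while ((getline line < included) > 0) print line
                close(included)
            } else {
                print
            }
        }
    ' "$1"
}
```

Running expand_includes on ROT_blip_fixbeta.com, with AUTO_beta.sol in the same directory, should print a script in which the @AUTO_beta.sol line has been replaced by the solution commands from that file.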
 
 
Look at the file ROT_blip_fixbeta.rlist produced by running this job ("source ROT_blip_fixbeta.com" in the tutorial directory). This file contains the rotation peaks (SOLU TRIAL commands) as well as the fixed beta-lactamase solution (SOLU 6DIM command). We can include this file in a job to run a translation search, still fixing the known beta-lactamase solution.
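
A quick way to see how many rotation peaks were written is to count the SOLU TRIAL commands in the file. This is a small sketch that assumes each SOLU TRIAL command sits on its own line; the function name count_trials is invented for the example.

```shell
# count_trials FILE: number of lines in FILE that contain a SOLU TRIAL command.
count_trials() {
    grep -c 'SOLU TRIAL' "$1"
}

# Only run the count if the rotation search has produced its output.
if [ -f ROT_blip_fixbeta.rlist ]; then
    count_trials ROT_blip_fixbeta.rlist
fi
```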
 
 
  phaser << eof
  MODE MR_FTF
  HKLIN beta_blip_P3221.mtz
  LABIN F = Fobs SIGF = Sigma
  ENSEMBLE beta PDBFILE beta.pdb IDENTITY 1.0
  ENSEMBLE blip PDBFILE blip.pdb IDENTITY 1.0
  COMPOSITION PROTEIN SEQUENCE beta.seq NUM 1
  COMPOSITION PROTEIN SEQUENCE blip.seq NUM 1
  @ROT_blip_fixbeta.rlist
  ROOT TRA_blip_fixbeta
  eof
 
 
What has changed?
 
 
* The MODE is now MR_FTF (Molecular Replacement - Fast Translation Function) instead of MR_FRF
 
* The orientations from the rotation search have been included using the @ command
 
* The SEARCH keyword has disappeared
 
 
Ok, that's all there is to it, so run this script (TRA_blip_fixbeta.com) and see what output you get.
 
 
Now that you have an introduction to some of the most common commands used in Phaser, you could look at the full documentation to get an idea of the other things you can do.
 
  
  
 
[[Category:Tutorial]]
 

Latest revision as of 13:58, 6 September 2016

Sample Script

Let us look at a very simple PHASER script (AUTO_toxd1.com in the distributed tutorial files):

 phaser << eof
 MODE MR_AUTO
 HKLIN toxd.mtz
 LABIN F = FTOXD3 SIGF = SIGFTOXD3 #optional from Phaser-2.7.12
 ENSEMBLE toxd PDBFILE 1D0D_B.pdb IDENTITY 0.364
 COMPOSITION PROTEIN SEQUENCE toxd.seq NUM 1
 SEARCH ENSEMBLE toxd NUM 1
 ROOT AUTO_toxd1
 eof

The words in bold are Phaser keywords. Only the first 4 characters are significant so it does not matter whether you write ENSEMBLE, ENSEMB or ENSE.
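
The four-character rule can be pictured with a one-line shell sketch. It only mimics the rule stated above by cutting each spelling down to its first four characters; it is not Phaser's own parser.

```shell
# Truncate each spelling to its first four characters and collapse duplicates:
# all three spellings reduce to the same significant part.
significant=$(printf '%s\n' ENSEMBLE ENSEMB ENSE | cut -c1-4 | sort -u)
echo "$significant"
```

All three spellings collapse to the single value ENSE, which is why they are interchangeable in a script.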

Let us examine the contents of this script line by line. The first line:

 phaser << eof

tells us what to run. phaser is the Phaser executable, which must be in your PATH for this to work. If you run phaser on its own, it does nothing straight away but waits for user input (try it), so we need to feed it some information before we get anything out of it. This is what the script job is for. The last part of the first line says: feed in the subsequent lines until you hit eof (at the end of the script).
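
If the << eof construct is unfamiliar, the following sketch shows the same mechanism with cat standing in for phaser, so that it can be tried without any data files. The variable name fed_lines is just for illustration.

```shell
# cat simply echoes whatever arrives on standard input, so it shows exactly
# which lines the shell feeds to the program named before "<< eof".
fed_lines=$(cat << eof
MODE MR_AUTO
HKLIN toxd.mtz
eof
)
echo "$fed_lines"
```

Only the two keyword lines are passed on; the terminating eof line is consumed by the shell and never reaches the program.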

Let us move on to the next line:

 MODE MR_AUTO

MODE is a Phaser keyword that tells Phaser what kind of job we want to run. Phaser can be used for many different jobs, so it needs to know what it is being used for. In this case MR_AUTO stands for Molecular Replacement - AUTOmatic. Other possibilities include MR_FRF (fast rotation function) and MR_FTF (fast translation function). Most molecular replacement problems can be solved with the AUTO mode, but very specialized jobs can be run by using the other modes individually.

The next line in the script:

 HKLIN toxd.mtz

specifies where our data are coming from. The HKL data are found in an MTZ file located in the directory you run the script from. MTZ files can store a lot of data so we need to tell Phaser which part of the file we need. The next line in the script:

 LABIN F = FTOXD3 SIGF = SIGFTOXD3

specifies which columns the structure factor amplitudes and their standard deviations come from. Note that, when intensities and their standard deviations are available (which is not the case for this structure), it is preferable to use intensities instead of amplitudes. These are provided through a LABIN command as well, but with I and SIGI instead of F and SIGF. From Phaser-2.7.12 the LABIN command is optional: if the MTZ file contains only one I column, that column is used; otherwise, if it contains only one F column, that column is used.
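
The two forms of the LABIN line can be put side by side with a small helper. This is only a sketch: the function name labin_line is invented, and the column labels IOBS and SIGIOBS are hypothetical placeholders for whatever labels your own MTZ file uses.

```shell
# labin_line KIND LABEL SIGMA_LABEL: print a LABIN keyword line for
# amplitudes (F/SIGF) or intensities (I/SIGI).
labin_line() {
    case "$1" in
        amplitude) echo "LABIN F = $2 SIGF = $3" ;;
        intensity) echo "LABIN I = $2 SIGI = $3" ;;
        *) echo "unknown data type: $1" >&2; return 1 ;;
    esac
}

labin_line amplitude FTOXD3 SIGFTOXD3   # the line used in this tutorial
labin_line intensity IOBS SIGIOBS       # hypothetical intensity columns
```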

Next we need to specify a model that we will use for the molecular replacement. The line:

 ENSEMBLE toxd PDBFILE 1D0D_B.pdb IDENTITY 0.364

gives the PDB file for the model and the sequence identity between this model and the corresponding protein in our crystal. The sequence identity is needed so that Phaser can estimate the RMS error in the model. Ignore the keyword ENSEMBLE for the time being; it will be discussed in more detail later. For now, just think of it as a handle to the model.

The next line:

 COMPOSITION PROTEIN SEQUENCE toxd.seq NUM 1

tells us the composition of the asymmetric unit. In our case it is protein with the sequence given in the file toxd.seq, and there is one molecule in the asymmetric unit. We could have specified the composition in a number of other ways, for instance by saying that the molecular weight of the protein is 7139.
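
As a sketch of the molecular weight alternative, the line below assumes that the subkey is spelled MW; check the keyword documentation for your Phaser version before relying on it. The value 7139 is the molecular weight quoted above.

```shell
# Same composition as "COMPOSITION PROTEIN SEQUENCE toxd.seq NUM 1", but given
# by molecular weight. The MW subkey spelling is an assumption to verify.
composition_line="COMPOSITION PROTEIN MW 7139 NUM 1"
echo "$composition_line"
```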

Now all we need to do is to tell Phaser which model to search with. In our case we only have one model so it is trivial:

 SEARCH ENSEMBLE toxd NUM 1

The subkey "NUM 1" is the default, but we could have asked Phaser to search for more than one copy. Finally we tell PHASER where to put all the output files. The line:

 ROOT AUTO_toxd1

tells it to put everything in the tutorial directory and that the start of the output file names will be AUTO_toxd1, so you will get a bunch of files called AUTO_toxd1.log, AUTO_toxd1.sum, etc.

Ok, we are done. Let us run it! If you now type

 source AUTO_toxd1.com

in the tutorial directory, it should all run. After it has finished you will notice a number of files that have been created. The best file to look at to get an overview of the job is the summary file, AUTO_toxd1.sum in this case. In that file you will see the progress of all the steps in an automatic Phaser run: correction for anisotropy, cell content analysis, rotation search, rescoring rotation peaks, translation searches, rescoring translations, packing, refinement, and generation of output PDB and MTZ files.
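
After the run, a short check like the one below can confirm that the log and summary files named above were written. It is a minimal sketch; the function name report_outputs is invented, and only the two extensions mentioned in the text are checked.

```shell
# report_outputs ROOT: say which of ROOT.log and ROOT.sum exist in the
# current directory.
report_outputs() {
    for ext in log sum; do
        if [ -f "$1.$ext" ]; then
            echo "found $1.$ext"
        else
            echo "missing $1.$ext"
        fi
    done
}

report_outputs AUTO_toxd1
```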

Adding to the Script

Let us modify our script file a little. Use your favourite editor and, after the ENSEMBLE command, add the line:

 ENSEMBLE toxd PDBFILE 1BIK_on_1D0D.pdb IDENTITY 0.377

To avoid overwriting your old files, change the file root to AUTO_toxd2. Your script job will look like this (AUTO_toxd2.com in the tutorial directory):

 phaser << eof
 MODE MR_AUTO
 HKLIN toxd.mtz
 LABIN F = FTOXD3 SIGF = SIGFTOXD3
 ENSEMBLE toxd PDBFILE 1D0D_B.pdb IDENTITY 0.364
 ENSEMBLE toxd PDBFILE 1BIK_on_1D0D.pdb IDENTITY 0.377
 COMPOSITION PROTEIN SEQUENCE toxd.seq NUM 1
 SEARCH ENSEMBLE toxd NUM 1
 ROOT AUTO_toxd2
 eof
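
If you prefer not to edit the file by hand, the same two changes can be made from the command line. The sketch below assumes AUTO_toxd1.com has the layout shown earlier (one ENSEMBLE line and a ROOT line naming AUTO_toxd1); the function name add_second_model is invented for the example.

```shell
# add_second_model FILE: print FILE with a second ENSEMBLE line for the toxd
# handle added after the existing one, and the ROOT changed to AUTO_toxd2.
add_second_model() {
    awk '
        /^[ \t]*ENSE/ {
            print
            indent = $0
            sub(/ENSE.*/, "", indent)   # keep whatever indentation the line had
            print indent "ENSEMBLE toxd PDBFILE 1BIK_on_1D0D.pdb IDENTITY 0.377"
            next
        }
        /^[ \t]*ROOT/ { sub(/AUTO_toxd1/, "AUTO_toxd2") }
        { print }
    ' "$1"
}

# Write the new script only if the original is present and the new one is not,
# so that an existing AUTO_toxd2.com is never overwritten.
if [ -f AUTO_toxd1.com ] && [ ! -e AUTO_toxd2.com ]; then
    add_second_model AUTO_toxd1.com > AUTO_toxd2.com
fi
```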

What have we done? We have now specified a second PDB File to be added to our model. But why do we give it the same handle? Remember I promised to talk in more detail about the ENSEMBLE keyword? Well here we go: What ENSEMBLE really does is that it takes all the PDB files and merges them together into an averaged model. See, neither 1D0D_B nor 1BIK are very good on their own but we can hope that if we put the two together into one single model that their identical features emphasize each other and that their dissimilar parts will be weighted down. This is exactly what the ENSEMBLE is for. It takes a number of PDB files and combines them using their sequence identity (or RMS error) to compute weighting factors.

Of course, for this to make any sense the models in the PDB files have to have the same relative orientation. In the distributed files, you will find a file 1BIK.pdb, but not the file 1BIK_on_1D0D.pdb. You will have to generate this by using another program to superimpose 1BIK on 1D0D. Probably the most convenient is to use the SSM superpose option in coot. It is VERY important to view your PDB files together in a graphics program (like O, PyMol, xfit or coot) before you attempt to use them in this way.

An alternative to this manual approach is to use the Ensembler program, which will superimpose the models and place them into a single merged PDB file.

In our previous example we only had one single PDB file so the ENSEMBLE keyword didn't really mean a lot.

Searching for more than one molecule

The following job illustrates a more difficult molecular replacement problem, searching for the two components of a complex between beta-lactamase and the beta-lactamase inhibitor protein (BLIP) (AUTO_beta_blip.com in the tutorial directory).

 phaser << eof
 MODE MR_AUTO
 HKLIN beta_blip_P3221.mtz
 LABIN F = Fobs SIGF = Sigma
 ENSEMBLE beta PDBFILE beta.pdb IDENTITY 1.0
 ENSEMBLE blip PDBFILE blip.pdb IDENTITY 1.0
 COMPOSITION PROTEIN SEQUENCE beta.seq NUM 1
 COMPOSITION PROTEIN SEQUENCE blip.seq NUM 1
 SEARCH ENSEMBLE beta NUM 1
 SEARCH ENSEMBLE blip NUM 1
 ROOT AUTO_beta_blip
 eof

Here we define a separate ENSEMBLE for each separate rigid body that we will be looking for, and we give separate SEARCH commands for each one we wish to look for in this job. Regardless of the order in which we specify the SEARCHes, by default Phaser will search first for the component expected to be easier to find. In this case, this is the bigger molecule, beta. Once beta has been placed, the information from the fixed beta component will be used in looking for the smaller (and harder to locate) BLIP component.

Note the NUM subkey of the SEARCH command. If we were looking for more than one copy, we could give a NUM greater than 1. Note also that, for convenience, more than one COMPOSITION command can be given. Phaser will just add up the compositions given for the separate components.