MR using keyword input
Revision as of 22:06, 1 August 2012
Let us look at a very simple PHASER script (AUTO_toxd1.com in the distributed tutorial files):
phaser << eof
MODE MR_AUTO
HKLIN toxd.mtz
LABIN F = FTOXD3 SIGF = SIGFTOXD3
ENSEMBLE toxd PDBFILE 1D0D_B.pdb IDENTITY 0.364
COMPOSITION PROTEIN SEQUENCE toxd.seq NUM 1
SEARCH ENSEMBLE toxd NUM 1
ROOT AUTO_toxd1
eof
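The capitalised words that start each command (MODE, HKLIN, LABIN, ENSEMBLE, COMPOSITION, SEARCH, ROOT) are Phaser keywords. Only the first 4 characters of a keyword are significant, so it does not matter whether you write ENSEMBLE, ENSEMB or ENSE.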
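The 4-character rule can be illustrated with a small shell sketch. This is not how Phaser parses its input; the function names keyword_stem and same_keyword are made up for the illustration, and the comparison is deliberately case-sensitive because the text above says nothing about case.

```shell
#!/bin/sh
# Keep only the part of a keyword that the text describes as significant:
# its first 4 characters.
keyword_stem() {
    printf '%s\n' "$1" | cut -c1-4
}

# Succeeds (exit status 0) when two spellings share the same 4-character stem.
same_keyword() {
    [ "$(keyword_stem "$1")" = "$(keyword_stem "$2")" ]
}

for spelling in ENSEMBLE ENSEMB ENSE SEARCH; do
    if same_keyword "$spelling" ENSEMBLE; then
        echo "$spelling has the same stem as ENSEMBLE"
    else
        echo "$spelling has a different stem from ENSEMBLE"
    fi
done
```

The first three spellings share the stem ENSE, while SEARCH does not.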
Let us examine the contents of this script line by line. The first line:
phaser << eof
tells us what to run. phaser is the Phaser executable, which should be in your PATH for this to work. If you were to run phaser on its own, it would not do anything straight away but would wait for user input (try it). So we need to feed it some information before we get anything out of it, and this is what the rest of the script is for. The last part of the first line, << eof, says: feed in the subsequent lines until you hit eof (at the end of the script).
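To see what the << eof construct actually hands to a program, you can try it with cat in place of phaser. This is only a sketch: cat just prints its standard input, and the function name show_stdin is made up for the illustration.

```shell
#!/bin/sh
# "cat" stands in for phaser: it prints whatever arrives on standard input,
# so you can see exactly which lines the shell passes to the program.
show_stdin() {
cat << eof
MODE MR_AUTO
HKLIN toxd.mtz
ROOT AUTO_toxd1
eof
}

show_stdin
```

Only the lines between the first line and the closing eof are printed; the eof marker itself is consumed by the shell and never reaches the program.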
Let us move on to the next line:
MODE MR_AUTO
MODE is a Phaser keyword and tells it what kind of job we want to run. Phaser can be used for many different jobs, so it needs to know what it is being used for. In this case MR_AUTO stands for Molecular Replacement - AUTOmatic. Other possibilities are MR_FRF or MR_FTF for example. Most molecular replacement problems can be solved with the AUTO mode, but very specialized jobs can be run by using the other keywords individually.
The next line in the script:
HKLIN toxd.mtz
specifies where our data are coming from. The HKL data are found in an MTZ file located in the directory you run the script from. MTZ files can store a lot of data so we need to tell Phaser which part of the file we need. The next line in the script:
LABIN F = FTOXD3 SIGF = SIGFTOXD3
specifies which columns the structure factor amplitudes and their standard deviations come from.
Next we need to specify a model that we will use for the molecular replacement. The line:
ENSEMBLE toxd PDBFILE 1D0D_B.pdb IDENTITY 0.364
tells us the PDB File for the model and the sequence identity between this model and the corresponding protein in our crystal. The sequence identity is needed so phaser can estimate the RMS error in the model. Ignore the keyword ENSEMBLE for the time being. It will be discussed in more detail later. For now, just think of it as a handle to the model.
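As a rough illustration of how a sequence identity can be turned into an RMS error estimate, here is a small sketch. It assumes the Chothia and Lesk style relation 0.4 * exp(1.87 * (1 - identity)) with a floor of 0.8 Å; that formula is an assumption of this example rather than something stated on this page, and the function name rms_from_identity is made up. Check the documentation of your Phaser version for the estimate it really uses.

```shell
#!/bin/sh
# Estimate the model RMS error (in Angstrom) from a fractional sequence identity.
# ASSUMPTION: rms = max(0.8, 0.4 * exp(1.87 * (1 - identity))); this may not
# match what your version of Phaser does internally.
rms_from_identity() {
    awk -v id="$1" 'BEGIN {
        rms = 0.4 * exp(1.87 * (1.0 - id))
        if (rms < 0.8) rms = 0.8      # apply the assumed lower limit
        printf "%.2f\n", rms
    }'
}

rms_from_identity 0.364   # identity given above for 1D0D_B.pdb
rms_from_identity 1.0     # identical model: falls back to the assumed floor
```

The point of the sketch is only that a lower identity leads to a larger expected error, which is why Phaser asks for the identity.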
The next line:
COMPOSITION PROTEIN SEQUENCE toxd.seq NUM 1
tells us the composition in the Asymmetric Unit. In our case it is protein with a sequence given in the file toxd.seq, and there is one molecule in the asymmetric unit. We could have specified the composition in a number of other ways, for instance by saying that the molecular weight of the protein is 7139.
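For example, the molecular weight alternative mentioned above might be written as follows. This is a sketch that assumes the MW subkeyword of COMPOSITION PROTEIN; check the keyword documentation for your Phaser version before relying on it.

```
COMPOSITION PROTEIN MW 7139 NUM 1
```

This line would take the place of the COMPOSITION PROTEIN SEQUENCE line in the script.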
Now all we need to do is to tell Phaser which model to search with. In our case we only have one model so it is trivial:
SEARCH ENSEMBLE toxd NUM 1
The subkey "NUM 1" is the default, but we could have asked Phaser to search for more than one copy. Finally we tell Phaser where to put all the output files. The line:
ROOT AUTO_toxd1
tells it to put everything in the tutorial directory and that the start of the output file names will be AUTO_toxd1, so you will get a bunch of files called AUTO_toxd1.log, AUTO_toxd1.sum, etc.
Ok, we are done. Let us run it! If you now type
source AUTO_toxd1.com
in the tutorial directory, it should all run. After it has finished you will notice a number of files that have been created. The best file to look at to get an overview of the job is the summary file, AUTO_toxd1.sum in this case. In that file you will see the progress of all the steps in an automatic Phaser run: correction for anisotropy, cell content analysis, rotation search, rescoring rotation peaks, translation searches, rescoring translations, packing, refinement, and generation of output PDB and MTZ files.
Adding to the Script
Let us modify our script file a little. Use your favourite editor and, after the ENSEMBLE command, add the line:
PDBFILE 1BIK_on_1D0D.pdb IDENTITY 0.377
with an ampersand also added at the end of the previous line. The ampersand means line continuation - you could leave out the ampersand and put the information about the new pdbfile on the same line. To avoid overwriting your old files, change the file root to AUTO_toxd2. Your script job will look like this (AUTO_toxd2.com in the tutorial directory):
phaser << eof
MODE MR_AUTO
HKLIN toxd.mtz
LABIN F = FTOXD3 SIGF = SIGFTOXD3
ENSEMBLE toxd PDBFILE 1D0D_B.pdb IDENTITY 0.364 &
    PDBFILE 1BIK_on_1D0D.pdb IDENTITY 0.377
COMPOSITION PROTEIN SEQUENCE toxd.seq NUM 1
SEARCH ENSEMBLE toxd NUM 1
ROOT AUTO_toxd2
eof
What have we done? We have now specified a second PDB file to be added to our model. But why do we give it the same handle? Remember I promised to talk in more detail about the ENSEMBLE keyword? Well, here we go: what ENSEMBLE really does is take all the PDB files and merge them together into an averaged model. Neither 1D0D_B nor 1BIK is very good on its own, but we can hope that if we put the two together into one single model, their shared features will reinforce each other and their dissimilar parts will be down-weighted. This is exactly what the ENSEMBLE is for. It takes a number of PDB files and combines them, using their sequence identity (or RMS error) to compute weighting factors.
Of course, for this to make any sense the models in the PDB files have to have the same relative orientation. In the distributed files, you will find a file 1BIK.pdb, but not the file 1BIK_on_1D0D.pdb. You will have to generate this by using another program to superimpose 1BIK on 1D0D. Probably the most convenient is to use the SSM superpose option in coot. It is VERY important to view your PDB files together in a graphics program (like O, PyMol, xfit or coot) before you attempt to use them in this way.
In our previous example we only had one single PDB file so the ENSEMBLE keyword didn't really mean a lot.
Searching for more than one molecule
The following job illustrates a more difficult molecular replacement problem, searching for the two components of a complex between beta-lactamase and the beta-lactamase inhibitor protein (BLIP) (AUTO_beta_blip.com in the tutorial directory).
phaser << eof
MODE MR_AUTO
HKLIN beta_blip_P3221.mtz
LABIN F = Fobs SIGF = Sigma
ENSEMBLE beta PDBFILE beta.pdb IDENTITY 1.0
ENSEMBLE blip PDBFILE blip.pdb IDENTITY 1.0
COMPOSITION PROTEIN SEQUENCE beta.seq NUM 1
COMPOSITION PROTEIN SEQUENCE blip.seq NUM 1
SEARCH ENSEMBLE beta NUM 1
SEARCH ENSEMBLE blip NUM 1
ROOT AUTO_beta_blip
eof
Here we define a separate ENSEMBLE for each separate rigid body that we will be looking for, and we give separate SEARCH commands for each one we wish to look for in this job. This job has been set up to look first for the bigger component, beta, which will be easier to find. Then the information from the fixed beta component will be used in looking for the smaller (and harder to locate) BLIP component.
Note the NUM subkey of the SEARCH command. If we were looking for more than one copy, we could give a NUM greater than 1. Note also that, for convenience, more than one COMPOSITION command can be given. Phaser will just add up the compositions given for the separate components.
Running individual steps
In special circumstances, you may need to run the steps of a structure solution separately, to gain more control over the progress of the run or to use specialized features. This can be illustrated by breaking up the solution of the beta-lactamase:BLIP complex.
Here is a job to automatically find the beta-lactamase component, which we would expect to be easier to find than BLIP (AUTO_beta.com in the tutorial directory).
phaser << eof
MODE MR_AUTO
HKLIN beta_blip_P3221.mtz
LABIN F = Fobs SIGF = Sigma
ENSEMBLE beta PDBFILE beta.pdb IDENTITY 1.0
COMPOSITION PROTEIN SEQUENCE beta.seq NUM 1
COMPOSITION PROTEIN SEQUENCE blip.seq NUM 1
SEARCH ENSEMBLE beta NUM 1
ROOT AUTO_beta
eof
Compared to the fully automated job searching for both components, the only important difference is the removal of the second SEARCH command. We could have defined the ENSEMBLE for blip, but we aren't using it in this job so it isn't necessary. Note that both COMPOSITION commands are still needed so that Phaser knows the fraction of the structure specified by beta!
Now we can use the information from the beta-lactamase solution in carrying out a rotation search for the BLIP component.
phaser << eof
MODE MR_FRF
HKLIN beta_blip_P3221.mtz
LABIN F = Fobs SIGF = Sigma
ENSEMBLE beta PDBFILE beta.pdb IDENTITY 1.0
ENSEMBLE blip PDBFILE blip.pdb IDENTITY 1.0
COMPOSITION PROTEIN SEQUENCE beta.seq NUM 1
COMPOSITION PROTEIN SEQUENCE blip.seq NUM 1
SOLUTION 6DIM ENSEMBLE beta EULER 199.95 41.50 184.08 FRAC -0.4974 -0.1588 -0.2808
SEARCH ENSEMBLE blip
ROOT ROT_blip_fixbeta
eof
Note that the MODE is now MR_FRF (Fast Rotation Function). The SOLUTION 6DIM command gives information about the solution for beta that is contained in the output file AUTO_beta.sol from running AUTO_beta.com. Take a look at AUTO_beta.sol, if you ran that job. Notice that it specifies the space group (important if we had tested both possibilities, P3121 and P3221). The SOLU SET command can be used to separate different potential solutions, each of which can be used as the start of searches for further molecules, but in this case there is only one.
Instead of copying the information from AUTO_beta.sol, it is easier to just include it using the @ command. @ is a Phaser preprocessor command that allows you to read in external files and use the contents as if they were explicitly included in the script file. The script is ROT_blip_fixbeta.com in the tutorial directory.
phaser << eof
MODE MR_FRF
HKLIN beta_blip_P3221.mtz
LABIN F = Fobs SIGF = Sigma
ENSEMBLE beta PDBFILE beta.pdb IDENTITY 1.0
ENSEMBLE blip PDBFILE blip.pdb IDENTITY 1.0
COMPOSITION PROTEIN SEQUENCE beta.seq NUM 1
COMPOSITION PROTEIN SEQUENCE blip.seq NUM 1
@AUTO_beta.sol
SEARCH ENSEMBLE blip
ROOT ROT_blip_fixbeta
eof
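If you want to see what the keyword input looks like once an @ line has been replaced by the contents of the named file, the following sketch imitates that substitution in plain shell. It is not Phaser's own preprocessor. The function name expand_includes and the file names demo.sol and keywords.txt are made up for the illustration, and demo.sol holds only the single SOLUTION line quoted earlier, not a real .sol file.

```shell
#!/bin/sh
# Print a keyword file with every "@file" line replaced by that file's contents.
# This only imitates the behaviour described in the text.
expand_includes() {
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            @*) cat "${line#@}" ;;          # drop the leading @ and splice the file in
            *)  printf '%s\n' "$line" ;;    # any other line is passed through unchanged
        esac
    done < "$1"
}

# Build two small stand-in files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'SOLUTION 6DIM ENSEMBLE beta EULER 199.95 41.50 184.08 FRAC -0.4974 -0.1588 -0.2808' > "$workdir/demo.sol"
printf '%s\n' 'MODE MR_FRF' '@demo.sol' 'SEARCH ENSEMBLE blip' > "$workdir/keywords.txt"

# Run from the scratch directory so the relative name after @ resolves.
( cd "$workdir" && expand_includes keywords.txt )
```

The printed result has the SOLUTION line where the @demo.sol line used to be, which is the same as typing the solution into the script by hand.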
Look at the file ROT_blip_fixbeta.rlist produced by running this job ("source ROT_blip_fixbeta.com" in the tutorial directory). This file contains the rotation peaks (SOLU TRIAL commands) as well as the fixed beta-lactamase solution (SOLU 6DIM command). We can include this file in a job to run a translation search, still fixing the known beta-lactamase solution.
phaser << eof
MODE MR_FTF
HKLIN beta_blip_P3221.mtz
LABIN F = Fobs SIGF = Sigma
ENSEMBLE beta PDBFILE beta.pdb IDENTITY 1.0
ENSEMBLE blip PDBFILE blip.pdb IDENTITY 1.0
COMPOSITION PROTEIN SEQUENCE beta.seq NUM 1
COMPOSITION PROTEIN SEQUENCE blip.seq NUM 1
@ROT_blip_fixbeta.rlist
ROOT TRA_blip_fixbeta
eof
What has changed?
- The MODE is now MR_FTF (Molecular Replacement - Fast Translation Function) instead of MR_FRF
- The orientations from the rotation search have been included using the @ command
- The SEARCH keyword has disappeared
Ok, that's all there is to it, so run this script (TRA_blip_fixbeta.com) and see what output you get.
Now that you have an introduction to some of the most common commands used in Phaser, you could look at the full documentation to get an idea of the other things you can do.