Author: | Oliver Beckstein, Joshua Adelman |
---|---|
Year: | 2010–2011 |
Copyright: | GNU Public License v3 |
The module contains functions to fit a target structure to a reference structure. They use the fast QCP algorithm to calculate the root mean square distance (RMSD) between two coordinate sets [Theobald2005] and the rotation matrix R that minimizes the RMSD [Liu2010]. (Please cite these references when using this module.).
Typically, one selects a group of atoms (such as the C-alphas), calculates the RMSD and transformation matrix, and applys the transformation to the current frame of a trajectory to obtain the rotated structure. The alignto() and rms_fit_trj() functions can be used to do this for individual frames and trajectories respectively.
The RMS-fitting tutorial shows how to do the individual steps manually and explains the intermediate steps.
See also
The example uses files provided as part of the MDAnalysis test suite (in the variables PSF, DCD, and PDB_small). For all further examples execute first
>>> from MDAnalysis import *
>>> from MDAnalysis.analysis.align import *
>>> from MDAnalysis.tests.datafiles import PSF, DCD, PDB_small
In the simplest case, we can simply calculate the C-alpha RMSD between two structures, using rmsd():
>>> ref = Universe(PDB_small)
>>> mobile = Universe(PSF,DCD)
>>> rmsd(mobile.atoms.CA.coordinates(), ref.atoms.CA.coordinates())
18.858259026820352
Note that in this example translations have not been removed. In order to look at the pure rotation one needs to superimpose the centres of mass (or geometry) first:
>>> ref0 = ref.atoms.CA.coordinates() - ref.atoms.CA.centerOfMass()
>>> mobile0 = mobile.atoms.CA.coordinates() - mobile.atoms.CA.centerOfMass()
>>> rmsd(mobile0, ref0)
6.8093965864717951
The rotation matrix that superimposes mobile on ref while minimizing the CA-RMSD is obtained with the rotation_matrix() function
>>> R, rmsd = rotation_matrix(mobile0, ref0)
>>> print rmsd
6.8093965864717951
>>> print R
[[ 0.14514539 -0.27259113 0.95111876]
[ 0.88652593 0.46267112 -0.00268642]
[-0.43932289 0.84358136 0.30881368]]
Putting all this together one can superimpose all of mobile onto ref:
>>> mobile.atoms.translate(-mobile.atoms.CA.centerOfMass())
>>> mobile.atoms.rotate(R)
>>> mobile.atoms.translate(ref.atoms.CA.centerOfMass())
>>> mobile.atoms.write("mobile_on_ref.pdb")
To fit a single structure with alignto():
>>> ref = Universe(PSF, PDB_small)
>>> mobile = Universe(PSF, DCD) # we use the first frame
>>> alignto(mobile, ref, select="protein and name CA", mass_weighted=True)
This will change all coordinates in mobile so that the protein C-alpha atoms are optimally superimposed (translation and rotation).
To fit a whole trajectory to a reference structure with the rms_fit_trj() function:
>>> ref = Universe(PSF, PDB_small) # reference structure 1AKE
>>> trj = Universe(PSF, DCD) # trajectory of change 1AKE->4AKE
>>> rms_fit_trj(trj, ref, filename='rmsfit.dcd')
It is also possible to align two arbitrary structures by providing a mapping between atoms based on a sequence alignment. This allows fitting of structural homologs or wild type and mutant.
If a alignment was provided as “sequences.aln” one would first produce the appropriate MDAnalysis selections with the fasta2select() function and then feed the resulting dictionary to rms_fit_trj():
>>> seldict = fasta2select('sequences.aln')
>>> rms_fit_trj(trj, ref, filename='rmsfit.dcd', select=seldict)
(See the documentation of the functions for this advanced usage.)
Returns RMSD between two coordinate sets a and b.
a and b are arrays of the coordinates of N atoms of shape N*3 as generated by, e.g., MDAnalysis.core.AtomGroup.AtomGroup.coordinates().
The weights can be an array of length N, containing e.g. masses for a weighted RMSD calculation.
The function uses Douglas Theobald’s fast QCP algorithm [Theobald2005] to calculate the RMSD.
>>> u = Universe(PSF,DCD)
>>> bb = u.selectAtoms('backbone')
>>> A = bb.coordinates() # coordinates of first frame
>>> u.trajectory[-1] # forward to last frame
>>> B = bb.coordinates() # coordinates of last frame
>>> rmsd(A,B)
6.8342494129169804
Spatially align mobile to reference by doing a RMSD fit on select atoms.
The superposition is done in the following way:
The mobile and reference atom groups can be constructed so that they already match atom by atom. In this case, select should be set to “all” (or None) so that no further selections are applied to mobile and reference, therefore preserving the exact atom ordering (see Ordered selections).
Warning
The atom order for mobile and reference is only preserved when select is either “all” or None. In any other case, a new selection will be made that will sort the resulting AtomGroup by index and therefore destroy the correspondence between the two groups. It is safest not to mix ordered AtomGroups with selection strings.
Arguments : |
|
---|---|
Returns : | RMSD before and after spatial alignment. |
See also
For RMSD-fitting trajectories it is more efficient to use rms_fit_trj().
RMS-fit trajectory to a reference structure using a selection.
Arguments : |
|
---|
Both reference and trajectory must be MDAnalysis.Universe instances. If they contain a trajectory then it is used. The output file format is the same as the input traj.
The following functions are used by the other functions in this module. They are probably of more interest to developers than to normal users.
Return selection strings that will select equivalent residues.
The function aligns two sequences provided in a FASTA file and constructs MDAnalysis selection strings of the common atoms. When these two strings are applied to the two different proteins they will generate AtomGroups of the aligned residues.
fastafilename contains the two un-aligned sequences in FASTA format. The reference is assumed to be the first sequence, the target the second. ClustalW produces a pairwise alignment (which is written to a file with suffix .aln). The output contains atom selection strings that select the same atoms in the two structures.
Unless ref_offset and/or target_offset are specified, the resids in the structure are assumed to correspond to the positions in the un-aligned sequence, namely the first residue has resid == 1.
In more complicated cases (e.g. when the resid numbering in the structure/psf has gaps due to missing parts), simply provide the sequence of resids as they appear in the psf in ref_resids or target_resids, e.g.
target_resids = [a.resid for a in trj.selectAtoms('name CA')]
(This translation table is combined with any value for xxx_offset!)
Arguments : |
|
---|---|
Returns : |
|