5.3.3. Water Bridge analysis — MDAnalysis.analysis.hbonds.WaterBridgeAnalysis
Author: | Zhiyi Wu |
---|---|
Year: | 2017 |
Copyright: | GNU Public License v3 |
Maintainer: | Zhiyi Wu <zhiyi.wu@gtc.ox.ac.uk>, @xiki-tempula on GitHub |
Given a Universe (simulation trajectory with one or more frames), measure all water bridges between selections 1 and 2 for each frame.
A water bridge is defined as a bridging water that simultaneously forms two hydrogen bonds, one with an atom from selection 1 and one with an atom from selection 2.
A water bridge can form between two hydrogen bond acceptors.
e.g. -CO2-:···H−O−H···:-O2C-
A water bridge can also form between two hydrogen bond donors.
e.g. -NH···:O:···HN- (where O is the oxygen of a bridging water)
A hydrogen bond acceptor and a hydrogen bond donor can also be bridged by a water.
e.g. -CO2-:···H−O:···HN- (where H−O is part of H−O−H)
The WaterBridgeAnalysis class is modeled after the HydrogenBondAnalysis class.
The following keyword arguments are important to control the behavior of the water bridge analysis:
- water_selection (resname SOL): the selection string for the bridging water
- distance cutoff (Å): 3.0 (hydrogen-acceptor distance by default; see distance_type)
- angle cutoff (degrees): 120.0
- forcefield: switches between default donor and acceptor names for different force fields
- donors and acceptors: additional atom names to treat as hydrogen bond donors and acceptors
5.3.3.1. Output
The results are a list of hydrogen bonds between selection 1 or selection 2 and the bridging water.
Each entry is formatted similarly to HydrogenBondAnalysis.timeseries and contains
- the indices of the donor and acceptor atoms,
- the identities of the donor and acceptor atoms,
- the distance between the acceptor heavy atom and the donor hydrogen atom (or the donor heavy atom if distance_type is "heavy"),
- the donor-hydrogen-acceptor angle (180º is linear).
Water bridge data are returned per frame and stored in WaterBridgeAnalysis.timeseries (in the following description, # indicates comments that are not part of the output):
results = [
    [ # frame 1
        # hbonds linking the selection 1 and selection 2 to the bridging
        # water 1
        [ # hbond 1 from selection 1 to the bridging water 1
            <donor index (0-based)>,
            <acceptor index (0-based)>, <donor string>, <acceptor string>,
            <distance>, <angle>
        ],
        [ # hbond 2 from selection 1 to the bridging water 1
            <donor index (0-based)>,
            <acceptor index (0-based)>, <donor string>, <acceptor string>,
            <distance>, <angle>
        ],
        [ # hbond 1 from selection 2 to the bridging water 1
            <donor index (0-based)>,
            <acceptor index (0-based)>, <donor string>, <acceptor string>,
            <distance>, <angle>
        ],
        [ # hbond 2 from selection 2 to the bridging water 1
            <donor index (0-based)>,
            <acceptor index (0-based)>, <donor string>, <acceptor string>,
            <distance>, <angle>
        ],
        # hbonds linking the selection 1 and selection 2 to the bridging
        # water 2
        [ # hbond 1 from selection 1 to the bridging water 2
            <donor index (0-based)>,
            <acceptor index (0-based)>, <donor string>, <acceptor string>,
            <distance>, <angle>
        ],
        [ # hbond 1 from selection 2 to the bridging water 2
            <donor index (0-based)>,
            <acceptor index (0-based)>, <donor string>, <acceptor string>,
            <distance>, <angle>
        ],
        ...
    ],
    [ # frame 2
        [ ... ], [ ... ], ...
    ],
    ...
]
Using the WaterBridgeAnalysis.generate_table() method, one can reformat the results as a flat “normalised” table that is easier to import into a database or dataframe for further processing. WaterBridgeAnalysis.save_table() saves the table to a pickled file. The table itself is a numpy.recarray.
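To make the idea of a “normalised” table concrete, the sketch below flattens a per-frame timeseries in the layout shown above into one row per hydrogen bond, each tagged with its frame time. It is an illustration only, not the implementation of generate_table(): the helper name flatten_timeseries, the frame times and the sample data are hypothetical, and the real table additionally splits the atom strings into residue name, residue id and atom name columns.

```python
def flatten_timeseries(timeseries, times):
    """Turn a nested per-frame list of hydrogen bonds into flat rows.

    Each row is (time, donor_index, acceptor_index, donor_string,
    acceptor_string, distance, angle), i.e. one row per hydrogen bond.
    """
    rows = []
    for time, frame in zip(times, timeseries):
        for donor_idx, acceptor_idx, donor, acceptor, dist, angle in frame:
            rows.append((time, donor_idx, acceptor_idx, donor, acceptor, dist, angle))
    return rows


# Hypothetical two-frame timeseries in the format shown above:
# frame 1 holds one water bridge (two hydrogen bonds), frame 2 holds none.
timeseries = [
    [[0, 1, 'ARG1:O', 'SOL2:HW1', 3.0, 180.0],
     [2, 3, 'SOL2:HW2', 'ASP3:O', 3.0, 180.0]],
    [],
]
times = [0.0, 1.0]  # hypothetical frame times

for row in flatten_timeseries(timeseries, times):
    print(row)
```

Frames without water bridges simply contribute no rows, which is why the flat table can be longer or shorter than the number of frames.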
5.3.3.2. Detection of water bridges
Water bridges are recorded if a bridging water simultaneously forms hydrogen bonds with both selection 1 and selection 2.
Hydrogen bonds are detected as described in HydrogenBondAnalysis (see Detection of hydrogen bonds).
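The bridging criterion itself amounts to a set intersection: a water is a bridge in a given frame when it has at least one hydrogen bond partner in selection 1 and at least one in selection 2. The following minimal sketch assumes the hydrogen bonds of one frame have already been found and are supplied as hypothetical (selection atom, water residue) pairs; find_bridging_waters is an illustrative name, not part of MDAnalysis.

```python
from collections import defaultdict


def find_bridging_waters(hbonds_sel1, hbonds_sel2):
    """Return {water: (sel1_partners, sel2_partners)} for waters bonded to both.

    hbonds_sel1 / hbonds_sel2 are iterables of (selection_atom, water_id)
    pairs, one pair per hydrogen bond found in a single frame.
    """
    partners1 = defaultdict(set)
    partners2 = defaultdict(set)
    for atom, water in hbonds_sel1:
        partners1[water].add(atom)
    for atom, water in hbonds_sel2:
        partners2[water].add(atom)
    # A water bridges the selections only if it appears on both sides.
    bridging = partners1.keys() & partners2.keys()
    return {w: (partners1[w], partners2[w]) for w in sorted(bridging)}


# Hypothetical single-frame data: SOL2 bonds to both ARG1:O and ASP3:O,
# while SOL5 only bonds to selection 1 and is therefore not a bridge.
hb1 = [('ARG1:O', 'SOL2'), ('ARG1:N', 'SOL5')]
hb2 = [('ASP3:O', 'SOL2')]
print(find_bridging_waters(hb1, hb2))
```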
The lists of donor and acceptor names can be extended by providing lists of atom names in the donors and acceptors keywords to WaterBridgeAnalysis. If the lists are entirely inappropriate (e.g. when analysing simulations done with a force field that uses very different atom names), then one should either use the value “other” for forcefield to set no default values, or derive a new class and set the default lists oneself:
from MDAnalysis.analysis.hbonds import WaterBridgeAnalysis

class WaterBridgeAnalysis_OtherFF(WaterBridgeAnalysis):
    DEFAULT_DONORS = {"OtherFF": tuple(set([...]))}
    DEFAULT_ACCEPTORS = {"OtherFF": tuple(set([...]))}
Then simply use the new class instead of the parent class and call it with forcefield="OtherFF". Please also consider contributing the list of heavy atom names to MDAnalysis.
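The subclassing approach works because the default names are looked up from class attributes keyed by the forcefield name. The self-contained sketch below mimics that pattern with a stand-in base class instead of the real WaterBridgeAnalysis. The class names and atom names are placeholders, and the assumption that extra donors and acceptors are merged with the defaults by set union is illustrative, not taken from the MDAnalysis source.

```python
class BaseAnalysis:
    """Stand-in for WaterBridgeAnalysis; only the name lookup is modelled."""

    # Placeholder subset, not the real CHARMM27 lists.
    DEFAULT_DONORS = {"CHARMM27": ("N", "OH2", "OW")}
    DEFAULT_ACCEPTORS = {"CHARMM27": ("O", "OH2", "OW")}

    def __init__(self, forcefield="CHARMM27", donors=(), acceptors=()):
        # Assumed behaviour: user-supplied names extend the defaults.
        self.donors = tuple(sorted(set(self.DEFAULT_DONORS[forcefield]) | set(donors)))
        self.acceptors = tuple(sorted(set(self.DEFAULT_ACCEPTORS[forcefield]) | set(acceptors)))


class OtherFFAnalysis(BaseAnalysis):
    # Overriding the class attributes is all that is needed; __init__ is inherited.
    DEFAULT_DONORS = {"OtherFF": ("N1", "O2H")}    # hypothetical atom names
    DEFAULT_ACCEPTORS = {"OtherFF": ("O3", "N4")}  # hypothetical atom names


a = OtherFFAnalysis(forcefield="OtherFF", donors=["N7"])
print(a.donors)     # ('N1', 'N7', 'O2H')
print(a.acceptors)  # ('N4', 'O3')
```

Because the subclass dictionary no longer contains a "CHARMM27" key, the forcefield argument has to name the new entry explicitly; otherwise the lookup fails with a KeyError.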
5.3.3.3. How to perform WaterBridgeAnalysis
All water bridges between arginine and aspartic acid can be analysed with
import MDAnalysis
import MDAnalysis.analysis.hbonds
u = MDAnalysis.Universe('topology', 'trajectory')
w = MDAnalysis.analysis.hbonds.WaterBridgeAnalysis(u, 'resname ARG', 'resname ASP')
w.run()
The results are stored as the attribute WaterBridgeAnalysis.timeseries; see Output for the format.
An example of using the timeseries would be determining the fraction of time during which a certain water bridge exists.
Suppose trajectory u has two frames, where the first frame contains a water bridge (via water SOL2) from the O atom of arginine ARG1 to the O atom of aspartate ASP3, and no water bridge is detected in the second frame.
print(w.timeseries)
prints out (the comments are not part of the data structure but are added here for clarity):
[
    [ # frame 1
        # A water bridge SOL2 links O from ARG1 and ASP3
        [0, 1, 'ARG1:O', 'SOL2:HW1', 3.0, 180],
        [2, 3, 'SOL2:HW2', 'ASP3:O', 3.0, 180],
    ],
    # frame 2
    # No water bridge detected
    []
]
To calculate this fraction, we can iterate through w.timeseries:
water_bridge_presence = []
for frame in w.timeseries:
    if frame:
        water_bridge_presence.append(True)
    else:
        water_bridge_presence.append(False)
p_bridge = float(sum(water_bridge_presence)) / len(water_bridge_presence)
print("Fraction of time with water bridge present: {}".format(p_bridge))
In the example above, p_bridge would become 0.5, i.e., for 50% of the trajectory a water bridge was detected between the selected residues.
Alternatively, count_by_type() can be used to compute the frequency of every water bridge in the simulation:
w.count_by_type()
which returns
[(0, 3, 'ARG', 1, 'O', 'ASP', 3, 'O', 0.5)]
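A rough equivalent of this frequency count can be computed from timeseries by hand, which may help when a different grouping than that of count_by_type() is needed. The sketch below is not the library's algorithm; it assumes, as in the example above, that water atoms can be recognised by a "SOL" residue-name prefix in the atom strings, and that a caller-supplied predicate (the hypothetical is_sel1) tells selection 1 atoms apart from selection 2 atoms. The result is keyed by atom strings rather than by the columns of the real recarray.

```python
from collections import Counter
from itertools import product


def bridge_frequencies(timeseries, is_sel1, water_prefix="SOL"):
    """Fraction of frames in which each (sel1 atom, sel2 atom) pair is water-bridged."""
    counts = Counter()
    for frame in timeseries:
        # Collect the partners of every water in this frame, split by selection.
        partners = {}
        for _d_idx, _a_idx, donor, acceptor, _dist, _angle in frame:
            if donor.startswith(water_prefix):
                water, partner = donor.split(":")[0], acceptor
            else:
                water, partner = acceptor.split(":")[0], donor
            side = 0 if is_sel1(partner) else 1
            partners.setdefault(water, (set(), set()))[side].add(partner)
        # Every sel1/sel2 combination around one water is one bridge; count it
        # at most once per frame even if several waters link the same pair.
        frame_pairs = set()
        for sel1_atoms, sel2_atoms in partners.values():
            frame_pairs.update(product(sel1_atoms, sel2_atoms))
        counts.update(frame_pairs)
    n_frames = len(timeseries)
    return {pair: n / n_frames for pair, n in counts.items()}


timeseries = [
    [[0, 1, 'ARG1:O', 'SOL2:HW1', 3.0, 180.0],
     [2, 3, 'SOL2:HW2', 'ASP3:O', 3.0, 180.0]],
    [],
]
print(bridge_frequencies(timeseries, is_sel1=lambda s: s.startswith("ARG")))
# {('ARG1:O', 'ASP3:O'): 0.5}
```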
For further data analysis, it is convenient to process the timeseries data into a normalized table with the generate_table() method, which creates a new data structure WaterBridgeAnalysis.table that contains one row for each observation of a hydrogen bond:
w.generate_table()
This table can then be easily turned into, e.g., a pandas.DataFrame, and further analyzed:
import pandas as pd
df = pd.DataFrame.from_records(w.table)
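If pandas is not at hand, the recarray can also be queried directly with NumPy, since its columns are accessible as attributes. The sketch below builds a stand-in table with numpy.rec.fromrecords using the column names listed for WaterBridgeAnalysis.table in Classes; the row values are hypothetical and mirror the two hydrogen bonds of the example above.

```python
import numpy as np

# Column names as listed for WaterBridgeAnalysis.table; the rows are
# hypothetical values mirroring the two-hydrogen-bond example above.
names = ["time", "donor_index", "acceptor_index", "donor_resnm", "donor_resid",
         "donor_atom", "acceptor_resnm", "acceptor_resid", "acceptor_atom",
         "distance", "angle"]
rows = [
    (0.0, 0, 1, "ARG", 1, "O", "SOL", 2, "HW1", 3.0, 180.0),
    (0.0, 2, 3, "SOL", 2, "HW2", "ASP", 3, "O", 3.0, 180.0),
]
table = np.rec.fromrecords(rows, names=names)

# Columns are accessible as attributes, and boolean masks select rows.
linear = table[table.angle > 150.0]
print(len(linear), sorted(set(table.donor_resnm.tolist())))  # 2 ['ARG', 'SOL']
```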
5.3.3.4. Classes
class MDAnalysis.analysis.hbonds.wbridge_analysis.WaterBridgeAnalysis(universe, selection1='protein', selection2='not resname SOL', water_selection='resname SOL', selection1_type='both', update_selection1=False, update_selection2=False, update_water_selection=True, filter_first=True, distance_type='hydrogen', distance=3.0, angle=120.0, forcefield='CHARMM27', donors=None, acceptors=None, start=None, stop=None, step=None, debug=None, verbose=None)
Perform a water bridge analysis.
The analysis of the trajectory is performed with the WaterBridgeAnalysis.run() method. The result is stored in WaterBridgeAnalysis.timeseries. See run() for the format. WaterBridgeAnalysis uses the same default atom names as HydrogenBondAnalysis; see Default heavy atom names for CHARMM27 force field.
New in version 0.17.0.
Set up the calculation of water bridges between two selections in a universe.
The timeseries is accessible as the attribute WaterBridgeAnalysis.timeseries.
If no hydrogen bonds are detected or if the initial check fails, look at the log output (enable with MDAnalysis.start_logging() and set verbose=True). It is likely that the default names for donors and acceptors are not suitable (especially for non-standard ligands). In this case, either change the forcefield or use customized donors and/or acceptors.
Parameters:
- universe (Universe) – Universe object
- selection1 (str (optional)) – Selection string for first selection [‘protein’]
- selection2 (str (optional)) – Selection string for second selection [‘not resname SOL’]. This string selects everything except water, where water is assumed to have the residue name SOL.
- water_selection (str (optional)) – Selection string for the bridging water selection [‘resname SOL’]. The default selection assumes that the water molecules have the residue name “SOL”. Change it to the appropriate selection for your specific force field. In principle, however, this selection can be anything that forms hydrogen bonds with both selection 1 and selection 2.
- selection1_type ({"donor", "acceptor", "both"} (optional)) – Selection 1 can be ‘donor’, ‘acceptor’ or ‘both’. Note that the value of selection1_type automatically determines how selection 2 handles donors and acceptors: if selection1_type is ‘both’, selection 2 is also treated as ‘both’; if it is ‘donor’, selection 2 is treated as ‘acceptor’ (and vice versa). [‘both’]
- update_selection1 (bool (optional)) – Update selection 1 at each frame. Set to True if the selection is not static. Selection 1 is filtered first to speed up performance; thus, setting it to True is recommended if the contact surface between selection 1 and selection 2 is constantly changing. [False]
- update_selection2 (bool (optional)) – Similar to update_selection1 but acts upon selection 2. [False]
- update_water_selection (bool (optional)) – Update the selection of water at each frame. Setting it to False is only recommended when the total number of water molecules in the simulation is small and when the water molecules remain static across the simulation. However, in normal simulations only a tiny proportion of the water is engaged in water bridges, so it is recommended to update the water selection and set the keyword filter_first to True so as to filter out water not residing between the two selections. [True]
- filter_first (bool (optional)) – Filter the water selection to only include water within 3 * distance of both selection 1 and selection 2. Selection 1 and selection 2 are both filtered to only include atoms within 3 * distance of the other selection. [True]
- distance (float (optional)) – Distance cutoff for hydrogen bonds; only interactions with an H-A distance <= distance (and the appropriate D-H-A angle, see angle) are recorded. (Note: distance_type can change this to the D-A distance.) [3.0]
- angle (float (optional)) – Angle cutoff for hydrogen bonds; an ideal H-bond has an angle of 180º. A hydrogen bond is only recorded if the D-H-A angle is >= angle. The default of 120º also finds fairly non-specific hydrogen interactions and a possibly better value is 150º. [120.0]
- forcefield ({"CHARMM27", "GLYCAM06", "other"} (optional)) – Name of the forcefield used. Switches between different DEFAULT_DONORS and DEFAULT_ACCEPTORS values. [“CHARMM27”]
- donors (sequence (optional)) – Extra H donor atom types (in addition to those in DEFAULT_DONORS); must be a sequence.
- acceptors (sequence (optional)) – Extra H acceptor atom types (in addition to those in DEFAULT_ACCEPTORS); must be a sequence.
- start (int (optional)) – starting frame index for the analysis; None is the first one, 0. start and stop are 0-based frame indices and are used to slice the trajectory (if supported). [None]
- stop (int (optional)) – frame index at which the analysis stops (excluded, see step); None is the last one. [None]
- step (int (optional)) – read every step-th frame between start (included) and stop (excluded); None selects 1. [None]
- distance_type ({"hydrogen", "heavy"} (optional)) – Measure hydrogen bond lengths between donor and acceptor heavy atoms (“heavy”) or between the donor hydrogen and the acceptor heavy atom (“hydrogen”). If using “heavy”, one should set the distance cutoff to a higher value such as 3.5 Å. [“hydrogen”]
- debug (bool (optional)) – If set to True, enables per-frame debug logging. This is disabled by default because it generates a very large amount of output in the log file. (Note that a logger must have been started to see the output, e.g. using MDAnalysis.start_logging().)
- verbose (bool (optional)) – Toggle progress output. (Can also be given as a keyword argument to run().)
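The distance, angle and distance_type parameters combine into a simple geometric test on a donor-hydrogen-acceptor triplet. The sketch below applies that test to hypothetical coordinates with NumPy; the function name is_hbond is illustrative, and the sketch ignores periodic boundary conditions, which a real trajectory analysis has to handle.

```python
import numpy as np


def is_hbond(donor, hydrogen, acceptor, distance=3.0, angle=120.0,
             distance_type="hydrogen"):
    """Return True if the D-H...A geometry passes both cutoffs.

    distance_type="hydrogen" measures H...A, "heavy" measures D...A.
    No periodic boundary conditions are applied.
    """
    donor, hydrogen, acceptor = (np.asarray(x, dtype=float)
                                 for x in (donor, hydrogen, acceptor))
    ref = hydrogen if distance_type == "hydrogen" else donor
    dist = np.linalg.norm(acceptor - ref)
    # D-H-A angle measured at the hydrogen: 180 degrees is perfectly linear.
    v1 = donor - hydrogen
    v2 = acceptor - hydrogen
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    theta = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
    return bool(dist <= distance and theta >= angle)


# Linear geometry: D at the origin, H 1 Å along x, A 2.9 Å beyond H.
print(is_hbond([0, 0, 0], [1, 0, 0], [3.9, 0, 0]))                         # True
print(is_hbond([0, 0, 0], [1, 0, 0], [3.9, 0, 0], distance_type="heavy"))  # False
```

The second call fails because the D...A distance is 3.9 Å, which illustrates why a larger cutoff is advisable when distance_type is "heavy".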
Notes
In order to speed up processing, atoms are filtered by a coarse distance criterion before a detailed hydrogen bonding analysis is performed (filter_first = True).
If selection 1 and selection 2 are very mobile during the simulation and the contact surface is constantly changing (i.e. residues are moving farther than 3 x distance), you might consider setting the update_selection1 and update_selection2 keywords to True to ensure correctness.
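The coarse pre-filter for water can be pictured as keeping only water atoms that lie within 3 × distance of at least one atom in each selection. The brute-force NumPy sketch below assumes plain coordinate arrays and no periodic boundaries; prefilter_water is a hypothetical name, and a production implementation would use a neighbour search rather than a full distance matrix.

```python
import numpy as np


def prefilter_water(water_xyz, sel1_xyz, sel2_xyz, distance=3.0):
    """Boolean mask of water atoms within 3*distance of both selections.

    All inputs are (N, 3) coordinate arrays; no periodic boundaries.
    """
    cutoff = 3 * distance

    def near(a, b):
        # Full pairwise distance matrix (len(a) x len(b)), fine for small sets.
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return (d <= cutoff).any(axis=1)

    return near(water_xyz, sel1_xyz) & near(water_xyz, sel2_xyz)


sel1 = np.array([[0.0, 0.0, 0.0]])
sel2 = np.array([[10.0, 0.0, 0.0]])
water = np.array([[5.0, 0.0, 0.0],    # 5 Å from both: kept
                  [-8.0, 0.0, 0.0],   # 8 Å from sel1, 18 Å from sel2: dropped
                  [30.0, 0.0, 0.0]])  # far from both: dropped
print(prefilter_water(water, sel1, sel2))  # [ True False False]
```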
timesteps
List of the times of each timestep. This can be used together with timeseries to find the specific time points at which a water bridge exists, or see table.
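Pairing timesteps with timeseries frame by frame gives the times at which any water bridge is present. A minimal sketch, using hypothetical stand-in lists in place of w.timesteps and w.timeseries:

```python
# Hypothetical stand-ins for w.timesteps and w.timeseries (two frames,
# a water bridge only in the first one, as in the example earlier).
timesteps = [0.0, 1.0]
timeseries = [
    [[0, 1, 'ARG1:O', 'SOL2:HW1', 3.0, 180.0],
     [2, 3, 'SOL2:HW2', 'ASP3:O', 3.0, 180.0]],
    [],
]

# An empty per-frame list means no water bridge was found in that frame.
bridge_times = [t for t, frame in zip(timesteps, timeseries) if frame]
print(bridge_times)  # [0.0]
```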
table
A normalised table of the data in WaterBridgeAnalysis.timeseries, generated by WaterBridgeAnalysis.generate_table(). It is a numpy.recarray with the following columns:
- “time”
- “donor_index”
- “acceptor_index”
- “donor_resnm”
- “donor_resid”
- “donor_atom”
- “acceptor_resnm”
- “acceptor_resid”
- “acceptor_atom”
- “distance”
- “angle”
It takes up more space than timeseries but it is easier to analyze and to import into databases or dataframes.
Example
For example, to create a pandas.DataFrame from w.table:
import pandas as pd
df = pd.DataFrame.from_records(w.table)
count_by_type()
Counts the frequency of each type of water bridge.
If one atom A from selection 1 is linked to atom B from selection 2 through a bridging water, an entry is created and the proportion of time that this linkage exists in the whole simulation is calculated.
Returns a numpy.recarray containing the atom indices for A and B, residue names, residue numbers, atom names (for both A and B) and the fraction of the total time during which the water bridge was detected. This method returns None if the method WaterBridgeAnalysis.run() was not executed first.
Returns: counts – Each row of the array contains data to define a unique water bridge together with the frequency (fraction of the total time) with which it has been observed.
Return type: numpy.recarray
generate_table()
Generate a normalised table of the results.
The table is stored as a numpy.recarray in the attribute table.
run(**kwargs)
Analyze the trajectory and produce timeseries.
Stores the water bridge data per frame as WaterBridgeAnalysis.timeseries (see there for the output format).
Parameters:
- verbose (bool (optional)) – toggle progress meter output (ProgressMeter) [True]
- debug (bool (optional)) – enable detailed logging of debugging information; this can create very big log files, so it is disabled (False) by default; setting debug toggles the debug status for WaterBridgeAnalysis, namely the value of WaterBridgeAnalysis.debug.
See also
WaterBridgeAnalysis.generate_table() - processing the data into a different format.
save_table(filename='wbridge_table.pickle')
Saves table to a pickled file.
If table does not exist yet, generate_table() is called first.
Parameters: filename (str (optional)) – path to the filename
Example
Load with
import pickle
with open(filename, 'rb') as f:
    table = pickle.load(f)
timeseries
Time series of water bridges.
Because of the intrinsic complexity of the water bridge problem, several atoms \(n\) can be linked by the same water. A simple
atom 1 - water bridge - atom 2
mapping would create a very large list of \(n!/(2(n-2)!)\) entries due to the rule of combination. Thus, the output is arranged based on each individual bridging water, allowing the later reconstruction of the water network. The hydrogen bonds from selection 1 to the first bridging water are followed by the hydrogen bonds from selection 2 to the same bridging water. After that, the hydrogen bonds from selection 1 to the second bridging water are followed by the hydrogen bonds from selection 2 to the same second bridging water. An example of the output is given in Output.
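The count \(n!/(2(n-2)!)\) simplifies to \(n(n-1)/2\), the number of unordered pairs among the \(n\) atoms linked by one water, so a pairwise listing grows quadratically with \(n\). A quick standard-library check over a few arbitrary values of n:

```python
import math
from itertools import combinations

for n in range(2, 8):
    pairs_formula = math.factorial(n) // (2 * math.factorial(n - 2))
    pairs_listed = sum(1 for _ in combinations(range(n), 2))
    # All four expressions count the same unordered pairs.
    assert pairs_formula == pairs_listed == n * (n - 1) // 2 == math.comb(n, 2)
    print(n, pairs_formula)
```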
Note
To find an acceptor atom in Universe.atoms by index, one would use u.atoms[acceptor_index].
The timeseries is a managed attribute and it is generated from the underlying data in _timeseries every time the attribute is accessed. It is therefore costly to call, and if timeseries is needed repeatedly it is recommended that you assign it to a variable:
w = WaterBridgeAnalysis(u)
w.run()
timeseries = w.timeseries
See also
table
- structured array of the data
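The cost of a managed attribute can be seen with a small stand-in class whose property rebuilds its result on every access. This is a hypothetical illustration of the property pattern described in the note on timeseries, not the code WaterBridgeAnalysis uses to regenerate its data; the class name and the builds counter are invented for the example.

```python
class ManagedTimeseries:
    """Stand-in showing why a computed property is costly to access repeatedly."""

    def __init__(self, raw):
        self._timeseries = raw
        self.builds = 0  # counts how often the public view is regenerated

    @property
    def timeseries(self):
        # Rebuilt from the underlying data on every access (copied here).
        self.builds += 1
        return [[list(hbond) for hbond in frame] for frame in self._timeseries]


obj = ManagedTimeseries([[(0, 1, 'ARG1:O', 'SOL2:HW1', 3.0, 180.0)], []])
for _ in range(3):
    obj.timeseries          # three accesses -> three rebuilds
ts = obj.timeseries         # assign once ...
n_frames = len(ts)          # ... then reuse the local variable
print(obj.builds, n_frames)  # 4 2
```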