Date help created: 31 Oct 1996
Date last updated: 22 Jun 2001

'connect' takes a shift file and a crosspeak file and matches the crosspeaks to one or more pairs of shifts.
To run the program type
connect <connect script file>
The program is intended to be used in conjunction with XPLOR, Per Kraulis' Ansig, and rdb scripts written by Andy Raine.
There must be no more than one key word per line in the script file.
Below <...> represents an argument for a key word and [...] represents a key word or argument that is optional.
The syntax for the key words is
input_par <par file of spectrum>
input_shift <input shift file>
input_crosspeak <input crosspeak file>
[ output_crosspeak <output crosspeak file> ]
[ output_match <output match file> ]
[ output_xplor <output XPLOR file> ]
[ output_nilges <output Nilges-style XPLOR file> ]
[ output_null <output null matches file> ]
columns <first column> [ <second column> ]
intensity_dist <intensity> <distance>
[ intensity_dist2 <intensity> <distance> <distance_minus> ]
[ intensity_dist3 <intensity> <distance> <distance_minus> <distance_plus> ]
[ exclude <column> <spectral width> <tolerance> ]
[ residues <columns> <residue1> <residue2> [<atom_names>]]
[ spectral_width <column> <spectral width> ]
[ split_output ]
At least one of output_match, output_xplor or output_nilges must occur. The output_crosspeak file contains a list of crosspeaks that have not been matched.
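The rules stated in this help text can be checked before running the program. The Python sketch below is not part of Azara. The function name check_connect_script and the file names in the example are hypothetical, and the sketch assumes that blank lines in the script file are harmless. It checks only what the help text states: one key word per line, input_par first, columns appearing twice, and at least one of the three match outputs.

```python
KNOWN_KEY_WORDS = {
    "input_par", "input_shift", "input_crosspeak",
    "output_crosspeak", "output_match", "output_xplor", "output_nilges",
    "output_null", "columns", "intensity_dist", "intensity_dist2",
    "intensity_dist3", "exclude", "residues", "spectral_width", "split_output",
}
REQUIRED = ("input_par", "input_shift", "input_crosspeak")
MATCH_OUTPUTS = ("output_match", "output_xplor", "output_nilges")


def check_connect_script(text):
    """Return a list of problems found in the script text (empty if none)."""
    problems = []
    key_words = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if fields[0] not in KNOWN_KEY_WORDS:
            problems.append(f"line {number}: unknown key word '{fields[0]}'")
            continue
        key_words.append(fields[0])
    for required in REQUIRED:
        if required not in key_words:
            problems.append(f"missing {required}")
    if key_words and key_words[0] != "input_par":
        problems.append("input_par should be the first key word")
    if key_words.count("columns") != 2:
        problems.append("columns must appear twice")
    if not any(k in key_words for k in MATCH_OUTPUTS):
        problems.append("need output_match, output_xplor or output_nilges")
    return problems


# Hypothetical script; file names and column numbers are made up.
example_script = """input_par noesy.par
input_shift shifts.rdb
input_crosspeak peaks.rdb
output_match matches.rdb
columns 1 2
columns 3
"""
print(check_connect_script(example_script))
```

An empty list means that none of the checked rules is broken; it does not guarantee that connect itself will accept the file.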
All shifts are aliased according to the specified spectral width.
A description of the key words may be obtained by typing
connect help <key word>
A description of the format of the input shift file may be obtained by typing
connect help shift_format
A description of the format of the input and output crosspeak file may be obtained by typing
connect help crosspeak_format
A description of the format of the output match file may be obtained by typing
connect help match_format
A description of the format of the output XPLOR file may be obtained by typing
connect help xplor_format
A description of the format of the output Nilges-style XPLOR file may be obtained by typing
connect help nilges_format
The input shift file for the program has an ascii tab-separated format, with two header lines followed by one line (record; row) per shift data. The first header line contains the column titles. The second header line contains an 'N' or an 'S' in each column, consistent with rdb format.
Each record has data for the light atom (hydrogen) and the corresponding bonded heavy atom (anything other than hydrogen, e.g. carbon, nitrogen, oxygen or sulfur).
The first column contains the residue name of the amino acid, the second column contains the residue number of the amino acid, the third column contains the light atom name, the fourth column contains the light atom shift (in ppm), the fifth column contains the heavy atom atom name, the sixth column contains the heavy atom shift (in ppm), the seventh column contains the light atom tolerance (in ppm), and the eighth column contains the heavy atom tolerance (in ppm).
A shift of <= -99 is considered to be unknown.
The tolerances specify how close a crosspeak shift value must be to the specified atom shift in order for there to be a match.
A given atom is allowed to have more than one entry in the file. If so, the entries must be on consecutive rows. If more than one of these entries matches a given peak, the atom is output only once, but the reported match counts include all entries matched.
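The layout described above can be read with a few lines of Python. This is a minimal sketch, not the program's own reader. The function names, the dictionary keys and the column titles in the sample are hypothetical, and two points are assumptions: an unknown shift never matches, and a difference exactly equal to the tolerance counts as a match.

```python
UNKNOWN_SHIFT = -99.0  # shifts <= -99 are considered unknown


def read_shift_records(text):
    """Parse tab-separated shift data: two header lines, then one record per line."""
    records = []
    for line in text.splitlines()[2:]:  # skip the titles and the N/S line
        if not line.strip():
            continue
        f = line.split("\t")
        records.append({
            "residue_name": f[0], "residue_number": int(f[1]),
            "light_atom": f[2], "light_shift": float(f[3]),
            "heavy_atom": f[4], "heavy_shift": float(f[5]),
            "light_tol": float(f[6]), "heavy_tol": float(f[7]),
        })
    return records


def shift_matches(peak_shift, atom_shift, tolerance):
    """True if the crosspeak shift lies within tolerance of a known atom shift."""
    if atom_shift <= UNKNOWN_SHIFT:
        return False  # assumption: unknown shifts never match
    return abs(peak_shift - atom_shift) <= tolerance  # assumption: inclusive


# Made-up sample data; the column titles are illustrative only.
sample_shifts = (
    "res_name\tres_num\tlight\tlight_shift\theavy\theavy_shift\tlight_tol\theavy_tol\n"
    "S\tN\tS\tN\tS\tN\tN\tN\n"
    "ALA\t5\tHN\t8.25\tN\t121.3\t0.02\t0.3\n"
    "GLY\t6\tHA1\t3.95\tCA\t-99.0\t0.02\t0.3\n"
)
shift_records = read_shift_records(sample_shifts)
```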
The input and output crosspeak files for the program have an ascii tab-separated format, with two header lines followed by one line (record; row) per crosspeak. The first header line contains the column titles. The second header line contains an 'N' or an 'S' in each column, consistent with rdb format.
The records first have a set of data for each dimension, and then a dimension-independent set.
For each dimension (of the spectrum) there are five columns. The first column contains the residue name, the second column contains the residue number, the third column contains the atom name, the fourth column contains the atom type, and the fifth column contains the shift (in ppm). The first four of these columns can be null, but if not null (the residue and atom names are checked) this will be considered to be a valid assignment. The dimensions are ordered with the Ansig convention, which is opposite the Azara convention.
The dimension-independent set has four columns. The first column contains the unnormalized crosspeak intensity, the second column contains the spectrum name, the third column contains the crosspeak number, and the fourth column contains the normalized crosspeak intensity.
The output crosspeak file has two additional columns, giving the number of matches for the two sets of matched shifts.
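The column layout described above (five columns per dimension, then four dimension-independent columns, then two match-count columns in output files) can be sketched in Python as follows. The function name and the field labels are hypothetical, and the sample record assumes that a null column is an empty field, which this help text does not state.

```python
PER_DIMENSION = ("residue_name", "residue_number", "atom_name", "atom_type", "shift")
DIMENSION_INDEPENDENT = ("intensity", "spectrum", "crosspeak_number",
                         "normalized_intensity")


def split_crosspeak_record(fields, ndim):
    """Split one record into per-dimension data, common data and extra columns."""
    expected = 5 * ndim + 4
    if len(fields) < expected:
        raise ValueError(f"expected at least {expected} columns, got {len(fields)}")
    dims = [dict(zip(PER_DIMENSION, fields[5 * d:5 * d + 5])) for d in range(ndim)]
    common = dict(zip(DIMENSION_INDEPENDENT, fields[5 * ndim:expected]))
    extra = fields[expected:]  # the two match counts, present in output files only
    return dims, common, extra


# Made-up 2D record: first dimension assigned, second dimension unassigned.
record = "ALA\t5\tHN\tH\t8.25\t\t\t\t\t4.31\t1.5e6\tnoesy\t17\t0.42".split("\t")
dims, common, extra = split_crosspeak_record(record, ndim=2)
```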
The output match file for the program has an ascii tab-separated format, with two header lines followed by one line (record; row) per match. The first header line contains the column titles. The second header line contains an 'N' or an 'S' in each column, consistent with rdb format.
Each record has data for the two matched light atoms.
The first column contains the residue number of the first atom, the second column contains the residue name of the first atom, the third column contains the atom name of the first atom, the fourth column contains the residue number of the second atom, the fifth column contains the residue name of the second atom, the sixth column contains the atom name of the second atom, the seventh column contains the normalised intensity of the matched crosspeak, the eighth column contains the crosspeak number of the matched crosspeak, and the ninth column contains an estimate of the implied distance between the light atoms.
The output XPLOR file for the program has a proprietary ascii format. See an XPLOR manual for more explanation.
The output Nilges-style XPLOR file for the program is a slight modification of the xplor_format.
input_par <par file of spectrum>
This specifies the par file name of the spectrum from which the crosspeaks were derived. The data file of the spectrum is not used. This should be the first key word in the script file.
input_shift <input shift file>
This specifies the input shift file. A description of the format may be obtained by typing
connect help shift_format
input_crosspeak <input crosspeak file>
This specifies the input crosspeak file. A description of the format may be obtained by typing
connect help crosspeak_format
[ output_crosspeak <output crosspeak file> ]
This specifies the output crosspeak file. This file contains those crosspeaks that have not been matched. A description of the format may be obtained by typing
connect help crosspeak_format
[ output_match <output match file> ]
This specifies the output match file. In content this file is equivalent to the output_xplor file and output_nilges file, and at least one of these three key words must appear. A description of the format may be obtained by typing
connect help match_format
[ output_xplor <output XPLOR file> ]
This specifies the output XPLOR file. In content this file is equivalent to the output_match file and output_nilges file, and at least one of these three key words must appear. A description of the format may be obtained by typing
connect help xplor_format
[ output_nilges <output Nilges-style XPLOR file> ]
This specifies the output Nilges-style XPLOR file. In content this file is equivalent to the output_match file and output_xplor file, and at least one of these three key words must appear. A description of the format may be obtained by typing
connect help nilges_format
[ output_null <output null matches file> ]
This specifies the output file for crosspeaks without any matches. The format is tab-separated with one header line followed by one line per crosspeak (without any matches), with the line containing the crosspeak number and spectrum.
columns <first column> [ <second column> ]
This specifies one or two columns, and the data in the corresponding column(s) in the input_crosspeak file are matched to the shifts in the input_shift file. The first column must be a light atom (hydrogen) and the second column, if it exists, must be the heavy atom to which the light atom is bonded.
If the second column is given as a negative number, the shift in that column is not matched but the atom type is; the column used is the negative of the specified value. The first column must be positive. This key word must appear twice.
intensity_dist <intensity> <distance>
This is used to specify how to convert the normalised intensity in the input_crosspeak file into a distance. This key word can appear more than once, and they must be listed in order of decreasing <intensity> (increasing <distance>). For a given crosspeak normalised intensity the first smaller <intensity> determines the <distance> to be used. If this key word and the other intensity_dist* key words do not appear then it is assumed that distance = intensity (this is useful for working with simulated data). In xplor terminology, this assumes distance_minus = distance and distance_plus = 0. To set these explicitly use either intensity_dist2 or intensity_dist3.
intensity_dist2 <intensity> <distance> <distance_minus>
This is used to specify how to convert the normalised intensity in the input_crosspeak file into a distance. This key word can appear more than once, and they must be listed in order of decreasing <intensity> (increasing <distance>). For a given crosspeak normalised intensity the first smaller <intensity> determines the <distance> to be used. If this key word and the other intensity_dist* key words do not appear then it is assumed that distance = intensity (this is useful for working with simulated data). In xplor terminology, this assumes distance_plus = 0. To set this explicitly use intensity_dist3.
intensity_dist3 <intensity> <distance> <distance_minus> <distance_plus>
This is used to specify how to convert the normalised intensity in the input_crosspeak file into a distance. This key word can appear more than once, and they must be listed in order of decreasing <intensity> (increasing <distance>). For a given crosspeak normalised intensity the first smaller <intensity> determines the <distance> to be used. If this key word and the other intensity_dist* key words do not appear then it is assumed that distance = intensity (this is useful for working with simulated data).
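The lookup rule shared by the three intensity_dist* key words can be sketched in Python. This is an illustration of the rule as described here, not the program's code. The names make_entry and lookup_distance and the numbers in the table are hypothetical. Three points are assumptions: "smaller" is taken as strictly smaller, nothing is returned when no listed <intensity> is smaller, and when no table is given the bounds follow the intensity_dist defaults.

```python
def make_entry(intensity, distance, distance_minus=None, distance_plus=0.0):
    """One intensity_dist / intensity_dist2 / intensity_dist3 line with defaults."""
    if distance_minus is None:
        distance_minus = distance  # intensity_dist: distance_minus = distance
    return (intensity, distance, distance_minus, distance_plus)


def lookup_distance(peak_intensity, table):
    """Return (distance, distance_minus, distance_plus) for a normalised intensity.

    `table` holds entries in the order given in the script file, which must be
    decreasing <intensity>.
    """
    if not table:
        # No intensity_dist* key words: distance = intensity.
        # Assumption: the bounds follow the intensity_dist defaults.
        return (peak_intensity, peak_intensity, 0.0)
    for threshold, distance, minus, plus in table:
        if threshold < peak_intensity:  # the first smaller <intensity> wins
            return (distance, minus, plus)
    return None  # assumption: weaker than every listed <intensity>


# Made-up table, listed in order of decreasing intensity.
table = [
    make_entry(0.5, 2.5),             # as intensity_dist
    make_entry(0.2, 3.5, 1.7),        # as intensity_dist2
    make_entry(0.05, 5.0, 3.2, 0.5),  # as intensity_dist3
]
print(lookup_distance(0.3, table))
```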
[ exclude <column> <spectral width> <tolerance> ]
This specifies that crosspeaks within <tolerance> of the <spectral width> for the given <column> are ignored. The <spectral width> is specified in ppm (not Hz).
[ residues <columns> <residue1> <residue2> [<atom_names>] ]
This specifies that only those shift matches for residues between <residue1> and <residue2> for the given <columns> (1 or 2) are output. If any atom_names are given then only those shift matches where one of the atom_names matches the atom name are output. Each of the atom_names can have a * at the end, which means that any trailing characters match from that point. The default is that all matches are output. This key word can occur more than once for a given choice of <columns>, in which case the shift matches output are those for residues which lie in one of the specified residue ranges.
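The filtering described above can be sketched in Python for a single choice of <columns>. The function names and the example filters are hypothetical, and the sketch assumes that the residue range includes both <residue1> and <residue2>.

```python
def atom_name_matches(pattern, atom_name):
    """Compare an atom name with a pattern that may end in '*'."""
    if pattern.endswith("*"):
        return atom_name.startswith(pattern[:-1])  # any trailing characters match
    return atom_name == pattern


def passes_residue_filters(residue_number, atom_name, filters):
    """filters: list of (residue1, residue2, atom_names), one per 'residues' line."""
    if not filters:
        return True  # default: all matches are output
    for first, last, patterns in filters:
        if not (first <= residue_number <= last):  # assumption: inclusive range
            continue
        if not patterns or any(atom_name_matches(p, atom_name) for p in patterns):
            return True
    return False


# Two made-up 'residues' lines: 10 to 20 for HN and HB*, 35 to 40 for any atom.
filters = [(10, 20, ["HN", "HB*"]), (35, 40, [])]
print(passes_residue_filters(12, "HB2", filters))
```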
[ spectral_width <column> <spectral width> ]
This specifies that this is the <spectral width> for the given <column>. This key word should be used if the spectral width given in the par file is not correct for the aliasing. The <spectral width> is specified in ppm (not Hz).
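To show why the spectral width matters for aliasing, the Python sketch below folds a shift into a window one spectral width wide. It is a generic illustration of aliasing, not the program's code: the function name, the lower_edge_ppm parameter and the choice of a half-open window starting at the lower edge are all assumptions, since this help text does not say how the window is positioned.

```python
def alias_shift(shift_ppm, lower_edge_ppm, spectral_width_ppm):
    """Fold a shift into [lower_edge, lower_edge + spectral_width)."""
    if spectral_width_ppm <= 0:
        raise ValueError("spectral width must be positive")
    # Python's % returns a non-negative result for a positive divisor,
    # so shifts below the window fold upwards correctly.
    offset = (shift_ppm - lower_edge_ppm) % spectral_width_ppm
    return lower_edge_ppm + offset


# Made-up numbers: a 30 ppm window starting at 100 ppm.
print(alias_shift(135.0, 100.0, 30.0))
```

With a wrong spectral width the folded positions change, so shifts would no longer line up with the crosspeak positions.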
[ split_output ]
This specifies that for output_xplor and output_nilges there should be two output files, one (suffix '0') for unassigned output and one (suffix '1') for assigned output.

Azara help: connect / W. Boucher / azara@bioc.cam.ac.uk