seqmatchall
Function
Description
seqmatchall takes a set of sequences and performs an all-against-all
pairwise comparison of words (fragments of the sequences of a specified
fixed size), finding regions of identity between any two sequences.
The larger the specified word size, the faster the comparison will
proceed. However, regions of identity shorter than the word size will be
missed. You should therefore choose a word size that is small enough to
find the regions of similarity you are interested in, yet large enough
for the run to finish within a reasonable time-frame.
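The trade-off described above can be illustrated with a minimal Python sketch. It is not the seqmatchall implementation; it only assumes a common word-seeding approach (index the words of one sequence, look up the words of the other, then extend each hit along its diagonal). The function name identity_regions and the two short test sequences are hypothetical.

```python
from collections import defaultdict


def identity_regions(seq1, seq2, wordsize):
    """Return maximal exact matches of at least `wordsize` residues.

    Each result is (length, start1, end1, start2, end2), using
    1-based inclusive positions. Illustrative sketch only.
    """
    # Index every word of seq1 by its 0-based start position.
    index = defaultdict(list)
    for i in range(len(seq1) - wordsize + 1):
        index[seq1[i:i + wordsize]].append(i)

    regions = set()
    for j in range(len(seq2) - wordsize + 1):
        for i in index.get(seq2[j:j + wordsize], ()):
            # Extend the word hit to the left along its diagonal.
            a, b = i, j
            while a > 0 and b > 0 and seq1[a - 1] == seq2[b - 1]:
                a -= 1
                b -= 1
            # Extend the word hit to the right along its diagonal.
            e1, e2 = i + wordsize, j + wordsize
            while e1 < len(seq1) and e2 < len(seq2) and seq1[e1] == seq2[e2]:
                e1 += 1
                e2 += 1
            # Overlapping word hits collapse into one region via the set.
            regions.add((e1 - a, a + 1, e1, b + 1, e2))
    return sorted(regions, reverse=True)


# Hypothetical sequences sharing the 7-residue region "ACGTGCA".
seq1 = "TTACGTGCATT"
seq2 = "GACGTGCAGG"

print(identity_regions(seq1, seq2, 4))  # [(7, 3, 9, 2, 8)]
print(identity_regions(seq1, seq2, 8))  # [] (region is shorter than the word size)
```

With a word size of 4 the 7-residue region is found; with a word size of 8 there are fewer words to compare, but the same region goes unreported.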
Usage
Command line arguments
Input file format
seqmatchall reads a set of sequence USAs.
The sequences must be either all protein or all nucleic acid.
Output file format
ECLAC (the complete E.coli lac operon) matches ECLACI, ECLACZ, ECLACY
and ECLACA (the individual genes), and there is a short overlap
between ECLACY and the flanking genes ECLACZ and ECLACA.
The output is a list of regions of identity in pairs of sequences, each
consisting of one line with 7 columns of data separated by TABs or space
characters.
The columns of data consist of:
- The length of the region of identity.
- The start position in sequence 1.
- The end position in sequence 1.
- The name of sequence 1.
- The start position in sequence 2.
- The end position in sequence 2.
- The name of sequence 2.
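As a rough illustration of the seven-column layout listed above, the following Python sketch parses one data line into named fields. The names Match and parse_match_line are hypothetical, and the sample line uses invented sequence names and coordinates rather than real seqmatchall output. It assumes only what is stated above: seven whitespace-separated columns per match. Any header or comment lines in a real output file would have to be skipped before calling it.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """One region of identity, in the column order described above."""
    length: int
    start1: int
    end1: int
    name1: str
    start2: int
    end2: int
    name2: str


def parse_match_line(line):
    """Parse a single 7-column data line (TAB or space separated)."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
    length, start1, end1, name1, start2, end2, name2 = fields
    return Match(int(length), int(start1), int(end1), name1,
                 int(start2), int(end2), name2)


# Invented example line; SEQ_A and SEQ_B are placeholder names.
match = parse_match_line("120\t10\t129\tSEQ_A\t1\t120\tSEQ_B")
print(match.name1, match.start1, match.end1)  # SEQ_A 10 129
```

A simple consistency check on parsed lines is that end1 - start1 + 1 and end2 - start2 + 1 both equal the reported length, since the regions are identical stretches.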
Data files
None.
Notes
The larger the word size, the faster the comparisons will proceed, but
regions of identity shorter than the word size will not be reported.
References
None.
Warnings
None.
Diagnostic Error Messages
None.
Exit status
It exits with a status of 0.
Known bugs
None.
See also
polydot gives a graphical view of the same matches.
Author(s)
History
Target users
Comments