A local alignment searches for regions of local similarity between two sequences and need not include the entire length of the sequences. Local alignment methods are very useful for scanning databases or other circumsatnces when you wish to find matches between small regions of sequences, for example between protein domains.
Dynamic programming methods ensure the optimal local alignment by exploring all possible alignments and choosing the best. It does this by reading in a scoring matrix that contains values for every possible residue or nucleotide match. water finds an alignment with the maximum possible score where the score of an alignment is equal to the sum of the matches taken from the scoring matrix.
An important problem is the treatment of gaps, i.e., spaces inserted to optimise the alignment score. A penalty is subtracted from the score for each gap opened (the 'gap open' penalty) and a penalty is subtracted from the score for the total number of gap spaces multiplied by a cost (the 'gap extension' penalty).
Typically, the cost of extending a gap is set to be 5-10 times lower than the cost for opening a gap.
There are two ways to compute a penalty for a gap of n positions :
gap opening penalty + (n - 1) * gap extension penalty gap penalty + n * gap length penalty
The first way is used by EMBOSS and WU-BLAST
The second way is used by NCBI-BLAST, GCG, Staden and CLUSTAL.
Fasta used it for a long time the first way, but Prof. Pearson decided
recently to shift to the second.
The two methods are basically equivalent.
The Smith-Waterman algorithm contains no negative scores in the path matrix it creates. The algorithm starts the alignment at the highest path matrix score and works backwards until a cell contains zero.
See the Reference Smith et al. for details.
|
The Identity: is the percentage of identical matches between the two sequences over the reported aligned region (including any gaps in the length).
The Similarity: is the percentage of matches between the two sequences over the reported aligned region (including any gaps in the length).
water is for aligning the best matching subsequences of two sequences. It does not necessarily align whole sequences against each other; you should use needle if you wish to align closely related sequences along their whole lengths.
A true Smith Waterman implementation like water needs memory proportional to the product of the sequence lengths. For two sequences of length 10,000,000 and 1,000 it therefore needs memory proportional to 10,000,000,000 characters. Two arrays of this size are produced, one of ints and one of floats so multiply that figure by 8 to get the memory usage in bytes. That doesn't include other overheads. Therefore only use water and needle for accurate alignment of reasonably short sequences.
If you run out of memory, try using supermatcher or matcher.
Uncaught exception Assertion failed raised at ajmem.c:xxx
Probably means you have run out of memory. Try using supermatcher or matcher if this happens.
matcher is a local alignment program that gives as good an alignment as
water
but it uses far less memory. However,water
runs twice as fast as matcher.supermatcher is designed for local alignments of very large sequences. It gives good results as long as there is not significant amounts of insertion or deletion in the alignment.
supermatcher Finds a match of a large sequence against one or more sequences matcher Finds the best local alignments between two sequences