In fact, it does more than just this: it removes ANY non-alphabetic character from the input sequence. As well as removing the gap characters, it will remove such things as the '*' in protein sequences that indicates the position of a translated STOP codon.
There are many different formats for storing sequences in files. Some sequence formats allow you to store aligned sequences, including the information on where gaps have been introduced to make the sequences align properly. A gap is marked by placing a special character at that position. Different sequence formats use different characters to indicate gaps. Some formats may use more than one character to distinguish different types of gaps (e.g. gaps at the ends of the sequences, internal gaps, gaps introduced by a program or by a person editing the alignment, etc.). Typical characters used to indicate gaps are '.', '-' and '~'.
When EMBOSS programs read in a sequence that contains gap characters, all of the gap characters are changed internally to '-'. EMBOSS therefore has only one type of gap character, and any distinction between different gap types in the input is lost.
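The gap normalisation described above can be illustrated with a short Python sketch. This is not EMBOSS code: the function name `normalize_gaps` and the constant `ALTERNATIVE_GAP_CHARS` are hypothetical, and the set of alternative gap characters is assumed to be just the '.' and '~' mentioned above.

```python
# Illustrative sketch only, not the EMBOSS implementation.
# Assumption: '.' and '~' are the only alternative gap characters.
ALTERNATIVE_GAP_CHARS = ".~"

# Translation table mapping each alternative gap character to '-'.
_GAP_TABLE = str.maketrans({ch: "-" for ch in ALTERNATIVE_GAP_CHARS})


def normalize_gaps(sequence: str) -> str:
    """Return the sequence with every gap character rewritten as '-'."""
    return sequence.translate(_GAP_TABLE)


if __name__ == "__main__":
    print(normalize_gaps("AC..GT~~A-"))
```

After this step the different gap characters can no longer be told apart, which matches the single gap type described above.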
degapseq removes every non-alphabetic character from the sequence; in effect, this means that gaps and '*' characters are removed. The sequence is then written out.
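As a rough illustration of this behaviour, the following Python sketch keeps only alphabetic characters. It is not the degapseq source: the function name `degap` is hypothetical, and "alphabetic" is assumed here to mean the ASCII letters A-Z and a-z.

```python
# Illustrative sketch only, not the degapseq implementation.
import string

# Assumption: "alphabetic" means ASCII letters only.
_LETTERS = frozenset(string.ascii_letters)


def degap(sequence: str) -> str:
    """Return the sequence with every non-alphabetic character removed."""
    return "".join(ch for ch in sequence if ch in _LETTERS)


if __name__ == "__main__":
    print(degap("AC-GT..~A"))  # gapped nucleotide sequence
    print(degap("MKV--L*"))    # gapped protein with a translated STOP
```

Because the filter keeps letters rather than deleting a list of known gap characters, the '*' STOP marker disappears along with the gaps.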
The input sequence can be nucleic or protein.
The input sequence can be gapped or ungapped.