extractfeat

Function

Description

extractfeat is a simple utility for extracting the parts of a sequence that have been annotated as a specific type of feature. These sub-sequences are written to the output sequence file.

If the feature is annotated as being in the reverse sense of a nucleic acid sequence, then that feature's sub-sequence is reverse-complemented before being written out.
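
The extraction and reverse-complement step can be sketched in Python as follows. This is only an illustration of the behaviour described above, not the extractfeat implementation: the function name `extract_feature`, the 1-based inclusive coordinates and the restriction to unambiguous A, C, G and T bases are all assumptions made for the example.

```python
# Hypothetical helper, not part of EMBOSS.
# Only unambiguous DNA bases are complemented (assumption for brevity).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_feature(seq, start, end, reverse=False):
    """Return the sub-sequence at 1-based inclusive positions start..end.

    If reverse is True (the feature is annotated in the reverse sense of a
    nucleic acid sequence), the sub-sequence is reverse-complemented.
    """
    sub = seq[start - 1:end]
    if reverse:
        # Complement each base, then reverse the order.
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub
```

For example, under these assumptions, `extract_feature("AACGTT", 2, 4)` gives the forward sub-sequence and passing `reverse=True` gives its reverse complement.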

It is often useful to have some information on the context of the feature. extractfeat allows you to specify a number of bases or residues before and/or after the feature to write out.
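
A minimal Python sketch of adding flanking context is shown below. The function name `extract_with_context`, the parameter names `before` and `after`, and the choice to clamp the region at the ends of the sequence are assumptions for illustration; the sketch handles a forward-sense feature only and does not reproduce how extractfeat itself treats sequence ends.

```python
# Hypothetical helper, not part of EMBOSS.
def extract_with_context(seq, start, end, before=0, after=0):
    """Return the feature at 1-based inclusive start..end plus context.

    'before' and 'after' are the numbers of bases or residues to include
    on each side. The region is clamped to the sequence (assumption).
    """
    lo = max(1, start - before)
    hi = min(len(seq), end + after)
    return seq[lo - 1:hi]
```

With a feature at positions 4 to 6, asking for two positions before and one after widens the extracted region to positions 2 to 7.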

If you are interested in extracting the sequence of the region around the start or end of the feature, then this can also be specified.

'joined' features can either be extracted as individual sequences, or as a single concatenated sequence if the '-join' qualifier is used.
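
The difference between the two modes can be sketched in Python. The function `extract_joined` and its `join` argument are hypothetical names chosen to mirror the '-join' qualifier, and the coordinates are assumed to be 1-based, inclusive and in the forward sense.

```python
# Hypothetical helper, not part of EMBOSS.
def extract_joined(seq, parts, join=False):
    """Extract the components of a 'joined' feature.

    parts is a list of (start, end) pairs, 1-based and inclusive.
    Returns one sequence per component, or a single concatenated
    sequence (in a one-element list) when join is True.
    """
    pieces = [seq[start - 1:end] for start, end in parts]
    if join:
        return ["".join(pieces)]
    return pieces
```

So a two-part feature yields two output sequences by default and one concatenated sequence when joining is requested.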

Please remember that the output feature sequence is only as good as the annotation. If you rely on annotation of features produced by other people or by other programs, some of these features will be incorrect.

Usage

Command line arguments


Input file format

extractfeat reads normal sequences with features.

Feature tables in Swissprot, EMBL, GFF, etc. format can be added using '-ufo featurefile' on the command line.

Output file format

The sequences of the specified features are written out.

The ID name of the output sequence is formed from the original sequence name with the start and end positions of the feature appended to it. So if the feature came from positions 10 to 22 of a sequence with an ID name of 'XYZ', then the resulting ID name of the feature sequence will be 'XYZ_10_22'.

The name of the type of feature is added to the start of the description of the sequence in brackets, e.g.: '[exon]'.
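
The naming rules above can be expressed as a short Python sketch. The function name `feature_record` is hypothetical, and the single space placed between the bracketed feature type and the original description is an assumption, since the text only says the type is added to the start of the description.

```python
# Hypothetical helper, not part of EMBOSS.
def feature_record(seq_id, description, feature_type, start, end):
    """Build the ID name and description for an extracted feature."""
    # ID name: original name with start and end positions appended.
    new_id = f"{seq_id}_{start}_{end}"
    # Description: feature type in brackets, then the original text.
    new_desc = f"[{feature_type}] {description}".strip()
    return new_id, new_desc
```

Applied to the example in the text, an exon at positions 10 to 22 of 'XYZ' gets the ID name 'XYZ_10_22' and a description that begins with '[exon]'.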

The sequence is written out as a normal sequence.

If the feature is in the reverse sense of a nucleic acid sequence, then it is reverse-complemented before being written.

Data files

None.

Notes

If a feature is specified as being a part of a different sequence entry in a database, then this feature is ignored.

If you are extracting 'joined' features and one or more of the component features is in a different sequence entry, then the whole joined feature is ignored.

References

None.

Warnings

None.

Diagnostic Error Messages

If the end position of the sequence to be written is less than the start position, then the warning message "Extraction region end less than start for feature type [start-end] in ID name" is written and no sequence is output.
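
The check described above can be sketched in Python. The function name `region_warning` is hypothetical, and the exact way the feature type, positions and ID name are substituted into the message is an assumption based on the wording quoted above.

```python
# Hypothetical helper, not part of EMBOSS.
def region_warning(feature_type, start, end, seq_id):
    """Return the warning text if end is less than start, else None.

    A caller would print the warning and skip writing the sequence.
    """
    if end < start:
        return (
            "Extraction region end less than start for "
            f"{feature_type} [{start}-{end}] in {seq_id}"
        )
    return None
```

A region is treated as valid whenever the end position is not less than the start position, so a single-position region produces no warning.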

Exit status

It always exits with status 0.

Known bugs

None.

Author(s)

History

Target users

Comments