This program was written for the case where a file containing several sequences is being used as a small database, but some of the sequences are no longer required and must be deleted from the file.
notseq splits the input sequences into those that you wish to keep and those you wish to exclude.
notseq takes a set of sequences as input together with a list of sequence names or accession numbers. It also takes the name of a new file to write the sequences that you want to keep into, and optionally the name of a file that will contain the sequences that you want excluded from the set.
notseq then reads in the input sequences. Those that match one of the sequence names or accession numbers are written to the file of excluded sequences, and those that don't match are written to the file of sequences to be kept.
Note that the names of the sequences to be excluded are not standard EMBOSS USAs. Only the name or accession number should be specified, not the database or file that these entries may occur in. These excluded sequence names are matched against the names of the input sequences. Wildcarded names may be specified by using '*'s. Any specified names of sequences to be excluded that are not found are simply ignored.
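The matching and splitting behaviour described above can be sketched in a few lines of Python. This is only an illustration of the logic, not notseq's actual implementation: the function names `wildcard_match` and `split_sequences` are hypothetical, the records are made-up sample data, and case-insensitive matching is an assumption that the text above does not confirm.

```python
import re

def wildcard_match(pattern, text):
    """Return True if text matches pattern, where only '*' is special."""
    # Escape every literal part so that '*' is the sole wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # ASSUMPTION: matching ignores case; this is not stated in the text.
    return re.fullmatch(regex, text, flags=re.IGNORECASE) is not None

def split_sequences(records, patterns):
    """Split (name, accession, sequence) records into (kept, excluded)."""
    kept, excluded = [], []
    for name, accession, sequence in records:
        hit = any(
            wildcard_match(p, name) or wildcard_match(p, accession)
            for p in patterns
        )
        (excluded if hit else kept).append((name, accession, sequence))
    return kept, excluded

# Sample data for illustration only.
records = [
    ("hba_human", "P69905", "VLSPADKTNV"),
    ("hbb_human", "P68871", "VHLTPEEKSA"),
    ("myg_phyca", "P02185", "VLSEGEWQLV"),
    ("lacz_ecoli", "P00722", "TMITDSLAVV"),
]

# 'no_such_entry' matches nothing and is silently ignored.
kept, excluded = split_sequences(records, ["hb*", "P02185", "no_such_entry"])
```

Here `excluded` holds the two haemoglobin entries (matched by the wildcard) and the myoglobin entry (matched by accession), while `kept` holds only `lacz_ecoli`.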
The names (or accession numbers) of the sequences to be excluded can be entered as a file of such names by specifying an '@' followed by the name of the file containing the sequence names. For example: '@names.dat'.
The names or accession numbers of the sequences to be excluded are not standard EMBOSS USAs. Only the ID name or accession number can be specified; you cannot specify the sequences as 'database:ID', 'file:accession', 'format::file', etc.
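As a rough sketch of how an exclusion list given either directly or as an '@' file might be expanded into individual names, consider the following Python function. The name `expand_exclude_spec` is hypothetical, and two details are assumptions not stated above: that names in the file or string are separated by commas or whitespace, and that the file is plain text.

```python
import re

def expand_exclude_spec(spec):
    """Turn an exclusion spec into a list of names or accession numbers.

    A spec starting with '@' names a file holding the list;
    anything else is treated as the list itself.
    """
    if spec.startswith("@"):
        # '@names.dat' -> read the names from the file 'names.dat'.
        with open(spec[1:], encoding="utf-8") as handle:
            text = handle.read()
    else:
        text = spec
    # ASSUMPTION: names are separated by commas and/or whitespace.
    return [token for token in re.split(r"[,\s]+", text) if token]
```

For example, `expand_exclude_spec("@names.dat")` would return the names listed in `names.dat`, which could then be matched against the input sequence names and accession numbers.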