An old public domain version is available at: ftp://ftp.ebi.ac.uk/pub/databases/transfac/transfac32.tar.Z
The 'site.dat' data file from TRANSFAC contains information on individual (putatively) regulatory protein binding sites. It has been divided into a number of taxonomic groups.
The program tfscan takes a sequence and the name of one of these taxonomic groups and does a fast match of the TRANSFAC sequences against the input sequence (optionally allowing mismatches).
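To illustrate what "allowing mismatches" means, here is a minimal brute-force sketch that slides one binding-site sequence along the input and reports every window differing from it in at most a given number of positions. This is not tfscan's own (faster) algorithm. The function name scan_site and its parameters are hypothetical, and it assumes plain A/C/G/T sequences scanned on the given strand only.

```python
def scan_site(sequence: str, site: str, max_mismatches: int = 0) -> list[tuple[int, int, str]]:
    """Hypothetical helper: return (start, end, region) for every window of
    `sequence` within `max_mismatches` substitutions of `site`.
    Start/end are 1-based and inclusive; only the given strand is scanned."""
    seq = sequence.upper()
    pattern = site.upper()
    k = len(pattern)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = 0
        for a, b in zip(window, pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences; stop checking this window
        if mismatches <= max_mismatches:
            hits.append((i + 1, i + k, window))
    return hits


print(scan_site("ACGTTGACGTAC", "TGAC"))  # [(5, 8, 'TGAC')]
```

Raising max_mismatches enlarges the set of accepted windows quickly, which is one reason short sites produce many hits.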
The result is a list of the positions that match binding sites in the TRANSFAC SITE database.
Because the binding sites are so small, there will be many spurious (false positive) matches.
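A rough back-of-the-envelope estimate shows why. Assuming a random sequence with equal base frequencies, a fixed site of length k matches a given window with probability 1/4^k, and accepting up to m mismatches multiplies the number of acceptable k-mers by the sum over j = 0..m of C(k, j)·3^j. The sketch below (expected_random_hits is a hypothetical name) computes the expected number of chance hits for one site on one strand; real genomic sequence is not uniform, so treat it only as an order-of-magnitude guide.

```python
from math import comb


def expected_random_hits(seq_len: int, site_len: int, max_mismatches: int = 0) -> float:
    """Expected chance matches of one site in a uniform random sequence (one strand).

    Assumes equal A/C/G/T frequencies and independent positions."""
    if seq_len < site_len:
        return 0.0
    windows = seq_len - site_len + 1
    # Number of k-mers within Hamming distance max_mismatches of the site.
    neighbours = sum(comb(site_len, j) * 3 ** j for j in range(max_mismatches + 1))
    return windows * neighbours / 4 ** site_len


print(expected_random_hits(10_000, 6))     # about 2.44 chance hits for a 6-base site
print(expected_random_hits(10_000, 6, 1))  # about 46.4 once one mismatch is allowed
```

Multiplied over the many sites in a taxonomic group, even a modest input sequence can therefore collect a large number of spurious matches.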
The output consists of a title line followed by five columns separated by whitespace.
The first column is the identifier of the entry.
The second column is the Accession Number of the entry.
The third and fourth columns are the start and end positions of the match in your input sequence.
The fifth column is the sequence of the region where a match has been found.
Binding factor information, where available, is given at the end of the matches for each matching entry.
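If you want to post-process the results, a small parser can pick out the five-column match lines. This is a heuristic sketch under assumptions: it treats any line with exactly five whitespace-separated fields whose third and fourth fields are integers as a match, and skips everything else (the title line and binding factor lines). The names TfscanHit and parse_hits, and the identifiers in the usage example, are made up for illustration; a factor line that happens to fit the same pattern would be misread.

```python
from dataclasses import dataclass


@dataclass
class TfscanHit:
    identifier: str  # column 1: entry identifier
    accession: str   # column 2: accession number
    start: int       # column 3: start of match in the input sequence
    end: int         # column 4: end of match in the input sequence
    region: str      # column 5: matched sequence region


def parse_hits(text: str) -> list[TfscanHit]:
    """Heuristically extract match lines; other lines are ignored."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # title line, blank line or free-text factor information
        ident, acc, start, end, region = fields
        if not (start.isdigit() and end.isdigit()):
            continue
        hits.append(TfscanHit(ident, acc, int(start), int(end), region))
    return hits


sample = "Title line\nEXAMPLE_SITE ACC0001 5 8 TGAC\nFactor info line here\n"
print(parse_hits(sample))
```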
Your EMBOSS administrator will have to run the EMBOSS program tfextract to set up these data files from the TRANSFAC distribution files. If the data have not been set up, contact your EMBOSS administrator and ask them to run tfextract.