The PROSITE database of protein families and domains
Release Notes

Release 19, April 2005


Table of contents

 1   Introduction
 2   Description of the changes made to PROSITE since release 18.0
 3   Forthcoming changes
 4   Status of the PROSITE files
 5   FTP access to PROSITE
 6   References
 7   Acknowledgments

(1)   Introduction

This release of PROSITE contains 1,344 documentation entries that describe 1,322 patterns, 4 rules and 515 profiles/matrices.
Since release 18.0, 528 entries have been updated, and 144 documentation entries and 202 signatures have been added.

The following table shows the growth of the database since its creation in 1989.

Rel.   Date     Doc   Entries   Note
 1.0   03/89     58        60   Only released in PC/Gene (Version 5.16)
 2.0   03/89    129       132   Only released in PC/Gene (Version 6.00)
 3.0   05/89      ?       160
 4.0   10/89      ?       202   Printed release (EMBL Biocomputing document)
 5.0   04/90    296       338
 6.0   11/90    375       433
 7.0   05/91    441       508
 8.0   11/91    530       605
 9.0   06/91    580       689
10.0   12/92    635       803
11.0   10/93    715       927
12.0   06/94    785      1029   First release to include profiles
13.0   11/95    889      1167
14.0   12/97    997      1335
15.0   06/98   1014      1352
16.0   07/99   1034      1374
17.0   12/01   1108      1501
18.0   07/03   1200      1639
19.0   04/05   1344      1841


(2)   Description of the changes made to PROSITE since release 18.0

2.1   New version of the PROSITE scan tool ps_scan.pl

For more details on new implementations see:

2.2   Modification of the method to scan repeats

We previously introduced a new approach to scan repeats, which uses two methods: one to recover extra repeat units in a given protein and one to identify new proteins that contain a given repeat (see the user manual for more details). Until now, both methods were applied simultaneously. We have modified this approach so that only the first method is applied in some particular cases. The profiles that use only the first method are tagged with 'R?' in the text field of the cut-off line (LEVEL=-1).

Example:

MA   /CUT_OFF: LEVEL=-1; SCORE=260; N_SCORE=5.5000; MODE=1; TEXT='R?';
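
As an illustration of how a script could detect this tag, here is a minimal sketch in Python. The function names are hypothetical and are not part of any PROSITE tool. The sketch assumes that the fields of a /CUT_OFF line are separated by ';' and that the TEXT value itself contains no ';'.

```python
def parse_cut_off(line):
    """Parse an 'MA   /CUT_OFF:' line into a dict of its KEY=VALUE fields."""
    prefix = "/CUT_OFF:"
    start = line.find(prefix)
    if start < 0:
        raise ValueError("not a /CUT_OFF line")
    fields = {}
    # Simplifying assumption: ';' never occurs inside a field value.
    for part in line[start + len(prefix):].split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        # Quoted values such as TEXT='R?' are stored without the quotes.
        fields[key.strip()] = value.strip().strip("'")
    return fields


def uses_first_method_only(fields):
    """True if the cut-off line is the LEVEL=-1 line and carries the 'R?' tag."""
    return fields.get("LEVEL") == "-1" and fields.get("TEXT") == "R?"


example = "MA   /CUT_OFF: LEVEL=-1; SCORE=260; N_SCORE=5.5000; MODE=1; TEXT='R?';"
fields = parse_cut_off(example)
print(fields, uses_first_method_only(fields))
```

All values are kept as strings, so a caller that needs numeric scores has to convert SCORE and N_SCORE itself.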

2.3   Extension of the DR line length to 76 characters

Swiss-Prot has extended the mnemonic code for the protein name from a maximum of 4 characters to a maximum of 5 characters. For example, the mnemonic code for the meiotic recombination protein rec10 was 'RE10'. After the introduction of extended entry names, it was changed to the 5-letter code 'REC10'.

This Swiss-Prot modification changed the size of PROSITE DR lines. We have therefore extended PROSITE DR lines to 76 characters.
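
For readers who generate DR lines themselves, the following Python sketch shows one way to pack cross-references into lines of at most 76 characters. It is an illustration only: the function name is hypothetical, the 'accession, entry name, status;' layout is a simplified assumption (it ignores any column padding used in the real files), and the accession numbers and entry names in the usage example are placeholders.

```python
MAX_DR_LENGTH = 76  # DR line length stated in these release notes


def format_dr_lines(refs, max_length=MAX_DR_LENGTH):
    """Pack (accession, entry_name, status) triples into DR lines.

    Assumed, simplified layout: 'DR   AC, ENTRY_NAME, S; AC, ENTRY_NAME, S;'.
    """
    prefix = "DR  "
    lines = []
    current = prefix
    for accession, entry_name, status in refs:
        item = f" {accession}, {entry_name}, {status};"
        # Start a new line when the next item would exceed the limit.
        if current != prefix and len(current) + len(item) > max_length:
            lines.append(current)
            current = prefix
        current += item
    if current != prefix:
        lines.append(current)
    return lines


# Placeholder accessions and entry names with a 5-character mnemonic.
refs = [
    ("P00001", "REC10_SCHPO", "T"),
    ("P00002", "REC10_YEAST", "T"),
    ("P00003", "REC10_HUMAN", "T"),
    ("P00004", "REC10_MOUSE", "T"),
]
dr_lines = format_dr_lines(refs)
for dr_line in dr_lines:
    print(len(dr_line), dr_line)
```

Under this assumed layout, three cross-references with 11-character entry names fill a line to exactly 76 characters, and the fourth starts a new line.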

2.4   Suppression of an obsolete note in the documentation

The following note has been removed from all documentation entries:

-Note: this  documentation  entry  is linked to both a signature pattern and a
 profile. As  the  profile is much more sensitive than the pattern, you should
 use it if you have access to the necessary software tools to do so.


(3)   Forthcoming changes

3.1   Introduction of a new line type for the post-processing retrieval of data

PROSITE profiles normally use two cut-off levels: a reliable cut-off (LEVEL=0) and a low-confidence cut-off (LEVEL=-1). The low-level cut-off usually covers the twilight zone, where a few true positives that cannot be separated from false positives might be present. The output of the pfsearch and pfscan programs indicates strong matches (level 0) with '!' and weak matches (level -1) with '?'. This tagging in the match list can be used in post-processing, under some particular conditions, to validate true positives present in the twilight zone or to eliminate false positives detected with a significant score.

We noticed that the sensitivity and specificity of PROSITE descriptors can be enhanced by taking into account contextual information, such as the co-occurrence of other domains, the position in the protein or the taxonomic distribution. Such information can be used to promote some weak matches or to shift down some irrelevant strong matches (see below).

We have already started to introduce contextual information for the detection of repeat units, where a weak match can be promoted in some particular cases (see the user manual), and we want to generalize this approach to other contexts. To do so, we will introduce a new line type (PP) that will define conditions to retrieve matches in post-processing.

The format of the line will be:

PP   promote:context;

where 'promote' indicates that a weak match is validated if a specific condition (described in the 'context' field) is fulfilled.

or

PP   shiftdown:context;

where 'shiftdown' indicates that a strong match must be removed if a specific condition (described in the 'context' field) is fulfilled.

A weak match can thus be promoted to a strong match, or a strong match can lose its status, according to the co-occurrence of other features. The 'shiftdown' function can be applied to separate closely related families by using competing groups of profiles where only the profile matching with the highest score is retained.
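
The following Python sketch illustrates how such PP lines might be interpreted in post-processing. It is not the PROSITE implementation. The syntax of the 'context' field is not specified here, so the sketch assumes that the context is the accession number of another descriptor, that 'promote' applies when that descriptor has a strong match in the same protein, and that 'shiftdown' applies when that descriptor has a strong match with a higher score (scores are assumed to be comparable, e.g. normalized scores). The function names and the accession PS99999 are hypothetical.

```python
def parse_pp_line(line):
    """Split 'PP   action:context;' into an (action, context) pair."""
    if not line.startswith("PP"):
        raise ValueError("not a PP line")
    body = line[2:].strip().rstrip(";")
    action, sep, context = body.partition(":")
    if not sep or action not in ("promote", "shiftdown"):
        raise ValueError(f"unrecognised PP line: {line!r}")
    return action, context


def post_process(matches, pp_rules):
    """Return the accessions retained as strong matches for one protein.

    matches:  {accession: (level, score)} with level 0 (strong) or -1 (weak).
    pp_rules: {accession: [(action, context), ...]} taken from PP lines.
    Assumption: 'context' is the accession of another descriptor.
    """
    strong = set()
    for accession, (level, score) in matches.items():
        is_strong = level == 0
        for action, context in pp_rules.get(accession, []):
            other = matches.get(context)
            other_is_strong = other is not None and other[0] == 0
            if action == "promote" and level == -1 and other_is_strong:
                is_strong = True   # weak match validated by its context
            elif (action == "shiftdown" and level == 0
                  and other_is_strong and other[1] > score):
                is_strong = False  # competing profile scores higher
        if is_strong:
            strong.add(accession)
    return strong


# PS99999 is a hypothetical profile carrying the rule 'shiftdown:PS50016'.
rule = parse_pp_line("PP   shiftdown:PS50016;")
matches = {"PS99999": (0, 10.0), "PS50016": (0, 15.0)}
retained = post_process(matches, {"PS99999": [rule]})
print(rule, retained)
```

Conditions are evaluated against the original match levels rather than the updated ones, so the result does not depend on the order in which the descriptors are processed.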

3.2   Introduction of a new line type referring to the ProRule database

PROSITE is now complemented with a set of rules (ProRule) that are used to provide additional meaningful information when a match with a PROSITE profile is detected. Each rule is triggered by a PROSITE profile and contains information linked to the domain or protein family covered by the profile. This information can be general, i.e. always associated with the domain or protein family, or conditional, depending on the presence of particular residues in functionally or structurally critical positions. The rule(s) associated with a profile will be cross-referenced in the profile in a new line type (PR line).

Example:

PR   PRU00001;

Some of the information given by the rules is already accessible to PROSITE users through our ScanProsite web page. The prorule.dat file containing all the rules will be available on our FTP site at the next release under the PROSITE copyright conditions.

Both new line types (PP and PR) will be introduced between the 3D and DO lines as shown in the following example:

3D   1BOR; 1CHC; 1E4U; 1FBV; 1G25; 1IYM; 1JM7; 1LDJ; 1LDK; 1RMD; 
PP   shiftdown:PS50016;
PR   PRU00175;
DO   PDOC00449;

3.3   Change in the format of references

The PROSITE documentation reference blocks will be supplemented with the PubMed identifier, the digital object identifier (DOI) and the title of the article.

The new format will be:

[ 1] Marshall R.D.
     Glycoproteins.
     Annu. Rev. Biochem. 41:673-702(1972).
     PubMed=4563441; DOI=10.1146/annurev.bi.41.070172.003325

The PubMed/DOI line will not be restricted to 78 characters like other documentation lines.
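
As an illustration, the PubMed/DOI line shown above could be read with a few lines of Python. This is a sketch under the assumption that the line always consists of 'KEY=value' pairs separated by ';' and that no identifier contains a ';' itself; the function name is hypothetical.

```python
def parse_xref_line(line):
    """Parse a 'PubMed=...; DOI=...' reference line into a dict."""
    xrefs = {}
    # Simplifying assumption: ';' never occurs inside an identifier.
    for part in line.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=value, got {part!r}")
        xrefs[key.strip()] = value.strip()
    return xrefs


xref_line = "     PubMed=4563441; DOI=10.1146/annurev.bi.41.070172.003325"
xrefs = parse_xref_line(xref_line)
print(xrefs)
```

Because the line is not restricted to 78 characters, the whole line is parsed at once and no continuation handling is attempted.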

3.4   Deletion of the CC FT_KEY line type

This line type was previously used to automatically generate Swiss-Prot FT lines from PROSITE profile matches. As we will move all annotation that can be generated from a PROSITE profile match into ProRule, this line type will become obsolete and will be deleted at the next release, when ProRule becomes available.

(4)   Status of the PROSITE files

PROSITE is distributed with different data and documentation files. The following table lists the files that are currently available.

prosuser.txt    User manual
profile.txt     Description of the profile syntax
psrelnot.txt    Release notes for the current release
prosite.dat     Patterns, profiles and rules databases (updated weekly)
prosite.doc     Documentation database for each pattern and profile (updated weekly)
prosite.lis     List of documentation entries (updated weekly)
pautindex.txt   Authors index (updated weekly)
psdelac.txt     Deleted accession number index (updated weekly)
experts.txt     List of on-line experts for PROSITE and Swiss-Prot (updated weekly)
jourlist.txt    List of cited journals in PROSITE (updated weekly)
ps_98.txt       Announcement concerning PROSITE

(5)   FTP access to PROSITE

PROSITE is available for download from the following anonymous FTP server:

Organization   Swiss Institute of Bioinformatics (SIB)
Address        ftp.expasy.org
Directory      /databases/prosite/
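
A download from this server could be scripted along the following lines with Python's standard ftplib module. This is a minimal sketch: the function name is hypothetical, and it assumes the server accepts anonymous logins at the address and directory given above.

```python
from ftplib import FTP

HOST = "ftp.expasy.org"            # address given in these release notes
DIRECTORY = "/databases/prosite/"  # directory given in these release notes


def download_prosite_file(filename, destination=None,
                          host=HOST, directory=DIRECTORY):
    """Fetch one PROSITE file by anonymous FTP and return the local path."""
    destination = destination or filename
    with FTP(host) as ftp:
        ftp.login()          # no arguments: anonymous login
        ftp.cwd(directory)
        with open(destination, "wb") as handle:
            # Binary transfer; each received block is written to the file.
            ftp.retrbinary(f"RETR {filename}", handle.write)
    return destination


# Usage (requires network access):
#     download_prosite_file("prosite.dat")
```

The function opens no connection until it is called, so the block can be imported without network access.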


(6)   References

If you want to refer to the PROSITE database please cite:

Hulo N., Sigrist C.J.A., Le Saux V., Langendijk-Genevaux P.S., Bordoli L., Gattiker A., De Castro E., Bucher P., Bairoch A.
Recent improvements to the PROSITE database.
Nucleic Acids Res. 32:134-137(2004).

If you want to refer to the PROSITE methodology please cite:

Sigrist C.J.A., Cerutti L., Hulo N., Gattiker A., Falquet L., Pagni M., Bairoch A., Bucher P.
PROSITE: a documented database using patterns and profiles as motif descriptors.
Brief Bioinform. 3:265-274(2002).
PubMed: 12230035

If you want to refer to the stand-alone tool to scan PROSITE please cite:

Gattiker A., Gasteiger E., Bairoch A.
ScanProsite: a reference implementation of a PROSITE scanning tool.
Applied Bioinformatics 1:107-108(2002).


(7)   Acknowledgments

This release of PROSITE has been prepared by:

Nicolas Hulo, Christian J.A. Sigrist, Edouard De Castro, Virginie Le Saux, Petra Langendijk-Genevaux and Amos Bairoch

(1) Swiss-Prot group, Swiss Institute of Bioinformatics.
(2) ISREC bioinformatics group, Swiss Institute of Bioinformatics.