The PROSITE database of protein families and domains Release Notes Release 19, April 2005 |
Table of contents |
---|
(1) Introduction |
---|
The following table shows the growth of the database since its creation in 1989.
Rel. | Date | Doc | Entries | Note |
---|---|---|---|---|
1.0 | 03/89 | 58 | 60 | Only released in PC/Gene (Version 5.16) |
2.0 | 03/89 | 129 | 132 | Only released in PC/Gene (Version 6.00) |
3.0 | 05/89 | ? | 160 | |
4.0 | 10/89 | ? | 202 | Printed release (EMBL Biocomputing document) |
5.0 | 04/90 | 296 | 338 | |
6.0 | 11/90 | 375 | 433 | |
7.0 | 05/91 | 441 | 508 | |
8.0 | 11/91 | 530 | 605 | |
9.0 | 06/92 | 580 | 689 | |
10.0 | 12/92 | 635 | 803 | |
11.0 | 10/93 | 715 | 927 | |
12.0 | 06/94 | 785 | 1029 | First release to include profiles |
13.0 | 11/95 | 889 | 1167 | |
14.0 | 12/97 | 997 | 1335 | |
15.0 | 06/98 | 1014 | 1352 | |
16.0 | 07/99 | 1034 | 1374 | |
17.0 | 12/01 | 1108 | 1501 | |
18.0 | 07/03 | 1200 | 1639 | |
19.0 | 04/05 | 1344 | 1841 | |
(2) Description of the changes made to PROSITE since release 18.0 |
---|
For more details on new implementations see:
We previously introduced a new approach to scanning repeats, which uses two methods: one to recover extra repeat units in a given protein and one to identify new proteins that contain a given repeat (see the user manual for more details). Until now, both methods were applied simultaneously. We have modified this approach so that only the first method is applied in some particular cases. The profiles that use only the first method are tagged with 'R?' in the text field of the cut-off line (LEVEL=-1).
Example:
MA /CUT_OFF: LEVEL=-1; SCORE=260; N_SCORE=5.5000; MODE=1; TEXT='R?';
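As an illustration only, the following Python sketch reads a cut-off line such as the one above and checks for the 'R?' tag. The function names `parse_cutoff` and `uses_first_method_only` are hypothetical and not part of any PROSITE tool; the sketch assumes the fields are `KEY=value;` pairs as in the example.

```python
import re

def parse_cutoff(line):
    """Parse a 'MA /CUT_OFF: ...;' line into a dict of field name -> string value."""
    # Keep only the text after the '/CUT_OFF:' keyword.
    _, _, body = line.partition("/CUT_OFF:")
    fields = {}
    # Each field is KEY=value; where value may be a quoted string.
    for key, value in re.findall(r"(\w+)=('[^']*'|[^;]+);", body):
        fields[key] = value.strip().strip("'")
    return fields

def uses_first_method_only(fields):
    """True if the cut-off is the LEVEL=-1 line and carries the 'R?' tag."""
    return fields.get("LEVEL") == "-1" and fields.get("TEXT") == "R?"

example = "MA /CUT_OFF: LEVEL=-1; SCORE=260; N_SCORE=5.5000; MODE=1; TEXT='R?';"
cutoff = parse_cutoff(example)
print(cutoff, uses_first_method_only(cutoff))
```

Values are kept as strings so that nothing is lost in conversion; a caller can convert SCORE or N_SCORE to numbers where needed.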
This Swiss-Prot modification introduced a change in the size of PROSITE DR lines. We have therefore extended PROSITE DR lines to 76 characters.
The following note has been removed from all documentation entries:
-Note: this documentation entry is linked to both a signature pattern and a profile. As the profile is much more sensitive than the pattern, you should use it if you have access to the necessary software tools to do so.
(3) Forthcoming changes |
---|
PROSITE profiles normally use two cut-off levels: a reliable cut-off (LEVEL=0) and a low-confidence cut-off (LEVEL=-1). The low-level cut-off usually covers the twilight zone, where a few true positives may be present that cannot be separated from false positives. The output of the pfsearch and pfscan programs marks strong matches (level 0) with '!' and weak matches (level -1) with '?'. This tagging in the match list can be used in post-processing, under particular conditions, to validate true positives present in the twilight zone or to eliminate false positives detected with a significant score.
We noticed that the sensitivity and specificity of PROSITE descriptors can be enhanced by taking into account contextual information such as the co-occurrence of other domains, the position in the protein, the taxonomic distribution, etc. Such information can be used to promote some weak matches or to shift down some irrelevant strong matches (see below).
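The two-level scheme can be sketched in Python as follows. This is a minimal illustration, not the logic of pfsearch or pfscan: it assumes a match reaches a level when its score is at least that level's cut-off score, and the level 0 threshold of 400 is a hypothetical value chosen for the example.

```python
# Markers used in the match list for each cut-off level.
MARKERS = {0: "!", -1: "?"}

def match_level(score, cutoffs):
    """Return the highest cut-off level reached by score, or None if below all."""
    reached = [level for level, threshold in cutoffs.items() if score >= threshold]
    return max(reached) if reached else None

# Level -1 score taken from the cut-off line example; level 0 score is hypothetical.
cutoffs = {0: 400, -1: 260}

for score in (450, 300, 100):
    level = match_level(score, cutoffs)
    print(score, level, MARKERS.get(level, "no match"))
```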
We have already started to introduce contextual information for the detection of repeat units, where a weak match can be promoted in some particular cases (see the user manual), and we want to generalize this approach to other contexts. To do so, we will introduce a new line type (PP) that will define conditions for retrieving matches in post-processing.
The format of the line will be:
PP promote:context;
where 'promote' indicates that a weak match is validated if a specific condition (described in the 'context' field) is fulfilled.
or
PP shiftdown:context;
where 'shiftdown' indicates that a strong match must be removed if a specific condition (described in the 'context' field) is fulfilled.
A weak match can thus be promoted to strong match or a strong match can lose its status according to the co-occurrence of other features. The 'shiftdown' function can be applied to separate closely related families by using competing groups of profiles where only the profile matching with the higher score is retained.
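The following Python sketch shows one way such post-processing could work. It rests on several assumptions that go beyond these notes: the 'context' field is taken to be a single profile accession, 'promote' is taken to fire when that profile also matches the protein, and 'shiftdown' is taken to fire when that profile matches with a higher score. The `Match` class, the function names and the accessions PS99998 and PS99999 are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Match:
    accession: str  # profile that produced the match
    level: int      # 0 = strong match, -1 = weak match
    score: float    # assumed comparable between profiles (e.g. a normalized score)

def parse_pp(line):
    """Split a line such as 'PP promote:PS99999;' into (action, context)."""
    body = line[2:].strip().rstrip(";")
    action, _, context = body.partition(":")
    return action, context

def apply_pp(match, pp_line, other_matches):
    """Return the match after post-processing, or None if it is removed."""
    action, context = parse_pp(pp_line)
    # Matches on the same protein produced by the profile named in the context.
    rivals = [m for m in other_matches if m.accession == context]
    if action == "promote" and match.level == -1 and rivals:
        # Weak match validated by co-occurrence: promote it to a strong match.
        return Match(match.accession, 0, match.score)
    if action == "shiftdown" and match.level == 0:
        # Competing profiles: keep only the one matching with the higher score.
        if any(r.score > match.score for r in rivals):
            return None
    return match

weak = Match("PS99998", -1, 7.0)
promoted = apply_pp(weak, "PP promote:PS99999;", [Match("PS99999", 0, 12.0)])
print(promoted)
```

Returning a new `Match` rather than modifying the input keeps the original match list intact, which makes it easier to compare results before and after post-processing.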
PROSITE is now complemented with a set of rules (ProRule), which are used to give extra meaningful information when a match with a PROSITE profile is detected. Each rule is triggered by a PROSITE profile and contains information linked to the domain or protein family covered by the profile. This information can be general, i.e. always associated with the domain or protein family, or conditional, depending on the presence of particular residues in functionally or structurally critical positions. The rule(s) associated with a profile will be cross-referenced in the profile in a new line type (PR line).
Example:
PR PRU00001;
Some information given by the rules is already accessible to PROSITE users through our ScanProsite web page. The prorule.dat file containing all the rules will be available on our ftp site at the next release under the PROSITE copyright conditions.
Both new line types (PP and PR) will be introduced between the 3D and DO lines as shown in the following example:
3D 1BOR; 1CHC; 1E4U; 1FBV; 1G25; 1IYM; 1JM7; 1LDJ; 1LDK; 1RMD;
PP shiftdown:PS50016;
PR PRU00175;
DO PDOC00449;
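A reader of the flat file could pick up the new line types with a few lines of Python. This is a sketch under stated assumptions: each line starts with a two-letter line code followed by whitespace, each line type sits on its own line, and the items on a line are separated by semicolons. The function name `parse_lines` is hypothetical, and the approach does not cover line types with a different internal syntax.

```python
def parse_lines(text):
    """Group semicolon-separated items by their two-letter line code."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        code, body = line[:2], line[2:].strip()
        items = [item.strip() for item in body.split(";") if item.strip()]
        # A line code may occur on several lines; collect all of its items.
        records.setdefault(code, []).extend(items)
    return records

entry = """\
3D 1BOR; 1CHC; 1E4U; 1FBV; 1G25; 1IYM; 1JM7; 1LDJ; 1LDK; 1RMD;
PP shiftdown:PS50016;
PR PRU00175;
DO PDOC00449;
"""
records = parse_lines(entry)
print(records["PP"], records["PR"])
```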
The PROSITE documentation reference blocks will be completed with the PubMed identifier, the digital object identifier (DOI) and the title of the article.
The new format will be:
[ 1] Marshall R.D.
     Glycoproteins.
     Annu. Rev. Biochem. 41:673-702(1972).
     PubMed=4563441; DOI=10.1146/annurev.bi.41.070172.003325
The PubMed/DOI line will not be restricted to 78 characters like other documentation lines.
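The identifiers on the PubMed/DOI line can be extracted with a short Python sketch. The function name `parse_xref_line` is hypothetical, and the sketch assumes that the line consists of `Name=value` pairs separated by semicolons, as in the example above; a DOI that itself contains a semicolon would need more careful handling.

```python
def parse_xref_line(line):
    """Parse 'PubMed=...; DOI=...' into a dict of identifier name -> value."""
    xrefs = {}
    for part in line.split(";"):
        # Split on the first '=' only, since a DOI may itself contain '='.
        key, sep, value = part.strip().partition("=")
        if sep:
            xrefs[key] = value
    return xrefs

line = "PubMed=4563441; DOI=10.1146/annurev.bi.41.070172.003325"
xrefs = parse_xref_line(line)
print(xrefs)
```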
This line type was previously used to automatically generate Swiss-Prot FT lines from PROSITE profile matches. As we will move all annotation that can be generated from a PROSITE profile match into ProRule, this line will become obsolete and will therefore be deleted at the next release, when ProRule becomes available.
(4) Status of the PROSITE files |
---|
prosuser.txt | User manual |
profile.txt | Description of the profile syntax |
psrelnot.txt | Release notes for the current release |
prosite.dat | Patterns, profiles and rules databases (updated weekly) |
prosite.doc | Documentation database for each pattern and profile (updated weekly) |
prosite.lis | List of documentation entries (updated weekly) |
pautindex.txt | Authors index (updated weekly) |
psdelac.txt | Deleted accession number index (updated weekly) |
experts.txt | List of on-line experts for PROSITE and Swiss-Prot (updated weekly) |
jourlist.txt | List of cited journals in PROSITE (updated weekly) |
ps_98.txt | Announcement concerning PROSITE |
(5) FTP access to PROSITE |
---|
Organization | Swiss Institute of Bioinformatics (SIB) |
Address | ftp.expasy.org |
Directory | /databases/prosite/ |
(6) References |
---|
If you want to refer to the PROSITE database please cite:
If you want to refer to the PROSITE methodology please cite:
If you want to refer to the stand-alone tool to scan PROSITE please cite:
(7) Acknowledgments |
---|