UniProtKB/Swiss-Prot protein knowledgebase release 57.15 statistics
1. INTRODUCTION
Release 57.15 of 02-Mar-10 of UniProtKB/Swiss-Prot contains 515203 sequence entries,
comprising 181334896 amino acids abstracted from 187376 references.
Since release 57.14, 463 sequences have been added, the sequence data of
84 existing entries have been updated, and the annotations of
471856 entries have been revised.
Number of fragments: 8451
Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 28874
Protein existence (PE): entries %
1: Evidence at protein level 68618 13.3%
2: Evidence at transcript level 66497 12.9%
3: Inferred from homology 364268 70.7%
4: Predicted 14285 2.8%
5: Uncertain 1535 0.3%
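As a quick consistency check, the five PE counts add up to the 515203 entries of the release, and each percentage is the count divided by that total. A minimal Python sketch, with counts copied from the table above (the names `PE_COUNTS` and `pe_percentages` are illustrative, not part of any UniProt tool):

```python
# Protein existence (PE) counts for release 57.15, copied from the table above.
PE_COUNTS = {
    "1: Evidence at protein level": 68618,
    "2: Evidence at transcript level": 66497,
    "3: Inferred from homology": 364268,
    "4: Predicted": 14285,
    "5: Uncertain": 1535,
}
TOTAL_ENTRIES = 515203  # total sequence entries in the release


def pe_percentages(counts, total):
    """Return each PE level's share of all entries, rounded to one decimal."""
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


# Each entry carries exactly one PE level, so the counts cover the whole release.
assert sum(PE_COUNTS.values()) == TOTAL_ENTRIES

for level, pct in pe_percentages(PE_COUNTS, TOTAL_ENTRIES).items():
    print(f"{level:<35} {pct:5.1f}%")
```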
The growth of the database is summarized below.
2. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/Swiss-Prot: 12042
The first twenty species represent 107335 sequences, 20.8% of the total
number of entries.
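The 20.8% figure can be reproduced from the first twenty rows of table 2.2, assuming those rows are the "first twenty species" meant here. A small sketch (the list name is illustrative):

```python
# Frequencies of the first twenty species in table 2.2
# (Homo sapiens down to Haemophilus influenzae).
TOP20 = [20265, 16224, 8876, 7483, 6558, 5748, 4974, 4368, 4258, 4137,
         3284, 3216, 3058, 2608, 2369, 2208, 2153, 1993, 1782, 1773]
TOTAL_ENTRIES = 515203

top20_sum = sum(TOP20)
share = 100 * top20_sum / TOTAL_ENTRIES
print(f"{top20_sum} sequences, {share:.1f}% of all entries")
```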
2.1 Table of the frequency of occurrence of species
Species represented 1x: 5233
2x: 1698
3x: 894
4x: 573
5x: 422
6x: 349
7x: 244
8x: 204
9x: 183
10x: 103
11- 20x: 583
21- 50x: 368
51-100x: 175
>100x: 1013
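The bins of table 2.1 partition the species, so they should add up to the 12042 species reported above. The following sketch checks that and derives one extra figure, the share of species with a single entry; the variable names are illustrative:

```python
# Number of species per frequency bin, copied from table 2.1.
SPECIES_BINS = {
    "1x": 5233, "2x": 1698, "3x": 894, "4x": 573, "5x": 422,
    "6x": 349, "7x": 244, "8x": 204, "9x": 183, "10x": 103,
    "11-20x": 583, "21-50x": 368, "51-100x": 175, ">100x": 1013,
}
TOTAL_SPECIES = 12042

# Every species falls into exactly one bin.
assert sum(SPECIES_BINS.values()) == TOTAL_SPECIES

singletons = SPECIES_BINS["1x"]
print(f"{singletons / TOTAL_SPECIES:.1%} of species have a single entry")
```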
2.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 20265 Homo sapiens (Human)
2 16224 Mus musculus (Mouse)
3 8876 Arabidopsis thaliana (Mouse-ear cress)
4 7483 Rattus norvegicus (Rat)
5 6558 Saccharomyces cerevisiae (Baker's yeast)
6 5748 Bos taurus (Bovine)
7 4974 Schizosaccharomyces pombe (Fission yeast)
8 4368 Escherichia coli (strain K12)
9 4258 Bacillus subtilis
10 4137 Dictyostelium discoideum (Slime mold)
11 3284 Caenorhabditis elegans
12 3216 Xenopus laevis (African clawed frog)
13 3058 Drosophila melanogaster (Fruit fly)
14 2608 Danio rerio (Zebrafish) (Brachydanio rerio)
15 2369 Oryza sativa subsp. japonica (Rice)
16 2208 Pongo abelii (Sumatran orangutan)
17 2153 Gallus gallus (Chicken)
18 1993 Escherichia coli O157:H7
19 1782 Methanocaldococcus jannaschii (Methanococcus jannaschii)
20 1773 Haemophilus influenzae
21 1767 Salmonella typhimurium
22 1668 Escherichia coli O6
23 1666 Shigella flexneri
24 1561 Mycobacterium tuberculosis
25 1520 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
26 1364 Sus scrofa (Pig)
27 1341 Salmonella typhi
28 1273 Pseudomonas aeruginosa
29 1213 Mycobacterium bovis
30 1159 Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
31 1015 Synechocystis sp. (strain PCC 6803)
32 995 Yersinia pestis
33 991 Archaeoglobus fulgidus
34 940 Vibrio cholerae
35 929 Salmonella paratyphi A
36 923 Staphylococcus aureus (strain N315)
37 922 Staphylococcus aureus (strain Mu50 / ATCC 700699)
38 912 Rhizobium meliloti (Sinorhizobium meliloti)
39 909 Acanthamoeba polyphaga mimivirus (APMV)
40 896 Staphylococcus aureus (strain COL)
41 894 Staphylococcus aureus (strain MW2)
42 888 Staphylococcus aureus (strain MSSA476)
43 885 Staphylococcus aureus (strain MRSA252)
44 882 Oryctolagus cuniculus (Rabbit)
45 879 Escherichia coli O6:K15:H31 (strain 536 / UPEC)
46 879 Salmonella choleraesuis
47 869 Shigella sonnei (strain Ss046)
48 863 Yersinia pseudotuberculosis
49 835 Escherichia coli O9:H4 (strain HS)
50 829 Escherichia coli O139:H28 (strain E24377A / ETEC)
51 824 Shigella boydii serotype 4 (strain Sb227)
52 818 Escherichia coli (strain UTI89 / UPEC)
53 817 Ashbya gossypii (Yeast) (Eremothecium gossypii)
54 814 Escherichia coli (strain ATCC 8739 / DSM 1576 / Crooks)
55 800 Shigella dysenteriae serotype 1 (strain Sd197)
56 795 Candida albicans (Yeast)
57 794 Vibrio parahaemolyticus
58 789 Kluyveromyces lactis (Yeast) (Candida sphaerica)
59 785 Escherichia coli (strain SMS-3-5 / SECEC)
60 778 Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
61 776 Pasteurella multocida
62 773 Aquifex aeolicus
63 771 Neurospora crassa
64 765 Escherichia coli (strain K12 / DH10B)
65 764 Canis familiaris (Dog)
66 759 Escherichia coli O127:H6 (strain E2348/69 / EPEC)
67 759 Escherichia coli (strain K12 / BW2952)
68 757 Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)
69 757 Escherichia coli (strain 55989 / EAEC)
70 757 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
71 756 Escherichia coli O8 (strain IAI1)
72 756 Staphylococcus epidermidis (strain ATCC 12228)
73 751 Escherichia coli O45:K1 (strain S88 / ExPEC)
74 750 Escherichia coli (strain SE11)
75 750 Shigella flexneri serotype 5b (strain 8401)
76 748 Escherichia coli O7:K1 (strain IAI39 / ExPEC)
77 747 Candida glabrata (Yeast) (Torulopsis glabrata)
78 742 Escherichia coli O157:H7 (strain EC4115 / EHEC)
79 738 Streptomyces coelicolor
80 738 Photorhabdus luminescens subsp. laumondii
81 731 Vibrio vulnificus
82 730 Bacillus halodurans
83 726 Escherichia coli O81 (strain ED1a)
84 722 Bacillus anthracis
85 722 Yersinia enterocolitica serotype O:8 / biotype 1B (strain 8081)
86 719 Salmonella enteritidis PT4 (strain P125109)
87 715 Vibrio vulnificus (strain YJ016)
88 715 Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
89 713 Yersinia pestis bv. Antiqua (strain Nepal516)
90 713 Salmonella paratyphi A (strain AKU_12601)
91 712 Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
92 711 Staphylococcus aureus (strain NCTC 8325)
93 710 Salmonella newport (strain SL254)
94 709 Salmonella heidelberg (strain SL476)
95 709 Yersinia pestis bv. Antiqua (strain Antiqua)
96 709 Salmonella agona (strain SL483)
97 708 Salmonella schwarzengrund (strain CVM19633)
98 706 Escherichia coli O1:K1 / APEC
99 699 Salmonella dublin (strain CT_02021853)
100 697 Enterobacter sp. (strain 638)
101 696 Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
102 696 Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)
103 687 Mycoplasma pneumoniae
104 685 Pan troglodytes (Chimpanzee)
105 685 Escherichia fergusonii (strain ATCC 35469 / DSM 13698 / CDC 0568-73)
106 684 Pseudomonas syringae pv. tomato
107 682 Salmonella gallinarum (strain 287/91 / NCTC 13346)
108 682 Klebsiella pneumoniae (strain 342)
109 676 Anabaena sp. (strain PCC 7120)
110 670 Pseudomonas putida (strain KT2440)
111 666 Yersinia pestis (strain Pestoides F)
112 665 Staphylococcus aureus (strain USA300)
113 664 Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
114 661 Mycobacterium leprae
115 658 Rhizobium sp. (strain NGR234)
116 653 Serratia proteamaculans (strain 568)
117 651 Zea mays (Maize)
118 646 Escherichia coli
119 645 Bradyrhizobium japonicum
120 641 Staphylococcus aureus (strain bovine RF122 / ET3-1)
121 638 Bacillus cereus (strain ATCC 14579 / DSM 31)
122 637 Yersinia pseudotuberculosis serotype O:3 (strain YPIII)
123 635 Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
124 633 Yersinia pseudotuberculosis serotype IB (strain PB1/+)
125 620 Shewanella oneidensis
126 617 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
127 615 Treponema pallidum
128 613 Ralstonia solanacearum (Pseudomonas solanacearum)
129 608 Staphylococcus haemolyticus (strain JCSC1435)
130 608 Enterobacter sakazakii (strain ATCC BAA-894)
131 602 Rhizobium loti (Mesorhizobium loti)
132 602 Staphylococcus saprophyticus subsp. saprophyticus
133 600 Methanobacterium thermoautotrophicum
134 598 Yersinia pestis bv. Antiqua (strain Angola)
135 598 Salmonella paratyphi C (strain RKS4594)
136 598 Emericella nidulans (Aspergillus nidulans)
137 596 Listeria monocytogenes
138 595 Photobacterium profundum (Photobacterium sp. (strain SS9))
139 593 Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
140 592 Yarrowia lipolytica (Candida lipolytica)
141 590 Bacillus cereus (strain ATCC 10987)
142 589 Xanthomonas campestris pv. campestris
143 588 Listeria innocua
144 585 Rickettsia prowazekii
145 584 Helicobacter pylori (Campylobacter pylori)
146 582 Pectobacterium carotovorum subsp. carotovorum (strain PC1)
147 581 Lactococcus lactis subsp. lactis (Streptococcus lactis)
148 579 Neisseria meningitidis serogroup B
149 576 Brucella suis
150 572 Brucella melitensis
151 572 Buchnera aphidicola subsp. Acyrthosiphon pisum
152 567 Bacillus thuringiensis subsp. konkukian
153 565 Helicobacter pylori J99 (Campylobacter pylori J99)
154 562 Buchnera aphidicola subsp. Schizaphis graminum
155 560 Bacillus cereus (strain ZK / E33L)
156 560 Pseudomonas syringae pv. syringae (strain B728a)
157 557 Pseudomonas aeruginosa (strain UCBPP-PA14)
158 556 Neisseria meningitidis serogroup A
159 555 Bacillus licheniformis (strain DSM 13 / ATCC 14580)
160 555 Xanthomonas axonopodis pv. citri (Citrus canker)
161 553 Vibrio fischeri (strain ATCC 700601 / ES114)
162 551 Pseudomonas fluorescens (strain Pf0-1)
163 549 Oceanobacillus iheyensis
164 545 Caulobacter crescentus (Caulobacter vibrioides)
165 545 Clostridium acetobutylicum
166 545 Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
167 538 Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
168 529 Listeria monocytogenes serotype 4b (strain F2365)
169 523 Erwinia tasmaniensis (strain DSM 17950 / Et1/99)
170 522 Sodalis glossinidius (strain morsitans)
171 521 Bordetella bronchiseptica (Alcaligenes bronchisepticus)
172 521 Xylella fastidiosa
173 519 Streptococcus pneumoniae
174 512 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
175 510 Chromobacterium violaceum
176 509 Thermotoga maritima
177 509 Vibrio cholerae serotype O1 (strain ATCC 39541 / Ogawa 395 / O395)
178 507 Bordetella parapertussis
179 507 Buchnera aphidicola subsp. Baizongia pistaciae
180 507 Pseudomonas aeruginosa (strain PA7)
181 505 Bordetella pertussis
182 504 Haemophilus ducreyi
183 504 Geobacillus kaustophilus
184 503 Staphylococcus aureus (strain Newman)
185 500 Pseudomonas entomophila (strain L48)
186 498 Brucella abortus
187 497 Rickettsia conorii
188 496 Bacillus clausii (strain KSM-K16)
189 492 Haemophilus influenzae (strain 86-028NP)
190 492 Deinococcus radiodurans
191 490 Xanthomonas campestris pv. campestris (strain 8004)
192 490 Vibrio harveyi (strain ATCC BAA-1116 / BB120)
193 490 Clostridium perfringens
194 488 Bacillus amyloliquefaciens (strain FZB42)
195 487 Burkholderia pseudomallei (Pseudomonas pseudomallei)
196 487 Shewanella sp. (strain MR-7)
197 485 Aspergillus fumigatus (Sartorya fumigata)
198 484 Pseudomonas aeruginosa (strain LESB58)
199 484 Shewanella sp. (strain MR-4)
200 483 Mannheimia succiniciproducens (strain MBEL55E)
201 483 Mycoplasma genitalium
202 483 Staphylococcus aureus (strain Mu3 / ATCC 700698)
203 482 Streptomyces avermitilis
204 481 Corynebacterium glutamicum (Brevibacterium flavum)
205 480 Proteus mirabilis (strain HI4320)
206 480 Caenorhabditis briggsae
207 478 Oryza sativa subsp. indica (Rice)
208 475 Methanosarcina acetivorans
209 475 Synechococcus elongatus (strain PCC 7942) (Anacystis nidulans R2)
210 472 Burkholderia sp. (strain 383) (Burkholderia cepacia)
211 472 Pseudomonas putida (strain F1 / ATCC 700007)
212 472 Brucella abortus (strain 2308)
213 472 Thermosynechococcus elongatus (strain BP-1)
214 468 Enterococcus faecalis (Streptococcus faecalis)
215 466 Acinetobacter sp. (strain ADP1)
216 465 Pyrococcus horikoshii
217 465 Xanthomonas campestris pv. vesicatoria (strain 85-10)
218 465 Pseudomonas putida (strain GB-1)
219 464 Rhodopseudomonas palustris
220 464 Shewanella frigidimarina (strain NCIMB 400)
221 462 Anabaena variabilis (strain ATCC 29413 / PCC 7937)
222 462 Shewanella sp. (strain ANA-3)
223 461 Burkholderia mallei (Pseudomonas mallei)
224 460 Ralstonia eutropha (Cupriavidus necator)
225 458 Lactobacillus plantarum
226 457 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
227 457 Pyrococcus abyssi
228 457 Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
229 456 Methanosarcina mazei (Methanosarcina frisia)
230 455 Aeromonas hydrophila subsp. hydrophila (strain ATCC 7966 / NCIB 9240)
231 454 Staphylococcus aureus (strain JH1)
232 453 Rickettsia felis (Rickettsia azadi)
233 453 Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
234 452 Shewanella baltica (strain OS185)
235 452 Pseudomonas putida (strain W619)
236 452 Halobacterium salinarium (Halobacterium halobium)
237 448 Staphylococcus aureus (strain JH9)
238 448 Thermoanaerobacter tengcongensis
239 448 Streptococcus mutans
240 447 Methylococcus capsulatus
241 447 Aeromonas salmonicida (strain A449)
242 446 Ovis aries (Sheep)
243 446 Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
244 445 Vibrio fischeri (strain MJ11)
245 444 Pseudomonas mendocina (strain ymp)
246 443 Hahella chejuensis (strain KCTC 2396)
247 441 Streptococcus pyogenes serotype M6
248 441 Dechloromonas aromatica (strain RCB)
249 440 Pyrococcus furiosus
250 439 Nicotiana tabacum (Common tobacco)
2.3 Taxonomic distribution of the sequences
Kingdom sequences (% of the database)
Archaea 18183 ( 4%)
Bacteria 323233 ( 63%)
Eukaryota 158932 ( 31%)
Viruses 14855 ( 3%)
Within Eukaryota:
Category sequences (% of Eukaryota) (% of the complete database)
Human 20266 ( 13%) ( 4%)
Other Mammalia 44572 ( 28%) ( 9%)
Other Vertebrata 15995 ( 10%) ( 3%)
Viridiplantae 28783 ( 18%) ( 6%)
Fungi 25145 ( 16%) ( 5%)
Insecta 7719 ( 5%) ( 1%)
Nematoda 4041 ( 3%) ( 1%)
Other 12411 ( 8%) ( 2%)
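Both percentage columns in section 2.3 are whole-number roundings of simple ratios: a category's count divided by the Eukaryota total, and the same count divided by all 515203 entries. A sketch that recomputes them from the counts above (the helper `whole_percent` is hypothetical):

```python
# Sequence counts copied from section 2.3.
KINGDOMS = {"Archaea": 18183, "Bacteria": 323233, "Eukaryota": 158932, "Viruses": 14855}
EUKARYOTA = {
    "Human": 20266, "Other Mammalia": 44572, "Other Vertebrata": 15995,
    "Viridiplantae": 28783, "Fungi": 25145, "Insecta": 7719,
    "Nematoda": 4041, "Other": 12411,
}
TOTAL_ENTRIES = 515203


def whole_percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number, as printed in the tables."""
    return round(100 * part / whole)


# The kingdoms cover every entry, and the eukaryotic categories cover every eukaryote.
assert sum(KINGDOMS.values()) == TOTAL_ENTRIES
assert sum(EUKARYOTA.values()) == KINGDOMS["Eukaryota"]

for name, n in EUKARYOTA.items():
    print(f"{name:<17} {n:>6} ({whole_percent(n, KINGDOMS['Eukaryota']):>2}%) "
          f"({whole_percent(n, TOTAL_ENTRIES):>2}%)")
```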
3. SEQUENCE SIZE
Distribution of the sequences by length (excluding fragments)
From To Number From To Number
1- 50 8374 1001-1100 3461
51- 100 39851 1101-1200 2393
101- 150 55768 1201-1300 1901
151- 200 55885 1301-1400 1784
201- 250 54401 1401-1500 1419
251- 300 47843 1501-1600 630
301- 350 48255 1601-1700 496
351- 400 41252 1701-1800 409
401- 450 33736 1801-1900 390
451- 500 27114 1901-2000 322
501- 550 19180 2001-2100 193
551- 600 13689 2101-2200 261
601- 650 11453 2201-2300 274
651- 700 8151 2301-2400 168
701- 750 6789 2401-2500 129
751- 800 4797 >2500 1000
801- 850 4121
851- 900 4739
901- 950 3605
951-1000 2519
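Because fragments are excluded, the length bins should add up to the number of entries minus the 8451 fragments reported in the introduction. The sketch below, with bin counts copied from the table (names illustrative), checks this and computes the share of complete sequences of at most 500 residues:

```python
# Entry counts per length bin, copied from the two-column table above.
SIZE_BINS = [
    8374, 39851, 55768, 55885, 54401, 47843, 48255, 41252, 33736, 27114,  # 1-500
    19180, 13689, 11453, 8151, 6789, 4797, 4121, 4739, 3605, 2519,        # 501-1000
    3461, 2393, 1901, 1784, 1419, 630, 496, 409, 390, 322,                # 1001-2000
    193, 261, 274, 168, 129, 1000,                                        # 2001-2500, >2500
]
TOTAL_ENTRIES = 515203
FRAGMENTS = 8451

non_fragments = sum(SIZE_BINS)
assert non_fragments == TOTAL_ENTRIES - FRAGMENTS  # complete sequences only

# The first ten bins cover lengths 1 to 500.
up_to_500 = sum(SIZE_BINS[:10])
print(f"{up_to_500} of {non_fragments} ({up_to_500 / non_fragments:.1%}) are at most 500 aa long")
```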
The average sequence length in UniProtKB/Swiss-Prot is 351 amino acids.
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids.
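The average length can be cross-checked against the introduction, which gives 181334896 amino acids in 515203 entries. The quotient is about 351.97; the reported 351 matches truncation to an integer, though the report does not say how the value is rounded or exactly which sequences it averages over. A minimal sketch:

```python
TOTAL_RESIDUES = 181334896  # amino acids in the release (section 1)
TOTAL_ENTRIES = 515203      # sequence entries in the release (section 1)

mean_length = TOTAL_RESIDUES / TOTAL_ENTRIES
print(f"mean length: {mean_length:.2f} aa")                              # 351.97
print(f"truncated: {int(mean_length)}, rounded: {round(mean_length)}")  # 351, 352
```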
4. JOURNAL CITATIONS
Note: the following citation statistics reflect the number of distinct
journal citations.
Total number of journals cited in this release of UniProtKB/Swiss-Prot: 2048
4.1 Table of the frequency of journal citations
Journals cited 1x: 659
2x: 282
3x: 139
4x: 103
5x: 87
6x: 62
7x: 35
8x: 39
9x: 39
10x: 24
11- 20x: 161
21- 50x: 165
51-100x: 96
>100x: 157
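The bins of table 4.1 should sum to the 2048 journals reported above. A short sketch, with counts copied from the table (names illustrative), that checks this and counts the journals cited ten times or fewer:

```python
# Number of journals per citation-frequency bin, copied from table 4.1.
JOURNAL_BINS = {
    "1x": 659, "2x": 282, "3x": 139, "4x": 103, "5x": 87,
    "6x": 62, "7x": 35, "8x": 39, "9x": 39, "10x": 24,
    "11-20x": 161, "21-50x": 165, "51-100x": 96, ">100x": 157,
}
TOTAL_JOURNALS = 2048

assert sum(JOURNAL_BINS.values()) == TOTAL_JOURNALS

# Dicts keep insertion order, so the first ten values are the 1x..10x bins.
at_most_10 = sum(list(JOURNAL_BINS.values())[:10])
print(f"{at_most_10} journals cited at most 10 times, "
      f"{TOTAL_JOURNALS - at_most_10} cited more often")
```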
4.2 List of the most cited journals in UniProtKB/Swiss-Prot
Nb Citations Journal name
-- --------- -------------------------------------------------------------
1 17746 Journal of Biological Chemistry
2 8225 Proceedings of the National Academy of Sciences of the U.S.A.
3 4987 Journal of Bacteriology
4 4491 Gene
5 4481 Biochemical and Biophysical Research Communications
6 4290 Nucleic Acids Research
7 3933 FEBS Letters
8 3791 Biochemistry
9 3713 The EMBO Journal
10 3382 Molecular and Cellular Biology
11 3199 Nature
12 3082 European Journal of Biochemistry
13 2999 Journal of Molecular Biology
14 2959 Biochimica et Biophysica Acta
15 2646 Cell
16 2471 Genomics
17 2155 Biochemical Journal
18 2100 Science
19 2024 Journal of Virology
20 1747 Molecular Microbiology
21 1556 Journal of Cell Biology
22 1489 Plant Molecular Biology
23 1353 Genes and Development
24 1347 Virology
25 1304 Nature Genetics
26 1303 Molecular and General Genetics
27 1301 Human Molecular Genetics
28 1286 Plant Physiology
29 1199 The American Journal of Human Genetics
30 1167 Oncogene
31 1154 Journal of Biochemistry
32 1139 Development
33 1082 Human Mutation
34 1004 Molecular Biology of the Cell
35 1001 Journal of Immunology
36 973 Genetics
37 879 Structure
38 868 Journal of General Virology
39 864 Infection and Immunity
40 840 The Plant Cell
41 814 Archives of Biochemistry and Biophysics
42 793 Molecular Cell
43 790 Blood
44 756 Yeast
45 743 Microbiology
46 718 Developmental Biology
47 718 The Plant Journal
48 714 Journal of Cell Science
49 662 Cancer Research
50 648 FEMS Microbiology Letters
51 635 Current Biology
52 590 Human Genetics
53 586 Mechanisms of Development
54 585 Nature Structural Biology
55 538 Acta Crystallographica, Section D
56 533 Protein Science
57 527 Journal of Neuroscience
58 523 Current Genetics
59 519 Applied and Environmental Microbiology
60 504 Toxicon
61 499 Journal of Clinical Investigation
62 496 Neuron
63 469 Mammalian Genome
64 452 American Journal of Physiology
65 445 Immunogenetics
66 440 The Journal of Experimental Medicine
67 436 Molecular Endocrinology
68 419 Molecular and Biochemical Parasitology
69 407 Journal of Neurochemistry
70 396 The Journal of Clinical Endocrinology and Metabolism
71 385 Endocrinology
72 376 Journal of Molecular Evolution
73 365 DNA and Cell Biology
74 355 Proteins
75 354 DNA Sequence
76 351 Molecular Biology and Evolution
77 350 Bioscience, Biotechnology, and Biochemistry
78 346 Journal of Medical Genetics
79 314 Brain Research. Molecular Brain Research
80 292 Plant and Cell Physiology
81 290 Experimental Cell Research
82 289 Biological Chemistry Hoppe-Seyler
83 288 Peptides
84 287 Nature Cell Biology
85 285 Comparative Biochemistry and Physiology
86 280 Tissue Antigens
87 279 Antimicrobial Agents and Chemotherapy
88 277 Journal of Investigative Dermatology
89 274 Cytogenetics and Cell Genetics
90 267 Molecular Pharmacology
91 255 Biology of Reproduction
92 247 Journal of General Microbiology
93 245 Genome Research
94 241 Neurology
95 239 RNA
96 238 Developmental Dynamics
97 237 Developmental Cell
98 231 Virus Research
99 215 Hoppe-Seyler's Zeitschrift für Physiologische Chemie
100 205 DNA Research
101 204 Planta
102 203 European Journal of Immunology
103 202 Molecular Plant-Microbe Interactions
104 199 Biochimie
105 199 Annals of Neurology
106 193 European Journal of Human Genetics
107 193 Genes to Cells
108 189 Eukaryotic Cell
109 181 Immunity
110 179 Journal of Human Genetics
111 173 The New England Journal of Medicine
112 171 Molecular and Cellular Endocrinology
113 168 Nature Structural and Molecular Biology
114 167 Investigative Ophthalmology and Visual Science
115 164 Archives of Microbiology
116 163 American Journal of Medical Genetics
117 163 Molecular Phylogenetics and Evolution
118 159 DNA
119 156 EMBO Reports
120 155 Insect Biochemistry and Molecular Biology
121 153 Hemoglobin
122 152 The FASEB Journal
123 151 Bioorganicheskaia Khimiia
124 149 The FEBS Journal
125 148 Molecular Reproduction and Development
126 148 Diabetes
127 147 Molecular Immunology
128 145 Archives of Virology
129 142 Glycobiology
130 142 Clinical Genetics
131 136 General and Comparative Endocrinology
132 135 International Journal of Cancer
133 135 Animal Genetics
134 135 Molecular Genetics and Metabolism
135 132 Molecular and Cellular Neuroscience
136 130 British Journal of Haematology
137 128 Journal of Cellular Biochemistry
138 125 Biological Chemistry
139 123 American Journal of Medical Genetics. Part A
140 122 Molecular Genetics and Genomics
141 121 Journal of the American Chemical Society
142 120 Agricultural and Biological Chemistry
143 119 Nature Immunology
144 118 BMC Genomics
145 118 Journal of Lipid Research
146 114 Proteomics
147 113 Thrombosis and Haemostasis
148 113 Circulation Research
149 113 Neuroscience Letters
150 113 Journal of Protein Chemistry
5. STATISTICS FOR SOME LINE TYPES
The following tables summarize, for selected UniProtKB/Swiss-Prot line types,
the total number of lines, the number of entries with at least one such line,
and the average number of such lines per entry.
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
References (RL) 916523 1.78
Journal 724695 385502 1.41 1
Submitted to EMBL/GenBank/DDBJ 179159 165908 0.35 2
Submitted to other databases 10622 9218 0.02 3
Book citation 635 621 <0.01 4
Plant Gene Register 560 548 <0.01 5
Thesis 395 392 <0.01 6
Unpublished observations 294 290 <0.01 7
Patent 157 155 <0.01 8
Worm Breeder's Gazette 6 6 <0.01 9
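In this table and the ones that follow, the "Average per entry" values appear to be the total number of lines divided by all 515203 entries, rounded to two decimals, with "<0.01" shown when the result rounds to zero. A sketch of that reading (the function name `average_per_entry` is hypothetical), checked against the reference subtypes above:

```python
TOTAL_ENTRIES = 515203

# Total number of RL lines per reference subtype, copied from the table above.
REFERENCE_LINES = {
    "Journal": 724695, "Submitted to EMBL/GenBank/DDBJ": 179159,
    "Submitted to other databases": 10622, "Book citation": 635,
    "Plant Gene Register": 560, "Thesis": 395, "Unpublished observations": 294,
    "Patent": 157, "Worm Breeder's Gazette": 6,
}


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Lines per entry over all entries, two decimals; '<0.01' if that rounds to zero."""
    avg = round(total_lines / total_entries, 2)
    return "<0.01" if avg == 0 else f"{avg:.2f}"


total_rl = sum(REFERENCE_LINES.values())
print("References (RL)", total_rl, average_per_entry(total_rl))
for subtype, n in REFERENCE_LINES.items():
    print(f"{subtype:<32} {n:>7} {average_per_entry(n):>6}")
```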
Total number of distinct authors cited in UniProtKB/Swiss-Prot: 285894
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Comments (CC) 2174677 4.22
ALLERGEN 460 460 <0.01 26
ALTERNATIVE PRODUCTS 18657 18657 0.04 12
BIOPHYSICOCHEMICAL PROPERTIES 2918 2918 0.01 22
BIOTECHNOLOGY 256 254 <0.01 28
CATALYTIC ACTIVITY 218486 199288 0.42 5
CAUTION 6803 6664 0.01 19
COFACTOR 99099 91004 0.19 7
DEVELOPMENTAL STAGE 8728 8728 0.02 16
DISEASE 4344 2940 0.01 20
DISRUPTION PHENOTYPE 2441 2441 <0.01 23
DOMAIN 31032 27605 0.06 10
ENZYME REGULATION 7742 7742 0.02 18
FUNCTION 383140 367142 0.74 2
INDUCTION 11469 11469 0.02 15
INTERACTION 12308 12308 0.02 14
MASS SPECTROMETRY 4235 3196 0.01 21
MISCELLANEOUS 30073 27800 0.06 11
PATHWAY 126370 115423 0.25 6
PHARMACEUTICAL 83 83 <0.01 29
POLYMORPHISM 771 739 <0.01 24
PTM 35224 28544 0.07 8
RNA EDITING 603 603 <0.01 25
SEQUENCE CAUTION 12702 12702 0.02 13
SIMILARITY 598859 490601 1.16 1
SUBCELLULAR LOCATION 296364 291347 0.58 3
SUBUNIT 220331 220331 0.43 4
TISSUE SPECIFICITY 32472 32472 0.06 9
TOXIC DOSE 417 406 <0.01 27
WEB RESOURCE 8290 6581 0.02 17
Total number of comment topics: 29
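The Rank column appears to order comment topics by their total number of lines, largest first. The sketch below tests that reading against all 29 topics; the (total, reported rank) layout is just a convenient illustration:

```python
# (total number of CC lines, reported rank) per comment topic, copied from the table above.
CC_TOPICS = {
    "ALLERGEN": (460, 26), "ALTERNATIVE PRODUCTS": (18657, 12),
    "BIOPHYSICOCHEMICAL PROPERTIES": (2918, 22), "BIOTECHNOLOGY": (256, 28),
    "CATALYTIC ACTIVITY": (218486, 5), "CAUTION": (6803, 19),
    "COFACTOR": (99099, 7), "DEVELOPMENTAL STAGE": (8728, 16),
    "DISEASE": (4344, 20), "DISRUPTION PHENOTYPE": (2441, 23),
    "DOMAIN": (31032, 10), "ENZYME REGULATION": (7742, 18),
    "FUNCTION": (383140, 2), "INDUCTION": (11469, 15),
    "INTERACTION": (12308, 14), "MASS SPECTROMETRY": (4235, 21),
    "MISCELLANEOUS": (30073, 11), "PATHWAY": (126370, 6),
    "PHARMACEUTICAL": (83, 29), "POLYMORPHISM": (771, 24),
    "PTM": (35224, 8), "RNA EDITING": (603, 25),
    "SEQUENCE CAUTION": (12702, 13), "SIMILARITY": (598859, 1),
    "SUBCELLULAR LOCATION": (296364, 3), "SUBUNIT": (220331, 4),
    "TISSUE SPECIFICITY": (32472, 9), "TOXIC DOSE": (417, 27),
    "WEB RESOURCE": (8290, 17),
}


def rank_by_total(topics):
    """Assign rank 1 to the topic with the most lines, 2 to the next, and so on."""
    ordered = sorted(topics, key=lambda t: topics[t][0], reverse=True)
    return {topic: position for position, topic in enumerate(ordered, start=1)}


computed = rank_by_total(CC_TOPICS)
mismatches = {t: (computed[t], r) for t, (_, r) in CC_TOPICS.items() if computed[t] != r}
print("mismatches:", mismatches)  # an empty dict means the reading holds
```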
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Features (FT) 3204025 6.22
ACT_SITE 127567 76218 0.25 9
BINDING 200356 57370 0.39 4
CA_BIND 3651 1479 0.01 35
CARBOHYD 95872 24560 0.19 13
CHAIN 521717 510497 1.01 1
COILED 18142 12239 0.04 26
COMPBIAS 48780 25480 0.09 18
CONFLICT 115522 40534 0.22 10
CROSSLNK 4794 3114 0.01 34
DISULFID 94293 25035 0.18 14
DNA_BIND 10866 10000 0.02 29
DOMAIN 142726 85177 0.28 6
HELIX 130288 13628 0.25 8
INIT_MET 14802 14802 0.03 27
LIPID 10518 6700 0.02 30
METAL 271321 66752 0.53 3
MOD_RES 177672 58392 0.34 5
MOTIF 31998 20591 0.06 22
MUTAGEN 30053 7163 0.06 24
NON_CONS 1545 634 <0.01 36
NON_STD 348 273 <0.01 38
NON_TER 11476 8710 0.02 28
NP_BIND 104535 68224 0.20 12
PEPTIDE 8491 5423 0.02 32
PROPEP 10415 8781 0.02 31
REGION 91553 50349 0.18 15
REPEAT 87521 12938 0.17 16
SIGNAL 33689 33679 0.07 21
SITE 36775 21756 0.07 20
STRAND 130850 12739 0.25 7
TOPO_DOM 115001 23624 0.22 11
TRANSIT 6466 6380 0.01 33
TRANSMEM 336584 68799 0.65 2
TURN 31089 10765 0.06 23
UNSURE 1105 350 <0.01 37
VAR_SEQ 38660 16599 0.08 19
VARIANT 79053 16425 0.15 17
ZN_FING 27931 12202 0.05 25
Total number of feature keys: 38
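Note that "Average per entry" is taken over all entries, not only those that carry the feature: HELIX gives 0.25 per entry overall but about 9.56 per entry among the 13628 entries that have at least one HELIX line. A sketch contrasting the two averages for a few feature keys (values copied from the table; names illustrative):

```python
TOTAL_ENTRIES = 515203

# (total number of FT lines, entries with at least one such line), from the table above.
FEATURES = {
    "CHAIN": (521717, 510497),
    "TRANSMEM": (336584, 68799),
    "HELIX": (130288, 13628),
    "DISULFID": (94293, 25035),
}


def averages(total_lines, annotated_entries, total_entries=TOTAL_ENTRIES):
    """Return (average over all entries, average over annotated entries only)."""
    return (round(total_lines / total_entries, 2),
            round(total_lines / annotated_entries, 2))


for key, (lines, entries) in FEATURES.items():
    overall, annotated = averages(lines, entries)
    print(f"{key:<9} all entries: {overall:.2f}  annotated entries only: {annotated:.2f}")
```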
Total Number of Average
Line type / subtype number entries per entry Rank Category
------------------------------------ -------- --------- --------- ---- -------------------------------------------
Cross-references (DR) 12981936 25.20
2DBase-Ecoli 84 84 <0.01 116 2D gel databases
Aarhus/Ghent-2DPAGE 126 96 <0.01 113 2D gel databases
AGD 823 817 <0.01 89 Organism-specific databases
ANU-2DPAGE 23 23 <0.01 123 2D gel databases
ArachnoServer 462 458 <0.01 97 Organism-specific databases
ArrayExpress 58045 58045 0.11 38 Gene expression databases
Bgee 37641 37638 0.07 44 Gene expression databases
BindingDB 297 297 <0.01 106 Other
BioCyc 160537 147681 0.31 21 Enzyme and pathway databases
BRENDA 65155 62359 0.13 35 Enzyme and pathway databases
BuruList 330 330 <0.01 105 Organism-specific databases
CAZy 5647 5026 0.01 65 Protein family/group databases
CGD 554 550 <0.01 94 Organism-specific databases
CleanEx 30211 29564 0.06 46 Gene expression databases
COMPLUYEAST-2DPAGE 59 59 <0.01 118 2D gel databases
Cornea-2DPAGE 67 67 <0.01 117 2D gel databases
CTD 63479 62910 0.12 37 Organism-specific databases
CYGD 6628 6524 0.01 64 Organism-specific databases
dictyBase 4260 4137 0.01 73 Organism-specific databases
DIP 11496 11391 0.02 56 Protein-protein interaction databases
DisProt 397 394 <0.01 100 3D structure databases
DOSAC-COBS-2DPAGE 150 150 <0.01 112 2D gel databases
DrugBank 5317 1626 0.01 67 Other
EchoBASE 4159 4124 0.01 75 Organism-specific databases
ECO2DBASE 351 299 <0.01 104 2D gel databases
EcoGene 4354 4351 0.01 71 Organism-specific databases
eggNOG 216413 216413 0.42 18 Phylogenomic databases
EMBL 848161 505513 1.65 3 Sequence databases
Ensembl 90110 69668 0.17 28 Genome annotation databases
euHCVdb 55 44 <0.01 119 Organism-specific databases
EuPathDB 231 231 <0.01 110 Organism-specific databases
FlyBase 5463 5087 0.01 66 Organism-specific databases
Gene3D 235321 193263 0.46 17 Family and domain databases
GeneCards 21070 19810 0.04 50 Organism-specific databases
GeneDB_Spombe 4976 4931 0.01 69 Organism-specific databases
GeneFarm 2690 2675 0.01 81 Organism-specific databases
GeneID 467139 448177 0.91 6 Genome annotation databases
Genevestigator 64358 64358 0.12 36 Gene expression databases
GenomeReviews 376632 356619 0.73 9 Genome annotation databases
GermOnline 41923 41316 0.08 43 Gene expression databases
GlycoSuiteDB 280 280 <0.01 107 PTM databases
GO 2152314 481476 4.18 1 Ontologies
Gramene 4290 4290 0.01 72 Organism-specific databases
H-InvDB 10859 9799 0.02 57 Organism-specific databases
HAMAP 307274 307130 0.60 15 Family and domain databases
HGNC 19528 19356 0.04 51 Organism-specific databases
HOGENOM 359249 359249 0.70 10 Phylogenomic databases
HOVERGEN 74264 74264 0.14 31 Phylogenomic databases
HPA 8704 6562 0.02 60 Organism-specific databases
HSC-2DPAGE 85 85 <0.01 115 2D gel databases
HSSP 28888 28888 0.06 47 3D structure databases
InParanoid 65753 65753 0.13 32 Phylogenomic databases
IntAct 21754 21754 0.04 49 Protein-protein interaction databases
InterPro 1583334 486831 3.07 2 Family and domain databases
IPI 88263 63320 0.17 29 Sequence databases
KEGG 438773 417044 0.85 8 Genome annotation databases
LegioList 760 758 <0.01 90 Organism-specific databases
Leproma 664 661 <0.01 93 Organism-specific databases
ListiList 1185 1177 <0.01 86 Organism-specific databases
MaizeGDB 472 467 <0.01 96 Organism-specific databases
MEROPS 9946 9624 0.02 58 Protein family/group databases
MGI 16104 16053 0.03 53 Organism-specific databases
MIM 15806 12440 0.03 55 Organism-specific databases
MypuList 203 203 <0.01 111 Organism-specific databases
NextBio 48682 48682 0.09 41 Other
NMPDR 130076 130072 0.25 24 Genome annotation databases
OGP 377 377 <0.01 103 2D gel databases
OMA 353254 353254 0.69 11 Phylogenomic databases
Orphanet 3674 2131 0.01 78 Organism-specific databases
OrthoDB 55415 55415 0.11 39 Phylogenomic databases
PANTHER 184791 169613 0.36 20 Family and domain databases
Pathway_Interaction_DB 4567 1665 0.01 70 Enzyme and pathway databases
PDB 65686 15487 0.13 34 3D structure databases
PDBsum 65686 15488 0.13 33 3D structure databases
PeptideAtlas 5167 5167 0.01 68 Proteomic databases
PeroxiBase 677 665 <0.01 92 Protein family/group databases
Pfam 661344 466824 1.28 4 Family and domain databases
PharmGKB 15809 15798 0.03 54 Organism-specific databases
PHCI-2DPAGE 247 247 <0.01 109 2D gel databases
PhosphoSite 19301 19301 0.04 52 PTM databases
PhosSite 267 267 <0.01 108 PTM databases
PhotoList 738 738 <0.01 91 Organism-specific databases
PhylomeDB 121182 121182 0.24 25 Phylogenomic databases
PIR 115021 105057 0.22 26 Sequence databases
PIRSF 82100 82100 0.16 30 Family and domain databases
PMAP-CutDB 1394 1394 <0.01 84 Other
PMMA-2DPAGE 52 52 <0.01 120 2D gel databases
PptaseDB 34 34 <0.01 121 Protein family/group databases
PRIDE 53651 53651 0.10 40 Proteomic databases
PRINTS 136299 117891 0.26 23 Family and domain databases
ProDom 27781 27452 0.05 48 Family and domain databases
ProMEX 439 439 <0.01 98 Proteomic databases
PROSITE 456482 290969 0.89 7 Family and domain databases
ProtClustDB 323922 323922 0.63 13 Phylogenomic databases
PseudoCAP 1212 1203 <0.01 85 Organism-specific databases
Rat-heart-2DPAGE 28 28 <0.01 122 2D gel databases
Reactome 7331 4257 0.01 62 Enzyme and pathway databases
REBASE 379 358 <0.01 102 Protein family/group databases
RefSeq 487669 448457 0.95 5 Sequence databases
REPRODUCTION-2DPAGE 1030 942 <0.01 88 2D gel databases
RGD 7364 7360 0.01 61 Organism-specific databases
SagaList 389 388 <0.01 101 Organism-specific databases
SGD 6641 6540 0.01 63 Organism-specific databases
Siena-2DPAGE 103 103 <0.01 114 2D gel databases
SMART 141828 109431 0.28 22 Family and domain databases
SMR 345706 345706 0.67 12 3D structure databases
STRING 203542 203530 0.40 19 Protein-protein interaction databases
SubtiList 4200 4191 0.01 74 Organism-specific databases
SUPFAM 311933 247029 0.61 14 Family and domain databases
SWISS-2DPAGE 1182 1182 <0.01 87 2D gel databases
TAIR 8959 8847 0.02 59 Organism-specific databases
TCDB 3293 3252 0.01 80 Protein family/group databases
TIGR 33909 33142 0.07 45 Genome annotation databases
TIGRFAMs 280320 261597 0.54 16 Family and domain databases
TubercuList 1587 1551 <0.01 83 Organism-specific databases
UCSC 48477 39503 0.09 42 Genome annotation databases
UniGene 91817 80915 0.18 27 Sequence databases
VectorBase 418 404 <0.01 99 Genome annotation databases
World-2DPAGE 507 507 <0.01 95 2D gel databases
WormBase 3818 3733 0.01 77 Organism-specific databases
WormPep 4055 3275 0.01 76 Organism-specific databases
Xenbase 3663 3590 0.01 79 Organism-specific databases
ZFIN 2515 2504 <0.01 82 Organism-specific databases
Total number of cross-referenced databases: 123
6. AMINO ACID COMPOSITION
6.1 Composition in percent for the complete database
Ala (A) 8.28 Gln (Q) 3.94 Leu (L) 9.67 Ser (S) 6.50
Arg (R) 5.54 Glu (E) 6.77 Lys (K) 5.85 Thr (T) 5.32
Asn (N) 4.05 Gly (G) 7.09 Met (M) 2.43 Trp (W) 1.07
Asp (D) 5.45 His (H) 2.27 Phe (F) 3.86 Tyr (Y) 2.91
Cys (C) 1.36 Ile (I) 5.99 Pro (P) 4.68 Val (V) 6.88
Asx (B) 0.00 Glx (Z) 0.00 Xaa (X) 0.00
6.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
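The ordering in section 6.2 is the twenty standard residues sorted by their percentage in section 6.1. A sketch that reproduces it; the percentages sum to 99.91 rather than 100, presumably because each value is rounded to two decimals:

```python
# Percent composition of the complete database, copied from section 6.1.
COMPOSITION = {
    "Ala": 8.28, "Arg": 5.54, "Asn": 4.05, "Asp": 5.45, "Cys": 1.36,
    "Gln": 3.94, "Glu": 6.77, "Gly": 7.09, "His": 2.27, "Ile": 5.99,
    "Leu": 9.67, "Lys": 5.85, "Met": 2.43, "Phe": 3.86, "Pro": 4.68,
    "Ser": 6.50, "Thr": 5.32, "Trp": 1.07, "Tyr": 2.91, "Val": 6.88,
}

# Most frequent residue first; no two values are tied, so the order is unambiguous.
by_frequency = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
print(", ".join(by_frequency))
print(f"sum of percentages: {sum(COMPOSITION.values()):.2f}")
```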
7. MISCELLANEOUS STATISTICS
4446 entries are encoded on a mitochondrion, and 3555 are encoded on a plasmid.
12174 entries are encoded on a plastid,
of which 21 are encoded on apicoplasts,
11616 on chloroplasts,
44 on organellar chromatophores,
145 on cyanelles,
149 on non-photosynthetic plastids and
199 on unspecified types of plastid.
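The plastid subtypes listed above account for all 12174 plastid-encoded entries. A sketch that checks this and computes the chloroplast share (names illustrative):

```python
# Plastid-encoded entries by plastid type, copied from the list above.
PLASTID_SUBTYPES = {
    "apicoplast": 21, "chloroplast": 11616, "organellar chromatophore": 44,
    "cyanelle": 145, "non-photosynthetic plastid": 149, "unspecified plastid": 199,
}
TOTAL_PLASTID = 12174

assert sum(PLASTID_SUBTYPES.values()) == TOTAL_PLASTID

chloroplast_share = PLASTID_SUBTYPES["chloroplast"] / TOTAL_PLASTID
print(f"chloroplast-encoded: {chloroplast_share:.1%} of plastid-encoded entries")
```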
Number of entries with at least one sequence correction: 68420