UniProtKB/Swiss-Prot protein knowledgebase release 57.14 statistics
1. INTRODUCTION
Release 57.14 of 09-Feb-10 of UniProtKB/Swiss-Prot contains 514789 sequence entries,
comprising 181163771 amino acids abstracted from 186824 references.
668 sequences have been added since release 57.13, the sequence data of
56 existing entries has been updated and the annotations of
316460 entries have been revised.
Number of fragments: 8440
Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 28833
Protein existence (PE):            entries       %
1: Evidence at protein level         68292   13.3%
2: Evidence at transcript level      66408   12.9%
3: Inferred from homology           364228   70.8%
4: Predicted                         14329    2.8%
5: Uncertain                          1532    0.3%
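The percentages in the protein existence table follow from dividing each count by the 514789 entries of the release. A minimal Python sketch of that arithmetic, using only the figures quoted above (the variable names are illustrative and not part of any UniProt tool):

```python
# Counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 68292,
    "2: Evidence at transcript level": 66408,
    "3: Inferred from homology": 364228,
    "4: Predicted": 14329,
    "5: Uncertain": 1532,
}

total_entries = 514789  # entries in release 57.14

# The five PE levels partition the release, so the counts add up to the total.
pe_sum = sum(pe_counts.values())

# Share of each level, rounded to one decimal as in the table.
pe_percent = {
    level: round(100 * count / total_entries, 1)
    for level, count in pe_counts.items()
}

for level, pct in pe_percent.items():
    print(f"{level:<32} {pe_counts[level]:>7} {pct:>5.1f}%")
```

Since every entry carries exactly one PE level in this table, the five counts sum to the release total.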
The growth of the database is summarized below.
2. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/Swiss-Prot: 12037
The first twenty species represent 107231 sequences, or 20.8% of the total
number of entries.
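Both summary figures of this section can be cross-checked against the tables in sections 2.1 and 2.2. The sketch below is one way to do so in Python; the list names are illustrative, and the values are copied from those two tables.

```python
TOTAL_ENTRIES = 514789   # entries in release 57.14
TOTAL_SPECIES = 12037    # species represented in the release

# Species per occurrence bin (section 2.1), in table order from "1x" to ">100x".
species_bins = [5235, 1698, 897, 571, 421, 346, 246, 205, 183, 105,
                574, 368, 176, 1012]

# Entry counts of the twenty most represented species (section 2.2), ranks 1 to 20.
top20 = [
    20272, 16216, 8847, 7476, 6552, 5743, 4974, 4367, 4249, 4129,
    3281, 3205, 3052, 2598, 2365, 2206, 2151, 1993, 1782, 1773,
]

# Each species falls into exactly one occurrence bin.
species_from_bins = sum(species_bins)

# Entries contributed by the top twenty species and their share of the release.
top20_total = sum(top20)
top20_share = 100 * top20_total / TOTAL_ENTRIES

print(f"species counted in bins: {species_from_bins}")
print(f"top-20 entries: {top20_total} ({top20_share:.1f}% of {TOTAL_ENTRIES})")
```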
2.1 Table of the frequency of occurrence of species
Species represented 1x: 5235
2x: 1698
3x: 897
4x: 571
5x: 421
6x: 346
7x: 246
8x: 205
9x: 183
10x: 105
11- 20x: 574
21- 50x: 368
51-100x: 176
>100x: 1012
2.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 20272 Homo sapiens (Human)
2 16216 Mus musculus (Mouse)
3 8847 Arabidopsis thaliana (Mouse-ear cress)
4 7476 Rattus norvegicus (Rat)
5 6552 Saccharomyces cerevisiae (Baker's yeast)
6 5743 Bos taurus (Bovine)
7 4974 Schizosaccharomyces pombe (Fission yeast)
8 4367 Escherichia coli (strain K12)
9 4249 Bacillus subtilis
10 4129 Dictyostelium discoideum (Slime mold)
11 3281 Caenorhabditis elegans
12 3205 Xenopus laevis (African clawed frog)
13 3052 Drosophila melanogaster (Fruit fly)
14 2598 Danio rerio (Zebrafish) (Brachydanio rerio)
15 2365 Oryza sativa subsp. japonica (Rice)
16 2206 Pongo abelii (Sumatran orangutan)
17 2151 Gallus gallus (Chicken)
18 1993 Escherichia coli O157:H7
19 1782 Methanocaldococcus jannaschii (Methanococcus jannaschii)
20 1773 Haemophilus influenzae
21 1757 Salmonella typhimurium
22 1668 Escherichia coli O6
23 1665 Shigella flexneri
24 1558 Mycobacterium tuberculosis
25 1512 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
26 1361 Sus scrofa (Pig)
27 1341 Salmonella typhi
28 1273 Pseudomonas aeruginosa
29 1213 Mycobacterium bovis
30 1159 Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
31 1015 Synechocystis sp. (strain PCC 6803)
32 995 Yersinia pestis
33 991 Archaeoglobus fulgidus
34 940 Vibrio cholerae
35 929 Salmonella paratyphi A
36 922 Staphylococcus aureus (strain N315)
37 922 Staphylococcus aureus (strain Mu50 / ATCC 700699)
38 912 Rhizobium meliloti (Sinorhizobium meliloti)
39 909 Acanthamoeba polyphaga mimivirus (APMV)
40 896 Staphylococcus aureus (strain COL)
41 894 Staphylococcus aureus (strain MW2)
42 888 Staphylococcus aureus (strain MSSA476)
43 885 Staphylococcus aureus (strain MRSA252)
44 881 Oryctolagus cuniculus (Rabbit)
45 879 Escherichia coli O6:K15:H31 (strain 536 / UPEC)
46 879 Salmonella choleraesuis
47 869 Shigella sonnei (strain Ss046)
48 863 Yersinia pseudotuberculosis
49 835 Escherichia coli O9:H4 (strain HS)
50 829 Escherichia coli O139:H28 (strain E24377A / ETEC)
51 824 Shigella boydii serotype 4 (strain Sb227)
52 818 Escherichia coli (strain UTI89 / UPEC)
53 817 Ashbya gossypii (Yeast) (Eremothecium gossypii)
54 814 Escherichia coli (strain ATCC 8739 / DSM 1576 / Crooks)
55 800 Shigella dysenteriae serotype 1 (strain Sd197)
56 795 Candida albicans (Yeast)
57 794 Vibrio parahaemolyticus
58 789 Kluyveromyces lactis (Yeast) (Candida sphaerica)
59 785 Escherichia coli (strain SMS-3-5 / SECEC)
60 778 Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
61 776 Pasteurella multocida
62 773 Aquifex aeolicus
63 771 Neurospora crassa
64 765 Escherichia coli (strain K12 / DH10B)
65 764 Canis familiaris (Dog)
66 759 Escherichia coli O127:H6 (strain E2348/69 / EPEC)
67 759 Escherichia coli (strain K12 / BW2952)
68 757 Escherichia coli (strain 55989 / EAEC)
69 757 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
70 756 Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)
71 756 Escherichia coli O8 (strain IAI1)
72 756 Staphylococcus epidermidis (strain ATCC 12228)
73 750 Escherichia coli (strain SE11)
74 750 Shigella flexneri serotype 5b (strain 8401)
75 750 Escherichia coli O45:K1 (strain S88 / ExPEC)
76 748 Escherichia coli O7:K1 (strain IAI39 / ExPEC)
77 747 Candida glabrata (Yeast) (Torulopsis glabrata)
78 742 Escherichia coli O157:H7 (strain EC4115 / EHEC)
79 738 Streptomyces coelicolor
80 738 Photorhabdus luminescens subsp. laumondii
81 731 Vibrio vulnificus
82 730 Bacillus halodurans
83 726 Escherichia coli O81 (strain ED1a)
84 722 Bacillus anthracis
85 722 Yersinia enterocolitica serotype O:8 / biotype 1B (strain 8081)
86 719 Salmonella enteritidis PT4 (strain P125109)
87 715 Vibrio vulnificus (strain YJ016)
88 715 Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
89 713 Yersinia pestis bv. Antiqua (strain Nepal516)
90 713 Salmonella paratyphi A (strain AKU_12601)
91 712 Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
92 711 Staphylococcus aureus (strain NCTC 8325)
93 710 Salmonella newport (strain SL254)
94 709 Salmonella heidelberg (strain SL476)
95 709 Yersinia pestis bv. Antiqua (strain Antiqua)
96 709 Salmonella agona (strain SL483)
97 708 Salmonella schwarzengrund (strain CVM19633)
98 705 Escherichia coli O1:K1 / APEC
99 699 Salmonella dublin (strain CT_02021853)
100 697 Enterobacter sp. (strain 638)
101 696 Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
102 696 Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)
103 687 Mycoplasma pneumoniae
104 685 Escherichia fergusonii (strain ATCC 35469 / DSM 13698 / CDC 0568-73)
105 684 Pseudomonas syringae pv. tomato
106 683 Pan troglodytes (Chimpanzee)
107 682 Salmonella gallinarum (strain 287/91 / NCTC 13346)
108 682 Klebsiella pneumoniae (strain 342)
109 676 Anabaena sp. (strain PCC 7120)
110 670 Pseudomonas putida (strain KT2440)
111 666 Yersinia pestis (strain Pestoides F)
112 665 Staphylococcus aureus (strain USA300)
113 664 Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
114 661 Mycobacterium leprae
115 658 Rhizobium sp. (strain NGR234)
116 653 Serratia proteamaculans (strain 568)
117 651 Zea mays (Maize)
118 645 Escherichia coli
119 645 Bradyrhizobium japonicum
120 641 Staphylococcus aureus (strain bovine RF122 / ET3-1)
121 638 Bacillus cereus (strain ATCC 14579 / DSM 31)
122 637 Yersinia pseudotuberculosis serotype O:3 (strain YPIII)
123 634 Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
124 633 Yersinia pseudotuberculosis serotype IB (strain PB1/+)
125 620 Shewanella oneidensis
126 617 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
127 615 Treponema pallidum
128 613 Ralstonia solanacearum (Pseudomonas solanacearum)
129 608 Staphylococcus haemolyticus (strain JCSC1435)
130 608 Enterobacter sakazakii (strain ATCC BAA-894)
131 602 Rhizobium loti (Mesorhizobium loti)
132 602 Staphylococcus saprophyticus subsp. saprophyticus
133 600 Methanobacterium thermoautotrophicum
134 598 Yersinia pestis bv. Antiqua (strain Angola)
135 598 Salmonella paratyphi C (strain RKS4594)
136 598 Emericella nidulans (Aspergillus nidulans)
137 596 Listeria monocytogenes
138 595 Photobacterium profundum (Photobacterium sp. (strain SS9))
139 593 Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
140 592 Yarrowia lipolytica (Candida lipolytica)
141 590 Bacillus cereus (strain ATCC 10987)
142 589 Xanthomonas campestris pv. campestris
143 588 Listeria innocua
144 585 Rickettsia prowazekii
145 584 Helicobacter pylori (Campylobacter pylori)
146 582 Pectobacterium carotovorum subsp. carotovorum (strain PC1)
147 581 Lactococcus lactis subsp. lactis (Streptococcus lactis)
148 579 Neisseria meningitidis serogroup B
149 576 Brucella suis
150 572 Brucella melitensis
151 572 Buchnera aphidicola subsp. Acyrthosiphon pisum
152 567 Bacillus thuringiensis subsp. konkukian
153 565 Helicobacter pylori J99 (Campylobacter pylori J99)
154 562 Buchnera aphidicola subsp. Schizaphis graminum
155 560 Bacillus cereus (strain ZK / E33L)
156 560 Pseudomonas syringae pv. syringae (strain B728a)
157 557 Pseudomonas aeruginosa (strain UCBPP-PA14)
158 556 Neisseria meningitidis serogroup A
159 555 Bacillus licheniformis (strain DSM 13 / ATCC 14580)
160 555 Xanthomonas axonopodis pv. citri (Citrus canker)
161 553 Vibrio fischeri (strain ATCC 700601 / ES114)
162 551 Pseudomonas fluorescens (strain Pf0-1)
163 549 Oceanobacillus iheyensis
164 545 Caulobacter crescentus (Caulobacter vibrioides)
165 545 Clostridium acetobutylicum
166 545 Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
167 538 Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
168 529 Listeria monocytogenes serotype 4b (strain F2365)
169 523 Erwinia tasmaniensis (strain DSM 17950 / Et1/99)
170 522 Sodalis glossinidius (strain morsitans)
171 521 Bordetella bronchiseptica (Alcaligenes bronchisepticus)
172 521 Xylella fastidiosa
173 519 Streptococcus pneumoniae
174 512 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
175 510 Chromobacterium violaceum
176 509 Thermotoga maritima
177 509 Vibrio cholerae serotype O1 (strain ATCC 39541 / Ogawa 395 / O395)
178 507 Bordetella parapertussis
179 507 Buchnera aphidicola subsp. Baizongia pistaciae
180 507 Pseudomonas aeruginosa (strain PA7)
181 505 Bordetella pertussis
182 504 Haemophilus ducreyi
183 504 Geobacillus kaustophilus
184 503 Staphylococcus aureus (strain Newman)
185 500 Pseudomonas entomophila (strain L48)
186 498 Brucella abortus
187 497 Rickettsia conorii
188 496 Bacillus clausii (strain KSM-K16)
189 492 Haemophilus influenzae (strain 86-028NP)
190 492 Deinococcus radiodurans
191 490 Xanthomonas campestris pv. campestris (strain 8004)
192 490 Vibrio harveyi (strain ATCC BAA-1116 / BB120)
193 490 Clostridium perfringens
194 488 Bacillus amyloliquefaciens (strain FZB42)
195 487 Burkholderia pseudomallei (Pseudomonas pseudomallei)
196 487 Shewanella sp. (strain MR-7)
197 485 Aspergillus fumigatus (Sartorya fumigata)
198 484 Pseudomonas aeruginosa (strain LESB58)
199 484 Shewanella sp. (strain MR-4)
200 483 Mannheimia succiniciproducens (strain MBEL55E)
201 483 Mycoplasma genitalium
202 483 Staphylococcus aureus (strain Mu3 / ATCC 700698)
203 482 Streptomyces avermitilis
204 481 Corynebacterium glutamicum (Brevibacterium flavum)
205 480 Proteus mirabilis (strain HI4320)
206 477 Caenorhabditis briggsae
207 476 Oryza sativa subsp. indica (Rice)
208 475 Synechococcus elongatus (strain PCC 7942) (Anacystis nidulans R2)
209 474 Methanosarcina acetivorans
210 472 Burkholderia sp. (strain 383) (Burkholderia cepacia
211 472 Pseudomonas putida (strain F1 / ATCC 700007)
212 472 Brucella abortus (strain 2308)
213 472 Thermosynechococcus elongatus (strain BP-1)
214 468 Enterococcus faecalis (Streptococcus faecalis)
215 466 Acinetobacter sp. (strain ADP1)
216 465 Xanthomonas campestris pv. vesicatoria (strain 85-10)
217 465 Pseudomonas putida (strain GB-1)
218 464 Rhodopseudomonas palustris
219 464 Shewanella frigidimarina (strain NCIMB 400)
220 462 Pyrococcus horikoshii
221 462 Anabaena variabilis (strain ATCC 29413 / PCC 7937)
222 462 Shewanella sp. (strain ANA-3)
223 461 Burkholderia mallei (Pseudomonas mallei)
224 460 Ralstonia eutropha (Cupriavidus necator
225 458 Lactobacillus plantarum
226 457 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
227 457 Pyrococcus abyssi
228 457 Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
229 455 Aeromonas hydrophila subsp. hydrophila (strain ATCC 7966 / NCIB 9240)
230 455 Methanosarcina mazei (Methanosarcina frisia)
231 454 Staphylococcus aureus (strain JH1)
232 453 Rickettsia felis (Rickettsia azadi)
233 453 Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
234 452 Shewanella baltica (strain OS185)
235 452 Pseudomonas putida (strain W619)
236 452 Halobacterium salinarium (Halobacterium halobium)
237 448 Staphylococcus aureus (strain JH9)
238 448 Thermoanaerobacter tengcongensis
239 448 Streptococcus mutans
240 447 Methylococcus capsulatus
241 447 Aeromonas salmonicida (strain A449)
242 446 Ovis aries (Sheep)
243 446 Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
244 445 Vibrio fischeri (strain MJ11)
245 444 Pseudomonas mendocina (strain ymp)
246 443 Hahella chejuensis (strain KCTC 2396)
247 441 Streptococcus pyogenes serotype M6
248 441 Chlamydia trachomatis
249 441 Dechloromonas aromatica (strain RCB)
250 439 Nicotiana tabacum (Common tobacco)
2.3 Taxonomic distribution of the sequences
Kingdom            sequences  (% of the database)
Archaea                18175  (  4%)
Bacteria              323186  ( 63%)
Eukaryota             158574  ( 31%)
Viruses                14854  (  3%)
Within Eukaryota:
Category           sequences  (% of Eukaryota)  (% of the complete database)
Human                  20273  ( 13%)            (  4%)
Other Mammalia         44541  ( 28%)            (  9%)
Other Vertebrata       15951  ( 10%)            (  3%)
Viridiplantae          28745  ( 18%)            (  6%)
Fungi                  25082  ( 16%)            (  5%)
Insecta                 7628  (  5%)            (  1%)
Nematoda                4032  (  3%)            (  1%)
Other                  12322  (  8%)            (  2%)
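The two percentage columns can be reproduced from the sequence counts alone. The following Python sketch assumes the percentages are rounded to the nearest whole number, which is consistent with every row above; the helper `pct` is a hypothetical name introduced for illustration.

```python
# Sequence counts per kingdom and per eukaryotic category, from the tables above.
kingdoms = {"Archaea": 18175, "Bacteria": 323186, "Eukaryota": 158574, "Viruses": 14854}
eukaryota = {
    "Human": 20273, "Other Mammalia": 44541, "Other Vertebrata": 15951,
    "Viridiplantae": 28745, "Fungi": 25082, "Insecta": 7628,
    "Nematoda": 4032, "Other": 12322,
}

total_entries = sum(kingdoms.values())


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)


# Share of the database per kingdom.
kingdom_pct = {name: pct(n, total_entries) for name, n in kingdoms.items()}

# For each eukaryotic category: (% of Eukaryota, % of the complete database).
euk_pct = {
    name: (pct(n, kingdoms["Eukaryota"]), pct(n, total_entries))
    for name, n in eukaryota.items()
}

for name, (of_euk, of_db) in euk_pct.items():
    print(f"{name:<18} {eukaryota[name]:>6} ({of_euk:>3}%) ({of_db:>3}%)")
```

Because each share is rounded independently, the four kingdom percentages add up to 101 rather than 100.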
3. SEQUENCE SIZE
Repartition of the sequences by size (excluding fragments)
From To Number From To Number
1- 50 8377 1001-1100 3460
51- 100 39784 1101-1200 2390
101- 150 55709 1201-1300 1902
151- 200 55834 1301-1400 1774
201- 250 54376 1401-1500 1403
251- 300 47833 1501-1600 629
301- 350 48222 1601-1700 496
351- 400 41232 1701-1800 408
401- 450 33711 1801-1900 388
451- 500 27104 1901-2000 321
501- 550 19170 2001-2100 193
551- 600 13682 2101-2200 261
601- 650 11447 2201-2300 270
651- 700 8145 2301-2400 168
701- 750 6786 2401-2500 129
751- 800 4770 >2500 1000
801- 850 4117
851- 900 4736
901- 950 3603
951-1000 2519
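Because the size table excludes fragments, its bins should add up to the number of entries minus the number of fragments reported in the introduction. A short Python sketch of that consistency check, with the bin counts copied from the table above (the variable names are illustrative):

```python
# Number of non-fragment sequences per size bin, in order of increasing length.
size_bins = [
    8377, 39784, 55709, 55834, 54376, 47833, 48222, 41232, 33711, 27104,  # 1-500
    19170, 13682, 11447, 8145, 6786, 4770, 4117, 4736, 3603, 2519,        # 501-1000
    3460, 2390, 1902, 1774, 1403, 629, 496, 408, 388, 321,                # 1001-2000
    193, 261, 270, 168, 129,                                              # 2001-2500
    1000,                                                                 # >2500
]

total_entries = 514789  # all entries in the release
fragments = 8440        # entries flagged as fragments

# Sequences covered by the size table.
non_fragment = sum(size_bins)

# Share of non-fragment sequences that are at most 500 residues long
# (the first ten bins, each 50 residues wide).
short_share = 100 * sum(size_bins[:10]) / non_fragment

print(non_fragment, total_entries - fragments)
print(f"{short_share:.1f}% of non-fragment sequences have 500 residues or fewer")
```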
The average sequence length in UniProtKB/Swiss-Prot is 351 amino acids.
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids.
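The average length can be related to the totals given in the introduction. The sketch below assumes that the average is the total number of amino acids divided by the total number of entries, fragments included; the source does not state how the figure was derived, so this is only a plausibility check.

```python
total_residues = 181163771  # amino acids in the release
total_entries = 514789      # sequence entries in the release

# Exact mean and its integer part.
mean_length = total_residues / total_entries
mean_length_truncated = total_residues // total_entries

print(f"mean length: {mean_length:.2f}")
print(f"truncated:   {mean_length_truncated}")
```

Under that assumption the quoted value of 351 corresponds to the quotient with its fractional part dropped rather than rounded.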
4. JOURNAL CITATIONS
Note: the following citation statistics reflect the number of distinct
journal citations.
Total number of journals cited in this release of UniProtKB/Swiss-Prot: 2046
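The total of 2046 journals can be recovered from the frequency table in section 4.1 by parsing its bin lines. The following is a minimal Python sketch of such a parser; the regular expression and the function `parse_bin` are assumptions made for illustration, written against the line shapes that occur in that table (`5x: 85`, `11- 20x: 163`, `>100x: 157`).

```python
import re

# Bin lines as they appear in table 4.1, with the leading label text removed.
raw_lines = """\
1x: 659
2x: 283
3x: 138
4x: 105
5x: 85
6x: 62
7x: 35
8x: 38
9x: 40
10x: 23
11- 20x: 163
21- 50x: 162
51-100x: 96
>100x: 157
""".splitlines()

# Optional ">" marker, lower bound, optional "- upper bound", then "x: count".
BIN_RE = re.compile(r"^\s*(?:(>)\s*)?(\d+)(?:\s*-\s*(\d+))?x:\s*(\d+)\s*$")


def parse_bin(line):
    """Return (low, high, count); high is None for the open-ended '>N' bin."""
    match = BIN_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised bin line: {line!r}")
    open_ended, low, high, count = match.groups()
    low = int(low)
    if open_ended:
        # ">100x" means 101 citations or more.
        return (low + 1, None, int(count))
    # A single value such as "5x" is a bin whose bounds coincide.
    return (low, int(high) if high else low, int(count))


journal_bins = [parse_bin(line) for line in raw_lines]
total_journals = sum(count for _, _, count in journal_bins)

print(total_journals)
```

The same parser applies unchanged to the species occurrence table in section 2.1, which uses the same bin labels.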
4.1 Table of the frequency of journal citations
Journals cited 1x: 659
2x: 283
3x: 138
4x: 105
5x: 85
6x: 62
7x: 35
8x: 38
9x: 40
10x: 23
11- 20x: 163
21- 50x: 162
51-100x: 96
>100x: 157
4.2 List of the most cited journals in UniProtKB/Swiss-Prot
Nb Citations Journal name
-- --------- -------------------------------------------------------------
1 17688 Journal of Biological Chemistry
2 8195 Proceedings of the National Academy of Sciences of the U.S.A.
3 4981 Journal of Bacteriology
4 4490 Gene
5 4468 Biochemical and Biophysical Research Communications
6 4289 Nucleic Acids Research
7 3927 FEBS Letters
8 3773 Biochemistry
9 3706 The EMBO Journal
10 3374 Molecular and Cellular Biology
11 3190 Nature
12 3082 European Journal of Biochemistry
13 2992 Journal of Molecular Biology
14 2959 Biochimica et Biophysica Acta
15 2637 Cell
16 2471 Genomics
17 2151 Biochemical Journal
18 2090 Science
19 2019 Journal of Virology
20 1741 Molecular Microbiology
21 1551 Journal of Cell Biology
22 1488 Plant Molecular Biology
23 1345 Virology
24 1344 Genes and Development
25 1303 Nature Genetics
26 1302 Molecular and General Genetics
27 1293 Human Molecular Genetics
28 1279 Plant Physiology
29 1199 The American Journal of Human Genetics
30 1166 Oncogene
31 1153 Journal of Biochemistry
32 1131 Development
33 1076 Human Mutation
34 1003 Molecular Biology of the Cell
35 999 Journal of Immunology
36 972 Genetics
37 877 Structure
38 868 Journal of General Virology
39 859 Infection and Immunity
40 838 The Plant Cell
41 812 Archives of Biochemistry and Biophysics
42 788 Blood
43 788 Molecular Cell
44 755 Yeast
45 740 Microbiology
46 715 The Plant Journal
47 714 Developmental Biology
48 710 Journal of Cell Science
49 662 Cancer Research
50 647 FEMS Microbiology Letters
51 632 Current Biology
52 590 Human Genetics
53 584 Nature Structural Biology
54 580 Mechanisms of Development
55 537 Acta Crystallographica, Section D
56 529 Protein Science
57 524 Journal of Neuroscience
58 523 Current Genetics
59 519 Applied and Environmental Microbiology
60 503 Toxicon
61 499 Journal of Clinical Investigation
62 495 Neuron
63 469 Mammalian Genome
64 449 American Journal of Physiology
65 441 Immunogenetics
66 440 The Journal of Experimental Medicine
67 435 Molecular Endocrinology
68 419 Molecular and Biochemical Parasitology
69 406 Journal of Neurochemistry
70 396 The Journal of Clinical Endocrinology and Metabolism
71 384 Endocrinology
72 376 Journal of Molecular Evolution
73 364 DNA and Cell Biology
74 354 DNA Sequence
75 351 Molecular Biology and Evolution
76 350 Bioscience, Biotechnology, and Biochemistry
77 349 Proteins
78 346 Journal of Medical Genetics
79 314 Brain Research. Molecular Brain Research
80 291 Plant and Cell Physiology
81 289 Biological Chemistry Hoppe-Seyler
82 288 Experimental Cell Research
83 287 Nature Cell Biology
84 285 Peptides
85 284 Comparative Biochemistry and Physiology
86 278 Antimicrobial Agents and Chemotherapy
87 277 Journal of Investigative Dermatology
88 274 Cytogenetics and Cell Genetics
89 267 Molecular Pharmacology
90 255 Biology of Reproduction
91 248 Tissue Antigens
92 247 Journal of General Microbiology
93 245 Genome Research
94 241 Neurology
95 239 RNA
96 236 Developmental Dynamics
97 231 Virus Research
98 230 Developmental Cell
99 215 Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
100 205 DNA Research
101 204 Planta
102 203 European Journal of Immunology
103 202 Molecular Plant-Microbe Interactions
104 199 Biochimie
105 196 Annals of Neurology
106 193 European Journal of Human Genetics
107 192 Genes to Cells
108 187 Eukaryotic cell
109 181 Immunity
110 178 Journal of Human Genetics
111 173 The New England Journal of Medicine
112 170 Molecular and Cellular Endocrinology
113 166 Investigative Ophthalmology and Visual Science
114 164 Archives of Microbiology
115 163 American Journal of Medical Genetics
116 163 Molecular Phylogenetics and Evolution
117 162 Nature Structural and Molecular Biology
118 159 DNA
119 156 EMBO Reports
120 155 Insect Biochemistry and Molecular Biology
121 153 Hemoglobin
122 151 Bioorganicheskaia Khimiia
123 151 The FASEB Journal
124 148 Molecular Reproduction and Development
125 148 Diabetes
126 147 Molecular Immunology
127 147 The FEBS Journal
128 145 Archives of Virology
129 142 Glycobiology
130 141 Clinical Genetics
131 136 General and Comparative Endocrinology
132 135 Animal Genetics
133 134 Molecular Genetics and Metabolism
134 134 International Journal of Cancer
135 132 Molecular and Cellular Neuroscience
136 130 British Journal of Haematology
137 128 Journal of Cellular Biochemistry
138 124 Biological Chemistry
139 123 American Journal of Medical Genetics. Part A
140 122 Molecular Genetics and Genomics
141 120 Agricultural and Biological Chemistry
142 119 Nature Immunology
143 118 Journal of the American Chemical Society
144 117 BMC Genomics
145 117 Journal of Lipid Research
146 113 Thrombosis and Haemostasis
147 113 Circulation Research
148 113 Journal of Protein Chemistry
149 113 Proteomics
150 111 Neuroscience Letters
5. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
References (RL) 914938 1.78
Journal 722534 384425 1.40 1
Submitted to EMBL/GenBank/DDBJ 179798 166573 0.35 2
Submitted to other databases 10566 9168 0.02 3
Book citation 632 618 <0.01 4
Plant Gene Register 560 548 <0.01 5
Thesis 393 391 <0.01 6
Unpublished observations 292 288 <0.01 7
Patent 157 155 <0.01 8
Worm Breeder's Gazette 6 6 <0.01 9
Total number of distinct authors cited in UniProtKB/Swiss-Prot: 285145
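The "Average per entry" column appears to be the total number of lines divided by all 514789 entries, shown with two decimals and replaced by `<0.01` when it would otherwise print as 0.00. The Python sketch below encodes that reading; it is inferred from the figures in the table rather than stated by the source, and `average_per_entry` is a hypothetical helper name.

```python
TOTAL_ENTRIES = 514789  # entries in release 57.14


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format lines per entry as in the tables: two decimals, or '<0.01'."""
    text = f"{total_lines / total_entries:.2f}"
    return "<0.01" if text == "0.00" else text


# Totals for a few reference subtypes, copied from the table above.
reference_totals = {
    "References (RL)": 914938,
    "Journal": 722534,
    "Submitted to EMBL/GenBank/DDBJ": 179798,
    "Submitted to other databases": 10566,
    "Book citation": 632,
}

averages = {name: average_per_entry(total) for name, total in reference_totals.items()}

for name, total in reference_totals.items():
    print(f"{name:<32} {total:>8} {averages[name]:>6}")
```

Note that the divisor is the whole release, not the "Number of entries" column, so a subtype present in few entries gets a small average even if those entries carry several such lines.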
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Comments (CC) 2165340 4.21
ALLERGEN 460 460 <0.01 26
ALTERNATIVE PRODUCTS 18639 18639 0.04 12
BIOPHYSICOCHEMICAL PROPERTIES 2898 2898 0.01 22
BIOTECHNOLOGY 255 253 <0.01 28
CATALYTIC ACTIVITY 215848 197020 0.42 5
CAUTION 6771 6632 0.01 19
COFACTOR 97491 89512 0.19 7
DEVELOPMENTAL STAGE 8703 8703 0.02 16
DISEASE 4457 3038 0.01 20
DISRUPTION PHENOTYPE 2403 2403 <0.01 23
DOMAIN 30840 27466 0.06 10
ENZYME REGULATION 7726 7726 0.02 18
FUNCTION 381594 365603 0.74 2
INDUCTION 11438 11438 0.02 15
INTERACTION 12077 12077 0.02 14
MASS SPECTROMETRY 4208 3179 0.01 21
MISCELLANEOUS 29552 27278 0.06 11
PATHWAY 126332 115383 0.25 6
PHARMACEUTICAL 83 83 <0.01 29
POLYMORPHISM 767 737 <0.01 24
PTM 35049 28418 0.07 8
RNA EDITING 603 603 <0.01 25
SEQUENCE CAUTION 12664 12664 0.02 13
SIMILARITY 598272 490179 1.16 1
SUBCELLULAR LOCATION 295928 290916 0.57 3
SUBUNIT 219217 219217 0.43 4
TISSUE SPECIFICITY 32360 32360 0.06 9
TOXIC DOSE 413 402 <0.01 27
WEB RESOURCE 8292 6582 0.02 17
Total number of comment topics: 29
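The Rank column of the comment table is consistent with ordering the topics by their total number of lines, largest first. A Python sketch of that ranking, using the totals from the table above (the dictionary and variable names are illustrative):

```python
# Total number of CC lines per comment topic, copied from the table above.
comment_totals = {
    "ALLERGEN": 460, "ALTERNATIVE PRODUCTS": 18639,
    "BIOPHYSICOCHEMICAL PROPERTIES": 2898, "BIOTECHNOLOGY": 255,
    "CATALYTIC ACTIVITY": 215848, "CAUTION": 6771, "COFACTOR": 97491,
    "DEVELOPMENTAL STAGE": 8703, "DISEASE": 4457,
    "DISRUPTION PHENOTYPE": 2403, "DOMAIN": 30840,
    "ENZYME REGULATION": 7726, "FUNCTION": 381594, "INDUCTION": 11438,
    "INTERACTION": 12077, "MASS SPECTROMETRY": 4208,
    "MISCELLANEOUS": 29552, "PATHWAY": 126332, "PHARMACEUTICAL": 83,
    "POLYMORPHISM": 767, "PTM": 35049, "RNA EDITING": 603,
    "SEQUENCE CAUTION": 12664, "SIMILARITY": 598272,
    "SUBCELLULAR LOCATION": 295928, "SUBUNIT": 219217,
    "TISSUE SPECIFICITY": 32360, "TOXIC DOSE": 413, "WEB RESOURCE": 8292,
}

# Order topics by decreasing total and number them from 1.
ranked_topics = sorted(comment_totals, key=comment_totals.get, reverse=True)
rank = {topic: position for position, topic in enumerate(ranked_topics, start=1)}

# The subtype totals should add up to the "Comments (CC)" line.
all_comments = sum(comment_totals.values())

for topic in ranked_topics[:5]:
    print(f"{rank[topic]:>2}  {topic:<22} {comment_totals[topic]:>7}")
```

No two comment topics share the same total, so the order is unambiguous here. Where totals tie, as for PDB and PDBsum in the cross-reference table, the source does not say how ranks are assigned.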
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Features (FT) 3182617 6.18
ACT_SITE 127063 75718 0.25 9
BINDING 194440 55807 0.38 4
CA_BIND 3649 1477 0.01 35
CARBOHYD 95740 24536 0.19 13
CHAIN 521332 510132 1.01 1
COILED 18120 12226 0.04 26
COMPBIAS 48709 25443 0.09 18
CONFLICT 115305 40468 0.22 10
CROSSLNK 4739 3085 0.01 34
DISULFID 93951 24858 0.18 14
DNA_BIND 10860 9994 0.02 29
DOMAIN 142524 85090 0.28 6
HELIX 130331 13633 0.25 8
INIT_MET 14780 14780 0.03 27
LIPID 10514 6696 0.02 30
METAL 263421 64758 0.51 3
MOD_RES 177420 58292 0.34 5
MOTIF 31896 20520 0.06 22
MUTAGEN 29735 7105 0.06 24
NON_CONS 1542 632 <0.01 36
NON_STD 348 273 <0.01 38
NON_TER 11468 8701 0.02 28
NP_BIND 102627 67086 0.20 12
PEPTIDE 8433 5371 0.02 32
PROPEP 10325 8691 0.02 31
REGION 89607 49456 0.17 15
REPEAT 87401 12926 0.17 16
SIGNAL 33580 33570 0.07 21
SITE 36687 21703 0.07 20
STRAND 130878 12742 0.25 7
TOPO_DOM 114609 23529 0.22 11
TRANSIT 6451 6365 0.01 33
TRANSMEM 336352 68738 0.65 2
TURN 31101 10768 0.06 23
UNSURE 1105 350 <0.01 37
VAR_SEQ 38613 16582 0.08 19
VARIANT 78945 16437 0.15 17
ZN_FING 28016 12317 0.05 25
Total number of feature keys: 38
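The two count columns of the feature table support two different averages: lines per entry over the whole release (the column shown) and lines per entry among only those entries that carry the feature. The second is not reported in the source; the Python sketch below derives it for a few keys purely as an illustration of how the columns relate.

```python
TOTAL_ENTRIES = 514789  # entries in release 57.14

# (total FT lines, entries with at least one such line) for selected feature keys.
features = {
    "CHAIN": (521332, 510132),
    "TRANSMEM": (336352, 68738),
    "METAL": (263421, 64758),
    "HELIX": (130331, 13633),
}

# Average over all entries, as in the "Average per entry" column.
per_all = {key: total / TOTAL_ENTRIES for key, (total, _) in features.items()}

# Average over only the entries annotated with that key (derived, not in the table).
per_annotated = {key: total / entries for key, (total, entries) in features.items()}

for key in features:
    print(f"{key:<9} {per_all[key]:5.2f} per entry, "
          f"{per_annotated[key]:5.2f} per annotated entry")
```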
Total Number of Average
Line type / subtype number entries per entry Rank Category
------------------------------------ -------- --------- --------- ---- -------------------------------------------
Cross-references (DR) 12315772 23.92
2DBase-Ecoli 84 84 <0.01 113 2D gel databases
Aarhus/Ghent-2DPAGE 126 96 <0.01 110 2D gel databases
AGD 823 817 <0.01 87 Organism-specific databases
ANU-2DPAGE 23 23 <0.01 120 2D gel databases
ArachnoServer 462 458 <0.01 95
ArrayExpress 58025 58025 0.11 36 Gene expression databases
Bgee 37624 37623 0.07 42 Gene expression databases
BindingDB 297 297 <0.01 104 Other
BioCyc 160507 147651 0.31 19 Enzyme and pathway databases
BRENDA 65151 62355 0.13 33 Enzyme and pathway databases
BuruList 330 330 <0.01 103 Organism-specific databases
CAZy 5646 5025 0.01 63 Protein family/group databases
CGD 554 550 <0.01 92 Organism-specific databases
CleanEx 30219 29571 0.06 44 Gene expression databases
COMPLUYEAST-2DPAGE 59 59 <0.01 115 2D gel databases
Cornea-2DPAGE 67 67 <0.01 114 2D gel databases
CTD 61888 61326 0.12 35 Organism-specific databases
CYGD 6628 6522 0.01 62 Organism-specific databases
dictyBase 4252 4129 0.01 71 Organism-specific databases
DIP 10424 10319 0.02 55 Protein-protein interaction databases
DisProt 397 394 <0.01 98 3D structure databases
DOSAC-COBS-2DPAGE 150 150 <0.01 109 2D gel databases
DrugBank 5317 1626 0.01 65 Other
EchoBASE 4159 4124 0.01 73 Organism-specific databases
ECO2DBASE 351 299 <0.01 102 2D gel databases
EcoGene 4353 4350 0.01 69 Organism-specific databases
eggNOG 216353 216353 0.42 16 Phylogenomic databases
EMBL 847079 505106 1.65 3 Sequence databases
Ensembl 90026 69633 0.17 26 Genome annotation databases
euHCVdb 55 44 <0.01 116 Organism-specific databases
FlyBase 5390 5014 0.01 64 Organism-specific databases
Gene3D 235272 193221 0.46 15 Family and domain databases
GeneCards 21079 19817 0.04 48 Organism-specific databases
GeneDB_Spombe 4976 4931 0.01 67 Organism-specific databases
GeneFarm 2690 2675 0.01 79 Organism-specific databases
GeneID 466887 447919 0.91 6 Genome annotation databases
Genevestigator 64330 64330 0.12 34 Gene expression databases
GenomeReviews 350729 334967 0.68 11 Genome annotation databases
GermOnline 41925 41318 0.08 41 Gene expression databases
GlycoSuiteDB 280 280 <0.01 105 PTM databases
GO 2169903 481255 4.22 1 Ontologies
Gramene 4287 4287 0.01 70 Organism-specific databases
H-InvDB 11249 9556 0.02 54 Organism-specific databases
HAMAP 307272 307129 0.60 13 Family and domain databases
HGNC 19532 19360 0.04 49 Organism-specific databases
HOGENOM 359172 359172 0.70 9 Phylogenomic databases
HOVERGEN 75054 75054 0.15 29 Phylogenomic databases
HPA 8705 6563 0.02 57 Organism-specific databases
HSC-2DPAGE 85 85 <0.01 112 2D gel databases
HSSP 28864 28864 0.06 45 3D structure databases
InParanoid 65670 65670 0.13 30 Phylogenomic databases
IntAct 21370 21370 0.04 47 Protein-protein interaction databases
InterPro 1577615 485356 3.06 2 Family and domain databases
IPI 88191 63273 0.17 27 Sequence databases
KEGG 438648 416909 0.85 8 Genome annotation databases
LegioList 760 758 <0.01 88 Organism-specific databases
Leproma 664 661 <0.01 91 Organism-specific databases
ListiList 1185 1177 <0.01 84 Organism-specific databases
MaizeGDB 472 467 <0.01 94 Organism-specific databases
MEROPS 8469 8210 0.02 58 Protein family/group databases
MGI 16096 16045 0.03 51 Organism-specific databases
MIM 15816 12443 0.03 52 Organism-specific databases
MypuList 203 203 <0.01 108 Organism-specific databases
NextBio 48668 48668 0.09 39 Other
NMPDR 130022 130018 0.25 22 Genome annotation databases
OGP 377 377 <0.01 100 2D gel databases
OMA 352998 352998 0.69 10 Phylogenomic databases
Orphanet 3675 2132 0.01 76 Organism-specific databases
OrthoDB 55299 55299 0.11 37 Phylogenomic databases
PANTHER 184759 169579 0.36 18 Family and domain databases
Pathway_Interaction_DB 4567 1665 0.01 68 Enzyme and pathway databases
PDB 65533 15408 0.13 32 3D structure databases
PDBsum 65533 15408 0.13 31 3D structure databases
PeptideAtlas 5168 5168 0.01 66 Proteomic databases
PeroxiBase 676 664 <0.01 90 Protein family/group databases
Pfam 656238 463935 1.27 4 Family and domain databases
PharmGKB 15813 15802 0.03 53 Organism-specific databases
PHCI-2DPAGE 244 244 <0.01 107 2D gel databases
PhosphoSite 19298 19298 0.04 50 PTM databases
PhosSite 267 267 <0.01 106 PTM databases
PhotoList 738 738 <0.01 89 Organism-specific databases
PhylomeDB 121107 121107 0.24 23 Phylogenomic databases
PIR 114946 104996 0.22 24 Sequence databases
PIRSF 80000 80000 0.16 28 Family and domain databases
PMAP-CutDB 1394 1394 <0.01 82 Other
PMMA-2DPAGE 52 52 <0.01 117 2D gel databases
PptaseDB 34 34 <0.01 118 Protein family/group databases
PRIDE 53372 53372 0.10 38 Proteomic databases
PRINTS 136260 117842 0.26 21 Family and domain databases
ProDom 27769 27440 0.05 46 Family and domain databases
ProMEX 438 438 <0.01 96 Proteomic databases
PROSITE 456163 290831 0.89 7 Family and domain databases
PseudoCAP 1212 1203 <0.01 83 Organism-specific databases
Rat-heart-2DPAGE 28 28 <0.01 119 2D gel databases
Reactome 7333 4259 0.01 60 Enzyme and pathway databases
REBASE 374 353 <0.01 101 Protein family/group databases
RefSeq 487288 448193 0.95 5 Sequence databases
REPRODUCTION-2DPAGE 1030 942 <0.01 86 2D gel databases
RGD 7360 7356 0.01 59 Organism-specific databases
SagaList 389 388 <0.01 99 Organism-specific databases
SGD 6640 6537 0.01 61 Organism-specific databases
Siena-2DPAGE 102 102 <0.01 111 2D gel databases
SMART 141673 109353 0.28 20 Family and domain databases
SMR 345633 345633 0.67 12 3D structure databases
STRING 203510 203501 0.40 17 Protein-protein interaction databases
SubtiList 4192 4183 0.01 72 Organism-specific databases
SWISS-2DPAGE 1182 1182 <0.01 85 2D gel databases
TAIR 8930 8818 0.02 56 Organism-specific databases
TCDB 3290 3249 0.01 78 Protein family/group databases
TIGR 33906 33139 0.07 43 Genome annotation databases
TIGRFAMs 279338 260622 0.54 14 Family and domain databases
TubercuList 1584 1548 <0.01 81 Organism-specific databases
UCSC 48461 39488 0.09 40 Genome annotation databases
UniGene 91718 80830 0.18 25 Sequence databases
VectorBase 403 389 <0.01 97 Genome annotation databases
World-2DPAGE 507 507 <0.01 93 2D gel databases
WormBase 3812 3727 0.01 75 Organism-specific databases
WormPep 4051 3272 0.01 74 Organism-specific databases
Xenbase 3642 3569 0.01 77 Organism-specific databases
ZFIN 2507 2496 <0.01 80 Organism-specific databases
Total number of cross-referenced databases: 120
6. AMINO ACID COMPOSITION
6.1 Composition in percent for the complete database
Ala (A) 8.28 Gln (Q) 3.94 Leu (L) 9.67 Ser (S) 6.50
Arg (R) 5.54 Glu (E) 6.77 Lys (K) 5.85 Thr (T) 5.32
Asn (N) 4.05 Gly (G) 7.09 Met (M) 2.43 Trp (W) 1.07
Asp (D) 5.45 His (H) 2.27 Phe (F) 3.86 Tyr (Y) 2.91
Cys (C) 1.35 Ile (I) 5.99 Pro (P) 4.68 Val (V) 6.88
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.00
6.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
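The ordering in section 6.2 follows directly from sorting the percentages of section 6.1 in decreasing order. A brief Python sketch, with the composition values copied from the table above (the names are illustrative):

```python
# Composition in percent for the twenty standard amino acids (section 6.1).
composition = {
    "Ala": 8.28, "Gln": 3.94, "Leu": 9.67, "Ser": 6.50,
    "Arg": 5.54, "Glu": 6.77, "Lys": 5.85, "Thr": 5.32,
    "Asn": 4.05, "Gly": 7.09, "Met": 2.43, "Trp": 1.07,
    "Asp": 5.45, "His": 2.27, "Phe": 3.86, "Tyr": 2.91,
    "Cys": 1.35, "Ile": 5.99, "Pro": 4.68, "Val": 6.88,
}

# Most frequent residue first.
by_frequency = sorted(composition, key=composition.get, reverse=True)

# Sum of the rounded percentages; it need not be exactly 100.
total_percent = sum(composition.values())

print(", ".join(by_frequency))
print(f"sum of listed percentages: {total_percent:.2f}")
```

The listed values sum to slightly less than 100, which is compatible with per-residue rounding and the near-zero shares of the ambiguity codes B, Z and X.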
7. MISCELLANEOUS STATISTICS
4446 entries are encoded on a mitochondrion, and 3549 are encoded on a plasmid.
12174 entries are encoded on a plastid,
of which 21 are encoded on apicoplasts,
11616 on chloroplasts,
44 on organellar chromatophores,
145 on cyanelles,
149 on non-photosynthetic plastids and
199 on unspecified types of plastid.
Number of entries with at least one sequence correction: 68295