
         UniProtKB/Swiss-Prot protein knowledgebase release 57.14 statistics


1.  INTRODUCTION

Release 57.14 of 09-Feb-10 of UniProtKB/Swiss-Prot contains 514789 sequence entries,
comprising 181163771 amino acids abstracted from 186824 references. 

668 sequences have been added since release 57.13, the sequence data of
56 existing entries has been updated and the annotations of
316460 entries have been revised.

Number of fragments: 8440
Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 28833


Protein existence (PE):           entries     %

1: Evidence at protein level        68292   13.3%
2: Evidence at transcript level     66408   12.9%
3: Inferred from homology          364228   70.8%
4: Predicted                        14329    2.8%
5: Uncertain                         1532    0.3%
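
The percentage column of the protein existence table can be recomputed from the entry counts. The following is a minimal sketch, assuming each percentage is the category count divided by the sum of all five categories and rounded to one decimal place; the variable names are illustrative and not part of any UniProt tool.

```python
# Entry counts copied from the protein existence (PE) table above.
pe_counts = {
    "1: Evidence at protein level": 68292,
    "2: Evidence at transcript level": 66408,
    "3: Inferred from homology": 364228,
    "4: Predicted": 14329,
    "5: Uncertain": 1532,
}

# The five categories partition the release, so their sum is the entry total.
total_entries = sum(pe_counts.values())

# Assumption: percentages are rounded to one decimal place.
pe_percent = {
    label: round(100 * count / total_entries, 1)
    for label, count in pe_counts.items()
}

for label, count in pe_counts.items():
    print(f"{label:<34}{count:>7}{pe_percent[label]:>7.1f}%")
```

The sum of the five counts equals the 514789 entries stated in the introduction.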

The growth of the database is summarized in a figure (not reproduced in this text version).

   


2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/Swiss-Prot: 12037

   The first twenty species represent 107231 sequences:  20.8 % of the total
   number of entries.


   2.1 Table of the frequency of occurrence of species

        Species represented 1x: 5235
                            2x: 1698
                            3x:  897
                            4x:  571
                            5x:  421
                            6x:  346
                            7x:  246
                            8x:  205
                            9x:  183
                           10x:  105
                       11- 20x:  574
                       21- 50x:  368
                       51-100x:  176
                         >100x: 1012
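
As a consistency check, the bins of table 2.1 should add up to the total number of species given at the start of this section. A short sketch, with the bin labels used as hypothetical dictionary keys:

```python
# Number of species per occurrence bin, copied from table 2.1.
species_bins = {
    "1x": 5235, "2x": 1698, "3x": 897, "4x": 571, "5x": 421,
    "6x": 346, "7x": 246, "8x": 205, "9x": 183, "10x": 105,
    "11-20x": 574, "21-50x": 368, "51-100x": 176, ">100x": 1012,
}

# Every species falls into exactly one bin, so the bins sum to the
# total number of species represented in the release.
total_species = sum(species_bins.values())

# Share of species that are represented by a single entry.
single_entry_share = 100 * species_bins["1x"] / total_species

print(total_species)
print(f"{single_entry_share:.1f}% of species have one entry")
```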


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1      20272  Homo sapiens (Human)
       2      16216  Mus musculus (Mouse)
       3       8847  Arabidopsis thaliana (Mouse-ear cress)
       4       7476  Rattus norvegicus (Rat)
       5       6552  Saccharomyces cerevisiae (Baker's yeast)
       6       5743  Bos taurus (Bovine)
       7       4974  Schizosaccharomyces pombe (Fission yeast)
       8       4367  Escherichia coli (strain K12)
       9       4249  Bacillus subtilis
      10       4129  Dictyostelium discoideum (Slime mold)
      11       3281  Caenorhabditis elegans
      12       3205  Xenopus laevis (African clawed frog)
      13       3052  Drosophila melanogaster (Fruit fly)
      14       2598  Danio rerio (Zebrafish) (Brachydanio rerio)
      15       2365  Oryza sativa subsp. japonica (Rice)
      16       2206  Pongo abelii (Sumatran orangutan)
      17       2151  Gallus gallus (Chicken)
      18       1993  Escherichia coli O157:H7
      19       1782  Methanocaldococcus jannaschii (Methanococcus jannaschii)
      20       1773  Haemophilus influenzae
      21       1757  Salmonella typhimurium
      22       1668  Escherichia coli O6
      23       1665  Shigella flexneri
      24       1558  Mycobacterium tuberculosis
      25       1512  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      26       1361  Sus scrofa (Pig)
      27       1341  Salmonella typhi
      28       1273  Pseudomonas aeruginosa
      29       1213  Mycobacterium bovis
      30       1159  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      31       1015  Synechocystis sp. (strain PCC 6803)
      32        995  Yersinia pestis
      33        991  Archaeoglobus fulgidus
      34        940  Vibrio cholerae
      35        929  Salmonella paratyphi A
      36        922  Staphylococcus aureus (strain N315)
      37        922  Staphylococcus aureus (strain Mu50 / ATCC 700699)
      38        912  Rhizobium meliloti (Sinorhizobium meliloti)
      39        909  Acanthamoeba polyphaga mimivirus (APMV)
      40        896  Staphylococcus aureus (strain COL)
      41        894  Staphylococcus aureus (strain MW2)
      42        888  Staphylococcus aureus (strain MSSA476)
      43        885  Staphylococcus aureus (strain MRSA252)
      44        881  Oryctolagus cuniculus (Rabbit)
      45        879  Escherichia coli O6:K15:H31 (strain 536 / UPEC)
      46        879  Salmonella choleraesuis
      47        869  Shigella sonnei (strain Ss046)
      48        863  Yersinia pseudotuberculosis
      49        835  Escherichia coli O9:H4 (strain HS)
      50        829  Escherichia coli O139:H28 (strain E24377A / ETEC)
      51        824  Shigella boydii serotype 4 (strain Sb227)
      52        818  Escherichia coli (strain UTI89 / UPEC)
      53        817  Ashbya gossypii (Yeast) (Eremothecium gossypii)
      54        814  Escherichia coli (strain ATCC 8739 / DSM 1576 / Crooks)
      55        800  Shigella dysenteriae serotype 1 (strain Sd197)
      56        795  Candida albicans (Yeast)
      57        794  Vibrio parahaemolyticus
      58        789  Kluyveromyces lactis (Yeast) (Candida sphaerica)
      59        785  Escherichia coli (strain SMS-3-5 / SECEC)
      60        778  Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
      61        776  Pasteurella multocida
      62        773  Aquifex aeolicus
      63        771  Neurospora crassa
      64        765  Escherichia coli (strain K12 / DH10B)
      65        764  Canis familiaris (Dog)
      66        759  Escherichia coli O127:H6 (strain E2348/69 / EPEC)
      67        759  Escherichia coli (strain K12 / BW2952)
      68        757  Escherichia coli (strain 55989 / EAEC)
      69        757  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
      70        756  Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)
      71        756  Escherichia coli O8 (strain IAI1)
      72        756  Staphylococcus epidermidis (strain ATCC 12228)
      73        750  Escherichia coli (strain SE11)
      74        750  Shigella flexneri serotype 5b (strain 8401)
      75        750  Escherichia coli O45:K1 (strain S88 / ExPEC)
      76        748  Escherichia coli O7:K1 (strain IAI39 / ExPEC)
      77        747  Candida glabrata (Yeast) (Torulopsis glabrata)
      78        742  Escherichia coli O157:H7 (strain EC4115 / EHEC)
      79        738  Streptomyces coelicolor
      80        738  Photorhabdus luminescens subsp. laumondii
      81        731  Vibrio vulnificus
      82        730  Bacillus halodurans
      83        726  Escherichia coli O81 (strain ED1a)
      84        722  Bacillus anthracis
      85        722  Yersinia enterocolitica serotype O:8 / biotype 1B (strain 8081)
      86        719  Salmonella enteritidis PT4 (strain P125109)
      87        715  Vibrio vulnificus (strain YJ016)
      88        715  Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
      89        713  Yersinia pestis bv. Antiqua (strain Nepal516)
      90        713  Salmonella paratyphi A (strain AKU_12601)
      91        712  Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
      92        711  Staphylococcus aureus (strain NCTC 8325)
      93        710  Salmonella newport (strain SL254)
      94        709  Salmonella heidelberg (strain SL476)
      95        709  Yersinia pestis bv. Antiqua (strain Antiqua)
      96        709  Salmonella agona (strain SL483)
      97        708  Salmonella schwarzengrund (strain CVM19633)
      98        705  Escherichia coli O1:K1 / APEC
      99        699  Salmonella dublin (strain CT_02021853)
     100        697  Enterobacter sp. (strain 638)
     101        696  Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
     102        696  Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)
     103        687  Mycoplasma pneumoniae
     104        685  Escherichia fergusonii (strain ATCC 35469 / DSM 13698 / CDC 0568-73)
     105        684  Pseudomonas syringae pv. tomato
     106        683  Pan troglodytes (Chimpanzee)
     107        682  Salmonella gallinarum (strain 287/91 / NCTC 13346)
     108        682  Klebsiella pneumoniae (strain 342)
     109        676  Anabaena sp. (strain PCC 7120)
     110        670  Pseudomonas putida (strain KT2440)
     111        666  Yersinia pestis (strain Pestoides F)
     112        665  Staphylococcus aureus (strain USA300)
     113        664  Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
     114        661  Mycobacterium leprae
     115        658  Rhizobium sp. (strain NGR234)
     116        653  Serratia proteamaculans (strain 568)
     117        651  Zea mays (Maize)
     118        645  Escherichia coli
     119        645  Bradyrhizobium japonicum
     120        641  Staphylococcus aureus (strain bovine RF122 / ET3-1)
     121        638  Bacillus cereus (strain ATCC 14579 / DSM 31)
     122        637  Yersinia pseudotuberculosis serotype O:3 (strain YPIII)
     123        634  Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
     124        633  Yersinia pseudotuberculosis serotype IB (strain PB1/+)
     125        620  Shewanella oneidensis
     126        617  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
     127        615  Treponema pallidum
     128        613  Ralstonia solanacearum (Pseudomonas solanacearum)
     129        608  Staphylococcus haemolyticus (strain JCSC1435)
     130        608  Enterobacter sakazakii (strain ATCC BAA-894)
     131        602  Rhizobium loti (Mesorhizobium loti)
     132        602  Staphylococcus saprophyticus subsp. saprophyticus 
     133        600  Methanobacterium thermoautotrophicum
     134        598  Yersinia pestis bv. Antiqua (strain Angola)
     135        598  Salmonella paratyphi C (strain RKS4594)
     136        598  Emericella nidulans (Aspergillus nidulans)
     137        596  Listeria monocytogenes
     138        595  Photobacterium profundum (Photobacterium sp. (strain SS9))
     139        593  Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
     140        592  Yarrowia lipolytica (Candida lipolytica)
     141        590  Bacillus cereus (strain ATCC 10987)
     142        589  Xanthomonas campestris pv. campestris
     143        588  Listeria innocua
     144        585  Rickettsia prowazekii
     145        584  Helicobacter pylori (Campylobacter pylori)
     146        582  Pectobacterium carotovorum subsp. carotovorum (strain PC1)
     147        581  Lactococcus lactis subsp. lactis (Streptococcus lactis)
     148        579  Neisseria meningitidis serogroup B
     149        576  Brucella suis
     150        572  Brucella melitensis
     151        572  Buchnera aphidicola subsp. Acyrthosiphon pisum 
     152        567  Bacillus thuringiensis subsp. konkukian
     153        565  Helicobacter pylori J99 (Campylobacter pylori J99)
     154        562  Buchnera aphidicola subsp. Schizaphis graminum
     155        560  Bacillus cereus (strain ZK / E33L)
     156        560  Pseudomonas syringae pv. syringae (strain B728a)
     157        557  Pseudomonas aeruginosa (strain UCBPP-PA14)
     158        556  Neisseria meningitidis serogroup A
     159        555  Bacillus licheniformis (strain DSM 13 / ATCC 14580)
     160        555  Xanthomonas axonopodis pv. citri (Citrus canker)
     161        553  Vibrio fischeri (strain ATCC 700601 / ES114)
     162        551  Pseudomonas fluorescens (strain Pf0-1)
     163        549  Oceanobacillus iheyensis
     164        545  Caulobacter crescentus (Caulobacter vibrioides)
     165        545  Clostridium acetobutylicum
     166        545  Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
     167        538  Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
     168        529  Listeria monocytogenes serotype 4b (strain F2365)
     169        523  Erwinia tasmaniensis (strain DSM 17950 / Et1/99)
     170        522  Sodalis glossinidius (strain morsitans)
     171        521  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
     172        521  Xylella fastidiosa
     173        519  Streptococcus pneumoniae
     174        512  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
     175        510  Chromobacterium violaceum
     176        509  Thermotoga maritima
     177        509  Vibrio cholerae serotype O1 (strain ATCC 39541 / Ogawa 395 / O395)
     178        507  Bordetella parapertussis
     179        507  Buchnera aphidicola subsp. Baizongia pistaciae
     180        507  Pseudomonas aeruginosa (strain PA7)
     181        505  Bordetella pertussis
     182        504  Haemophilus ducreyi
     183        504  Geobacillus kaustophilus
     184        503  Staphylococcus aureus (strain Newman)
     185        500  Pseudomonas entomophila (strain L48)
     186        498  Brucella abortus
     187        497  Rickettsia conorii
     188        496  Bacillus clausii (strain KSM-K16)
     189        492  Haemophilus influenzae (strain 86-028NP)
     190        492  Deinococcus radiodurans
     191        490  Xanthomonas campestris pv. campestris (strain 8004)
     192        490  Vibrio harveyi (strain ATCC BAA-1116 / BB120)
     193        490  Clostridium perfringens
     194        488  Bacillus amyloliquefaciens (strain FZB42)
     195        487  Burkholderia pseudomallei (Pseudomonas pseudomallei)
     196        487  Shewanella sp. (strain MR-7)
     197        485  Aspergillus fumigatus (Sartorya fumigata)
     198        484  Pseudomonas aeruginosa (strain LESB58)
     199        484  Shewanella sp. (strain MR-4)
     200        483  Mannheimia succiniciproducens (strain MBEL55E)
     201        483  Mycoplasma genitalium
     202        483  Staphylococcus aureus (strain Mu3 / ATCC 700698)
     203        482  Streptomyces avermitilis
     204        481  Corynebacterium glutamicum (Brevibacterium flavum)
     205        480  Proteus mirabilis (strain HI4320)
     206        477  Caenorhabditis briggsae
     207        476  Oryza sativa subsp. indica (Rice)
     208        475  Synechococcus elongatus (strain PCC 7942) (Anacystis nidulans R2)
     209        474  Methanosarcina acetivorans
     210        472  Burkholderia sp. (strain 383) (Burkholderia cepacia 
     211        472  Pseudomonas putida (strain F1 / ATCC 700007)
     212        472  Brucella abortus (strain 2308)
     213        472  Thermosynechococcus elongatus (strain BP-1)
     214        468  Enterococcus faecalis (Streptococcus faecalis)
     215        466  Acinetobacter sp. (strain ADP1)
     216        465  Xanthomonas campestris pv. vesicatoria (strain 85-10)
     217        465  Pseudomonas putida (strain GB-1)
     218        464  Rhodopseudomonas palustris
     219        464  Shewanella frigidimarina (strain NCIMB 400)
     220        462  Pyrococcus horikoshii
     221        462  Anabaena variabilis (strain ATCC 29413 / PCC 7937)
     222        462  Shewanella sp. (strain ANA-3)
     223        461  Burkholderia mallei (Pseudomonas mallei)
     224        460  Ralstonia eutropha  (Cupriavidus necator 
     225        458  Lactobacillus plantarum
     226        457  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
     227        457  Pyrococcus abyssi
     228        457  Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
     229        455  Aeromonas hydrophila subsp. hydrophila (strain ATCC 7966 / NCIB 9240)
     230        455  Methanosarcina mazei (Methanosarcina frisia)
     231        454  Staphylococcus aureus (strain JH1)
     232        453  Rickettsia felis (Rickettsia azadi)
     233        453  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
     234        452  Shewanella baltica (strain OS185)
     235        452  Pseudomonas putida (strain W619)
     236        452  Halobacterium salinarium (Halobacterium halobium)
     237        448  Staphylococcus aureus (strain JH9)
     238        448  Thermoanaerobacter tengcongensis
     239        448  Streptococcus mutans
     240        447  Methylococcus capsulatus
     241        447  Aeromonas salmonicida (strain A449)
     242        446  Ovis aries (Sheep)
     243        446  Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
     244        445  Vibrio fischeri (strain MJ11)
     245        444  Pseudomonas mendocina (strain ymp)
     246        443  Hahella chejuensis (strain KCTC 2396)
     247        441  Streptococcus pyogenes serotype M6
     248        441  Chlamydia trachomatis
     249        441  Dechloromonas aromatica (strain RCB)
     250        439  Nicotiana tabacum (Common tobacco)
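
The statement that the first twenty species account for 107231 sequences (20.8 % of all entries) can be reproduced from the first twenty rows of table 2.2. The sketch below parses those rows with a regular expression; the pattern and the helper `parse_rows` are illustrative assumptions about the fixed-width layout, not an official parser.

```python
import re

# First twenty rows copied from table 2.2 (rank, frequency, species).
TOP20_ROWS = """\
       1      20272  Homo sapiens (Human)
       2      16216  Mus musculus (Mouse)
       3       8847  Arabidopsis thaliana (Mouse-ear cress)
       4       7476  Rattus norvegicus (Rat)
       5       6552  Saccharomyces cerevisiae (Baker's yeast)
       6       5743  Bos taurus (Bovine)
       7       4974  Schizosaccharomyces pombe (Fission yeast)
       8       4367  Escherichia coli (strain K12)
       9       4249  Bacillus subtilis
      10       4129  Dictyostelium discoideum (Slime mold)
      11       3281  Caenorhabditis elegans
      12       3205  Xenopus laevis (African clawed frog)
      13       3052  Drosophila melanogaster (Fruit fly)
      14       2598  Danio rerio (Zebrafish) (Brachydanio rerio)
      15       2365  Oryza sativa subsp. japonica (Rice)
      16       2206  Pongo abelii (Sumatran orangutan)
      17       2151  Gallus gallus (Chicken)
      18       1993  Escherichia coli O157:H7
      19       1782  Methanocaldococcus jannaschii (Methanococcus jannaschii)
      20       1773  Haemophilus influenzae
"""

# Two integer columns followed by free text for the species name.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(.+?)\s*$")


def parse_rows(text):
    """Return (rank, frequency, species) tuples for every matching line."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return rows


TOTAL_ENTRIES = 514789  # from the introduction

rows = parse_rows(TOP20_ROWS)
top20_sequences = sum(freq for _, freq, _ in rows)
top20_share = round(100 * top20_sequences / TOTAL_ENTRIES, 1)

print(top20_sequences, top20_share)
```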


   
   2.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea           18175 (  4%)
    Bacteria         323186 ( 63%)
    Eukaryota        158574 ( 31%)
    Viruses           14854 (  3%)


   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                  20273 ( 13%)           (  4%)
     Other Mammalia         44541 ( 28%)           (  9%)
     Other Vertebrata       15951 ( 10%)           (  3%)
     Viridiplantae          28745 ( 18%)           (  6%)
     Fungi                  25082 ( 16%)           (  5%)
     Insecta                 7628 (  5%)           (  1%)
     Nematoda                4032 (  3%)           (  1%)
     Other                  12322 (  8%)           (  2%)
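
Each Eukaryota category above carries two percentages: one relative to Eukaryota and one relative to the complete database. A minimal sketch of that calculation, assuming both columns are rounded to whole percent; the names are illustrative only:

```python
# Sequence counts per category, copied from the "Within Eukaryota" table.
eukaryota = {
    "Human": 20273,
    "Other Mammalia": 44541,
    "Other Vertebrata": 15951,
    "Viridiplantae": 28745,
    "Fungi": 25082,
    "Insecta": 7628,
    "Nematoda": 4032,
    "Other": 12322,
}

TOTAL_ENTRIES = 514789                  # whole database, from the introduction
total_eukaryota = sum(eukaryota.values())

# Percentage relative to Eukaryota and to the complete database.
pct_of_eukaryota = {k: round(100 * v / total_eukaryota) for k, v in eukaryota.items()}
pct_of_database = {k: round(100 * v / TOTAL_ENTRIES) for k, v in eukaryota.items()}

for name, count in eukaryota.items():
    print(f"{name:<18}{count:>7} ({pct_of_eukaryota[name]:>3}%) ({pct_of_database[name]:>3}%)")
```

The category counts add up to the 158574 Eukaryota sequences listed in the kingdom table.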



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50    8377             1001-1100     3460
                 51- 100   39784             1101-1200     2390
                101- 150   55709             1201-1300     1902
                151- 200   55834             1301-1400     1774
                201- 250   54376             1401-1500     1403
                251- 300   47833             1501-1600      629
                301- 350   48222             1601-1700      496
                351- 400   41232             1701-1800      408
                401- 450   33711             1801-1900      388
                451- 500   27104             1901-2000      321
                501- 550   19170             2001-2100      193
                551- 600   13682             2101-2200      261
                601- 650   11447             2201-2300      270
                651- 700    8145             2301-2400      168
                701- 750    6786             2401-2500      129
                751- 800    4770             >2500         1000
                801- 850    4117
                851- 900    4736
                901- 950    3603
                951-1000    2519
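
Because the size table excludes fragments, its bins should sum to the total number of entries minus the number of fragments given in the introduction. A short sketch of that check, with the counts copied from the two columns above:

```python
# Bin counts for lengths 1-1000 (left column of the table), in steps of 50.
counts_up_to_1000 = [
    8377, 39784, 55709, 55834, 54376, 47833, 48222, 41232, 33711, 27104,
    19170, 13682, 11447, 8145, 6786, 4770, 4117, 4736, 3603, 2519,
]

# Bin counts for lengths 1001-2500 in steps of 100, then the >2500 bin.
counts_above_1000 = [
    3460, 2390, 1902, 1774, 1403, 629, 496, 408, 388, 321,
    193, 261, 270, 168, 129, 1000,
]

TOTAL_ENTRIES = 514789  # from the introduction
FRAGMENTS = 8440        # from the introduction

non_fragment_sequences = sum(counts_up_to_1000) + sum(counts_above_1000)

print(non_fragment_sequences, TOTAL_ENTRIES - FRAGMENTS)
```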

   


   The average sequence length in UniProtKB/Swiss-Prot is 351 amino acids.

   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.
   The longest sequence is  TITIN_MOUSE (A2ASS6): 35213 amino acids.
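
The average length can be related to the totals in the introduction. The sketch below assumes the average is the total number of amino acids divided by the number of entries; the page does not say how the figure is rounded, and the integer part of the quotient is what agrees with the reported 351.

```python
TOTAL_AMINO_ACIDS = 181163771  # from the introduction
TOTAL_ENTRIES = 514789         # from the introduction

# Exact quotient and its integer part (floor division).
mean_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES
mean_length_truncated = TOTAL_AMINO_ACIDS // TOTAL_ENTRIES

print(f"{mean_length:.2f}", mean_length_truncated)
```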


4.  JOURNAL CITATIONS

   Note: the following citation statistics reflect the number of distinct
         journal citations.

   Total number of journals cited in this release of UniProtKB/Swiss-Prot: 2046


   4.1 Table of the frequency of journal citations

        Journals cited 1x:  659
                       2x:  283
                       3x:  138
                       4x:  105
                       5x:   85
                       6x:   62
                       7x:   35
                       8x:   38
                       9x:   40
                      10x:   23
                  11- 20x:  163
                  21- 50x:  162
                  51-100x:   96
                    >100x:  157


   4.2  List of the most cited journals in UniProtKB/Swiss-Prot

   Nb    Citations   Journal name
   --    ---------   -------------------------------------------------------------
    1        17688   Journal of Biological Chemistry
    2         8195   Proceedings of the National Academy of Sciences of the U.S.A.
    3         4981   Journal of Bacteriology
    4         4490   Gene
    5         4468   Biochemical and Biophysical Research Communications
    6         4289   Nucleic Acids Research
    7         3927   FEBS Letters
    8         3773   Biochemistry
    9         3706   The EMBO Journal
   10         3374   Molecular and Cellular Biology
   11         3190   Nature
   12         3082   European Journal of Biochemistry
   13         2992   Journal of Molecular Biology
   14         2959   Biochimica et Biophysica Acta
   15         2637   Cell
   16         2471   Genomics
   17         2151   Biochemical Journal
   18         2090   Science
   19         2019   Journal of Virology
   20         1741   Molecular Microbiology
   21         1551   Journal of Cell Biology
   22         1488   Plant Molecular Biology
   23         1345   Virology
   24         1344   Genes and Development
   25         1303   Nature Genetics
   26         1302   Molecular and General Genetics
   27         1293   Human Molecular Genetics
   28         1279   Plant Physiology
   29         1199   The American Journal of Human Genetics
   30         1166   Oncogene
   31         1153   Journal of Biochemistry
   32         1131   Development
   33         1076   Human Mutation
   34         1003   Molecular Biology of the Cell
   35          999   Journal of Immunology
   36          972   Genetics
   37          877   Structure
   38          868   Journal of General Virology
   39          859   Infection and Immunity
   40          838   The Plant Cell
   41          812   Archives of Biochemistry and Biophysics
   42          788   Blood
   43          788   Molecular Cell
   44          755   Yeast
   45          740   Microbiology
   46          715   The Plant Journal
   47          714   Developmental Biology
   48          710   Journal of Cell Science
   49          662   Cancer Research
   50          647   FEMS Microbiology Letters
   51          632   Current Biology
   52          590   Human Genetics
   53          584   Nature Structural Biology
   54          580   Mechanisms of Development
   55          537   Acta Crystallographica, Section D
   56          529   Protein Science
   57          524   Journal of Neuroscience
   58          523   Current Genetics
   59          519   Applied and Environmental Microbiology
   60          503   Toxicon
   61          499   Journal of Clinical Investigation
   62          495   Neuron
   63          469   Mammalian Genome
   64          449   American Journal of Physiology
   65          441   Immunogenetics
   66          440   The Journal of Experimental Medicine
   67          435   Molecular Endocrinology
   68          419   Molecular and Biochemical Parasitology
   69          406   Journal of Neurochemistry
   70          396   The Journal of Clinical Endocrinology and Metabolism
   71          384   Endocrinology
   72          376   Journal of Molecular Evolution
   73          364   DNA and Cell Biology
   74          354   DNA Sequence
   75          351   Molecular Biology and Evolution
   76          350   Bioscience, Biotechnology, and Biochemistry
   77          349   Proteins
   78          346   Journal of Medical Genetics
   79          314   Brain Research. Molecular Brain Research
   80          291   Plant and Cell Physiology
   81          289   Biological Chemistry Hoppe-Seyler
   82          288   Experimental Cell Research
   83          287   Nature Cell Biology
   84          285   Peptides
   85          284   Comparative Biochemistry and Physiology
   86          278   Antimicrobial Agents and Chemotherapy
   87          277   Journal of Investigative Dermatology
   88          274   Cytogenetics and Cell Genetics
   89          267   Molecular Pharmacology
   90          255   Biology of Reproduction
   91          248   Tissue Antigens
   92          247   Journal of General Microbiology
   93          245   Genome Research
   94          241   Neurology
   95          239   RNA
   96          236   Developmental Dynamics
   97          231   Virus Research
   98          230   Developmental Cell
   99          215   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
  100          205   DNA Research
  101          204   Planta
  102          203   European Journal of Immunology
  103          202   Molecular Plant-Microbe Interactions
  104          199   Biochimie
  105          196   Annals of Neurology
  106          193   European Journal of Human Genetics
  107          192   Genes to Cells
  108          187   Eukaryotic Cell
  109          181   Immunity
  110          178   Journal of Human Genetics
  111          173   The New England Journal of Medicine
  112          170   Molecular and Cellular Endocrinology
  113          166   Investigative Ophthalmology and Visual Science
  114          164   Archives of Microbiology
  115          163   American Journal of Medical Genetics
  116          163   Molecular Phylogenetics and Evolution
  117          162   Nature Structural and Molecular Biology
  118          159   DNA
  119          156   EMBO Reports
  120          155   Insect Biochemistry and Molecular Biology
  121          153   Hemoglobin
  122          151   Bioorganicheskaia Khimiia
  123          151   The FASEB Journal
  124          148   Molecular Reproduction and Development
  125          148   Diabetes
  126          147   Molecular Immunology
  127          147   The FEBS Journal
  128          145   Archives of Virology
  129          142   Glycobiology
  130          141   Clinical Genetics
  131          136   General and Comparative Endocrinology
  132          135   Animal Genetics
  133          134   Molecular Genetics and Metabolism
  134          134   International Journal of Cancer
  135          132   Molecular and Cellular Neuroscience
  136          130   British Journal of Haematology
  137          128   Journal of Cellular Biochemistry
  138          124   Biological Chemistry
  139          123   American Journal of Medical Genetics. Part A
  140          122   Molecular Genetics and Genomics
  141          120   Agricultural and Biological Chemistry
  142          119   Nature Immunology
  143          118   Journal of the American Chemical Society
  144          117   BMC Genomics
  145          117   Journal of Lipid Research
  146          113   Thrombosis and Haemostasis
  147          113   Circulation Research
  148          113   Journal of Protein Chemistry
  149          113   Proteomics
  150          111   Neuroscience Letters


5.  STATISTICS FOR SOME LINE TYPES

The following tables give, for selected UniProtKB/Swiss-Prot line types, the
total number of lines, the number of entries with at least one such line, and
the average number of lines per entry.

                                      Total    Number of  Average
   Line type / subtype                number   entries    per entry  Rank
------------------------------------  -------- ---------  ---------  ----

References (RL)                       914938                 1.78
   Journal                            722534     384425      1.40       1
   Submitted to EMBL/GenBank/DDBJ     179798     166573      0.35       2
   Submitted to other databases        10566       9168      0.02       3
   Book citation                         632        618     <0.01       4
   Plant Gene Register                   560        548     <0.01       5
   Thesis                                393        391     <0.01       6
   Unpublished observations              292        288     <0.01       7
   Patent                                157        155     <0.01       8
   Worm Breeder's Gazette                  6          6     <0.01       9

Total number of distinct authors cited in UniProtKB/Swiss-Prot: 285145
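
The "Average per entry" column in the tables of this section is consistent with dividing the total number of lines by the total number of entries in the release (514789), not by the number of entries that carry such a line. A minimal sketch under that assumption; `format_average` is a hypothetical helper, and the rule that values rounding to 0.00 are shown as "<0.01" is inferred from the tables.

```python
TOTAL_ENTRIES = 514789  # from the introduction


def format_average(total_lines, entries=TOTAL_ENTRIES):
    """Format lines-per-entry the way the tables appear to show it."""
    text = f"{total_lines / entries:.2f}"
    # Assumption: values that would round to 0.00 are printed as "<0.01".
    return "<0.01" if text == "0.00" else text


# A few totals copied from the References (RL) table.
for label, total in [
    ("References (RL)", 914938),
    ("Journal", 722534),
    ("Submitted to EMBL/GenBank/DDBJ", 179798),
    ("Submitted to other databases", 10566),
    ("Book citation", 632),
]:
    print(f"{label:<34}{total:>8}  {format_average(total):>6}")
```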

                                      Total    Number of  Average
   Line type / subtype                number   entries    per entry  Rank
------------------------------------  -------- ---------  ---------  ----
Comments (CC)                        2165340                 4.21                                         
   ALLERGEN                              460        460     <0.01      26                                 
   ALTERNATIVE PRODUCTS                18639      18639      0.04      12                                 
   BIOPHYSICOCHEMICAL PROPERTIES        2898       2898      0.01      22                                 
   BIOTECHNOLOGY                         255        253     <0.01      28                                 
   CATALYTIC ACTIVITY                 215848     197020      0.42       5                                 
   CAUTION                              6771       6632      0.01      19                                 
   COFACTOR                            97491      89512      0.19       7                                 
   DEVELOPMENTAL STAGE                  8703       8703      0.02      16                                 
   DISEASE                              4457       3038      0.01      20                                 
   DISRUPTION PHENOTYPE                 2403       2403     <0.01      23                                 
   DOMAIN                              30840      27466      0.06      10                                 
   ENZYME REGULATION                    7726       7726      0.02      18                                 
   FUNCTION                           381594     365603      0.74       2                                 
   INDUCTION                           11438      11438      0.02      15                                 
   INTERACTION                         12077      12077      0.02      14                                 
   MASS SPECTROMETRY                    4208       3179      0.01      21                                 
   MISCELLANEOUS                       29552      27278      0.06      11                                 
   PATHWAY                            126332     115383      0.25       6                                 
   PHARMACEUTICAL                         83         83     <0.01      29                                 
   POLYMORPHISM                          767        737     <0.01      24                                 
   PTM                                 35049      28418      0.07       8                                 
   RNA EDITING                           603        603     <0.01      25                                 
   SEQUENCE CAUTION                    12664      12664      0.02      13                                 
   SIMILARITY                         598272     490179      1.16       1                                 
   SUBCELLULAR LOCATION               295928     290916      0.57       3                                 
   SUBUNIT                            219217     219217      0.43       4                                 
   TISSUE SPECIFICITY                  32360      32360      0.06       9                                 
   TOXIC DOSE                            413        402     <0.01      27                                 
   WEB RESOURCE                         8292       6582      0.02      17                                 

Total number of comment topics: 29


                                      Total    Number of  Average
   Line type / subtype                number   entries    per entry  Rank
------------------------------------  -------- ---------  ---------  ----
Features (FT)                        3182617                 6.18                                         
   ACT_SITE                           127063      75718      0.25       9                                 
   BINDING                            194440      55807      0.38       4                                 
   CA_BIND                              3649       1477      0.01      35                                 
   CARBOHYD                            95740      24536      0.19      13                                 
   CHAIN                              521332     510132      1.01       1                                 
   COILED                              18120      12226      0.04      26                                 
   COMPBIAS                            48709      25443      0.09      18                                 
   CONFLICT                           115305      40468      0.22      10                                 
   CROSSLNK                             4739       3085      0.01      34                                 
   DISULFID                            93951      24858      0.18      14                                 
   DNA_BIND                            10860       9994      0.02      29                                 
   DOMAIN                             142524      85090      0.28       6                                 
   HELIX                              130331      13633      0.25       8                                 
   INIT_MET                            14780      14780      0.03      27                                 
   LIPID                               10514       6696      0.02      30                                 
   METAL                              263421      64758      0.51       3                                 
   MOD_RES                            177420      58292      0.34       5                                 
   MOTIF                               31896      20520      0.06      22                                 
   MUTAGEN                             29735       7105      0.06      24                                 
   NON_CONS                             1542        632     <0.01      36                                 
   NON_STD                               348        273     <0.01      38                                 
   NON_TER                             11468       8701      0.02      28                                 
   NP_BIND                            102627      67086      0.20      12                                 
   PEPTIDE                              8433       5371      0.02      32                                 
   PROPEP                              10325       8691      0.02      31                                 
   REGION                              89607      49456      0.17      15                                 
   REPEAT                              87401      12926      0.17      16                                 
   SIGNAL                              33580      33570      0.07      21                                 
   SITE                                36687      21703      0.07      20                                 
   STRAND                             130878      12742      0.25       7                                 
   TOPO_DOM                           114609      23529      0.22      11                                 
   TRANSIT                              6451       6365      0.01      33                                 
   TRANSMEM                           336352      68738      0.65       2                                 
   TURN                                31101      10768      0.06      23                                 
   UNSURE                               1105        350     <0.01      37                                 
   VAR_SEQ                             38613      16582      0.08      19                                 
   VARIANT                             78945      16437      0.15      17                                 
   ZN_FING                             28016      12317      0.05      25                                 

Total number of feature keys: 38
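
The "Average per entry" column in these tables is consistent with the total
number divided by the number of entries in the release (514789, as stated in
the introduction), printed with two decimals, with small non-zero values shown
as "<0.01". The sketch below reproduces a few rows under that assumption; the
helper name average_per_entry is hypothetical and not part of any UniProt tool.

```python
# Assumption: "Average per entry" = total count / number of entries in the
# release (514789 for release 57.14), printed with two decimals.
RELEASE_ENTRIES = 514789


def average_per_entry(total: int, entries: int = RELEASE_ENTRIES) -> str:
    """Format total/entries the way the tables appear to (hypothetical helper)."""
    text = f"{total / entries:.2f}"
    # Non-zero averages that would print as 0.00 are shown as "<0.01".
    return "<0.01" if text == "0.00" and total > 0 else text


# A few feature keys taken from the table above.
for key, total in [("CHAIN", 521332), ("TRANSMEM", 336352), ("NON_STD", 348)]:
    print(f"{key:<10}{total:>8}{average_per_entry(total):>8}")
```

The same calculation applies to the comment topics and the cross-references,
for example 12315772 cross-references give an average of 23.92 per entry.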



                                      Total    Number of  Average
   Line type / subtype                number   entries    per entry  Rank      Category
------------------------------------  -------- ---------  ---------  ----      -------------------------------------------
Cross-references (DR)               12315772                23.92                                                           
   2DBase-Ecoli                           84         84     <0.01     113      2D gel databases                             
   Aarhus/Ghent-2DPAGE                   126         96     <0.01     110      2D gel databases                             
   AGD                                   823        817     <0.01      87      Organism-specific databases                  
   ANU-2DPAGE                             23         23     <0.01     120      2D gel databases                             
   ArachnoServer                         462        458     <0.01      95                                                   
   ArrayExpress                        58025      58025      0.11      36      Gene expression databases                    
   Bgee                                37624      37623      0.07      42      Gene expression databases                    
   BindingDB                             297        297     <0.01     104      Other                                        
   BioCyc                             160507     147651      0.31      19      Enzyme and pathway databases                 
   BRENDA                              65151      62355      0.13      33      Enzyme and pathway databases                 
   BuruList                              330        330     <0.01     103      Organism-specific databases                  
   CAZy                                 5646       5025      0.01      63      Protein family/group databases               
   CGD                                   554        550     <0.01      92      Organism-specific databases                  
   CleanEx                             30219      29571      0.06      44      Gene expression databases                    
   COMPLUYEAST-2DPAGE                     59         59     <0.01     115      2D gel databases                             
   Cornea-2DPAGE                          67         67     <0.01     114      2D gel databases                             
   CTD                                 61888      61326      0.12      35      Organism-specific databases                  
   CYGD                                 6628       6522      0.01      62      Organism-specific databases                  
   dictyBase                            4252       4129      0.01      71      Organism-specific databases                  
   DIP                                 10424      10319      0.02      55      Protein-protein interaction databases        
   DisProt                               397        394     <0.01      98      3D structure databases                       
   DOSAC-COBS-2DPAGE                     150        150     <0.01     109      2D gel databases                             
   DrugBank                             5317       1626      0.01      65      Other                                        
   EchoBASE                             4159       4124      0.01      73      Organism-specific databases                  
   ECO2DBASE                             351        299     <0.01     102      2D gel databases                             
   EcoGene                              4353       4350      0.01      69      Organism-specific databases                  
   eggNOG                             216353     216353      0.42      16      Phylogenomic databases                       
   EMBL                               847079     505106      1.65       3      Sequence databases                           
   Ensembl                             90026      69633      0.17      26      Genome annotation databases                  
   euHCVdb                                55         44     <0.01     116      Organism-specific databases                  
   FlyBase                              5390       5014      0.01      64      Organism-specific databases                  
   Gene3D                             235272     193221      0.46      15      Family and domain databases                  
   GeneCards                           21079      19817      0.04      48      Organism-specific databases                  
   GeneDB_Spombe                        4976       4931      0.01      67      Organism-specific databases                  
   GeneFarm                             2690       2675      0.01      79      Organism-specific databases                  
   GeneID                             466887     447919      0.91       6      Genome annotation databases                  
   Genevestigator                      64330      64330      0.12      34      Gene expression databases                    
   GenomeReviews                      350729     334967      0.68      11      Genome annotation databases                  
   GermOnline                          41925      41318      0.08      41      Gene expression databases                    
   GlycoSuiteDB                          280        280     <0.01     105      PTM databases                                
   GO                                2169903     481255      4.22       1      Ontologies                                   
   Gramene                              4287       4287      0.01      70      Organism-specific databases                  
   H-InvDB                             11249       9556      0.02      54      Organism-specific databases                  
   HAMAP                              307272     307129      0.60      13      Family and domain databases                  
   HGNC                                19532      19360      0.04      49      Organism-specific databases                  
   HOGENOM                            359172     359172      0.70       9      Phylogenomic databases                       
   HOVERGEN                            75054      75054      0.15      29      Phylogenomic databases                       
   HPA                                  8705       6563      0.02      57      Organism-specific databases                  
   HSC-2DPAGE                             85         85     <0.01     112      2D gel databases                             
   HSSP                                28864      28864      0.06      45      3D structure databases                       
   InParanoid                          65670      65670      0.13      30      Phylogenomic databases                       
   IntAct                              21370      21370      0.04      47      Protein-protein interaction databases        
   InterPro                          1577615     485356      3.06       2      Family and domain databases                  
   IPI                                 88191      63273      0.17      27      Sequence databases                           
   KEGG                               438648     416909      0.85       8      Genome annotation databases                  
   LegioList                             760        758     <0.01      88      Organism-specific databases                  
   Leproma                               664        661     <0.01      91      Organism-specific databases                  
   ListiList                            1185       1177     <0.01      84      Organism-specific databases                  
   MaizeGDB                              472        467     <0.01      94      Organism-specific databases                  
   MEROPS                               8469       8210      0.02      58      Protein family/group databases               
   MGI                                 16096      16045      0.03      51      Organism-specific databases                  
   MIM                                 15816      12443      0.03      52      Organism-specific databases                  
   MypuList                              203        203     <0.01     108      Organism-specific databases                  
   NextBio                             48668      48668      0.09      39      Other                                        
   NMPDR                              130022     130018      0.25      22      Genome annotation databases                  
   OGP                                   377        377     <0.01     100      2D gel databases                             
   OMA                                352998     352998      0.69      10      Phylogenomic databases                       
   Orphanet                             3675       2132      0.01      76      Organism-specific databases                  
   OrthoDB                             55299      55299      0.11      37      Phylogenomic databases                       
   PANTHER                            184759     169579      0.36      18      Family and domain databases                  
   Pathway_Interaction_DB               4567       1665      0.01      68      Enzyme and pathway databases                 
   PDB                                 65533      15408      0.13      32      3D structure databases                       
   PDBsum                              65533      15408      0.13      31      3D structure databases                       
   PeptideAtlas                         5168       5168      0.01      66      Proteomic databases                          
   PeroxiBase                            676        664     <0.01      90      Protein family/group databases               
   Pfam                               656238     463935      1.27       4      Family and domain databases                  
   PharmGKB                            15813      15802      0.03      53      Organism-specific databases                  
   PHCI-2DPAGE                           244        244     <0.01     107      2D gel databases                             
   PhosphoSite                         19298      19298      0.04      50      PTM databases                                
   PhosSite                              267        267     <0.01     106      PTM databases                                
   PhotoList                             738        738     <0.01      89      Organism-specific databases                  
   PhylomeDB                          121107     121107      0.24      23      Phylogenomic databases                       
   PIR                                114946     104996      0.22      24      Sequence databases                           
   PIRSF                               80000      80000      0.16      28      Family and domain databases                  
   PMAP-CutDB                           1394       1394     <0.01      82      Other                                        
   PMMA-2DPAGE                            52         52     <0.01     117      2D gel databases                             
   PptaseDB                               34         34     <0.01     118      Protein family/group databases               
   PRIDE                               53372      53372      0.10      38      Proteomic databases                          
   PRINTS                             136260     117842      0.26      21      Family and domain databases                  
   ProDom                              27769      27440      0.05      46      Family and domain databases                  
   ProMEX                                438        438     <0.01      96      Proteomic databases                          
   PROSITE                            456163     290831      0.89       7      Family and domain databases                  
   PseudoCAP                            1212       1203     <0.01      83      Organism-specific databases                  
   Rat-heart-2DPAGE                       28         28     <0.01     119      2D gel databases                             
   Reactome                             7333       4259      0.01      60      Enzyme and pathway databases                 
   REBASE                                374        353     <0.01     101      Protein family/group databases               
   RefSeq                             487288     448193      0.95       5      Sequence databases                           
   REPRODUCTION-2DPAGE                  1030        942     <0.01      86      2D gel databases                             
   RGD                                  7360       7356      0.01      59      Organism-specific databases                  
   SagaList                              389        388     <0.01      99      Organism-specific databases                  
   SGD                                  6640       6537      0.01      61      Organism-specific databases                  
   Siena-2DPAGE                          102        102     <0.01     111      2D gel databases                             
   SMART                              141673     109353      0.28      20      Family and domain databases                  
   SMR                                345633     345633      0.67      12      3D structure databases                       
   STRING                             203510     203501      0.40      17      Protein-protein interaction databases        
   SubtiList                            4192       4183      0.01      72      Organism-specific databases                  
   SWISS-2DPAGE                         1182       1182     <0.01      85      2D gel databases                             
   TAIR                                 8930       8818      0.02      56      Organism-specific databases                  
   TCDB                                 3290       3249      0.01      78      Protein family/group databases               
   TIGR                                33906      33139      0.07      43      Genome annotation databases                  
   TIGRFAMs                           279338     260622      0.54      14      Family and domain databases                  
   TubercuList                          1584       1548     <0.01      81      Organism-specific databases                  
   UCSC                                48461      39488      0.09      40      Genome annotation databases                  
   UniGene                             91718      80830      0.18      25      Sequence databases                           
   VectorBase                            403        389     <0.01      97      Genome annotation databases                  
   World-2DPAGE                          507        507     <0.01      93      2D gel databases                             
   WormBase                             3812       3727      0.01      75      Organism-specific databases                  
   WormPep                              4051       3272      0.01      74      Organism-specific databases                  
   Xenbase                              3642       3569      0.01      77      Organism-specific databases                  
   ZFIN                                 2507       2496     <0.01      80      Organism-specific databases                  

Total number of cross-referenced databases: 120

6.  AMINO ACID COMPOSITION

   6.1  Composition in percent for the complete database

   Ala (A) 8.28   Gln (Q) 3.94   Leu (L) 9.67   Ser (S) 6.50
   Arg (R) 5.54   Glu (E) 6.77   Lys (K) 5.85   Thr (T) 5.32
   Asn (N) 4.05   Gly (G) 7.09   Met (M) 2.43   Trp (W) 1.07
   Asp (D) 5.45   His (H) 2.27   Phe (F) 3.86   Tyr (Y) 2.91
   Cys (C) 1.35   Ile (I) 5.99   Pro (P) 4.68   Val (V) 6.88

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00

   

   [Figure: amino acid composition of the complete database]

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   6.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
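
   The ordering in 6.2 follows from sorting the percentages of table 6.1 in
   descending order. A minimal sketch, with the values copied from 6.1 and
   variable names chosen here for illustration only:

```python
# Percentages copied from table 6.1 (complete database).
composition = {
    "Ala": 8.28, "Gln": 3.94, "Leu": 9.67, "Ser": 6.50,
    "Arg": 5.54, "Glu": 6.77, "Lys": 5.85, "Thr": 5.32,
    "Asn": 4.05, "Gly": 7.09, "Met": 2.43, "Trp": 1.07,
    "Asp": 5.45, "His": 2.27, "Phe": 3.86, "Tyr": 2.91,
    "Cys": 1.35, "Ile": 5.99, "Pro": 4.68, "Val": 6.88,
}

# Sort by descending percentage; no two values are tied in this release,
# so the order is unambiguous.
by_frequency = sorted(composition, key=composition.get, reverse=True)
print(", ".join(by_frequency))
```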


7.  MISCELLANEOUS STATISTICS

4446 entries are encoded on a mitochondrion, and 3549 are encoded on a plasmid.

12174 entries are encoded on a plastid, of which 21 are encoded on apicoplasts,
11616 on chloroplasts, 44 on organellar chromatophores, 145 on cyanelles,
149 on non-photosynthetic plastids and 199 on unspecified types of plastid.
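
The plastid subtypes listed above add up to the stated plastid total. A short
consistency check, with the counts copied from the text and the dictionary
keys used only as labels:

```python
# Counts copied from the plastid breakdown above.
plastid_total = 12174
plastid_subtypes = {
    "apicoplast": 21,
    "chloroplast": 11616,
    "organellar chromatophore": 44,
    "cyanelle": 145,
    "non-photosynthetic plastid": 149,
    "unspecified plastid": 199,
}

# The subtype counts should account for every plastid-encoded entry.
subtotal = sum(plastid_subtypes.values())
print(subtotal, subtotal == plastid_total)
```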

Number of entries with at least one sequence correction: 68295