Welcome to the GlobDB genomes database

This website hosts the GlobDB, a dereplicated set of species-representative microbial genomes. The genomic era offers great opportunities for microbial genome analyses, and individual (meta)genome studies can generate thousands of microbial genomes. Although multiple databases are available to store these datasets, integrating large-scale studies has sometimes proven challenging. The GlobDB aims to integrate several resources that have not (yet) been consolidated elsewhere.

As of version 220, the GlobDB includes four distinct databases:
- the species representatives of the Genome Taxonomy Database (GTDB), sourced from NCBI Genome.
- the species representatives of the genomic catalog of Earth's microbiomes (GEM).
- the species representatives of the searchable, planetary-scale microbiome resource (SPIRE).
- the species representatives of the genomic catalog of soil microbiomes (SMAG).

These datasets are further dereplicated against each other (in the order listed above) and then processed in a standardised way, yielding a comprehensive dataset that can be used for further analyses.
Currently, the GlobDB comprises 202,601 (partial) microbial genomes after dereplication of the four source datasets. Of these, 113,104 come from NCBI (GTDB species representatives), 10,662 from GEM, 66,971 from SPIRE, and 11,864 from SMAG.
For all 202,601 genomes, anvi'o databases, genome FASTA files, protein (amino acid) FASTA files, and KEGG/COG/Pfam annotations are available for download.
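To make "dereplicated in the order listed" concrete, here is a minimal Python sketch of priority-ordered dereplication. It assumes every genome has already been assigned to a species-level cluster; the actual clustering criteria are described on the Methods page and are not reproduced here. The names `SOURCE_ORDER`, `dereplicate`, and the example genome and cluster IDs are hypothetical. For each cluster, the sketch keeps the genome from the earliest-listed source, and it also checks that the per-source counts above add up to the stated total.

```python
# Hypothetical sketch of priority-ordered dereplication.
# Assumption: each genome is already assigned to a species cluster;
# the real clustering criteria are on the Methods page.

SOURCE_ORDER = ["GTDB", "GEM", "SPIRE", "SMAG"]  # priority, as listed above

# Per-source genome counts after dereplication (version 220, from the text).
COUNTS = {"GTDB": 113_104, "GEM": 10_662, "SPIRE": 66_971, "SMAG": 11_864}
assert sum(COUNTS.values()) == 202_601  # the per-source counts match the total


def dereplicate(genomes):
    """genomes: iterable of (genome_id, source, species_cluster) tuples.

    Returns a dict mapping each species cluster to the (genome_id, source)
    kept as its representative, preferring earlier sources in SOURCE_ORDER.
    """
    rank = {src: i for i, src in enumerate(SOURCE_ORDER)}
    kept = {}
    # Visit genomes from higher-priority sources first (sorted() is stable).
    for genome_id, source, cluster in sorted(genomes, key=lambda g: rank[g[1]]):
        # The first genome seen for a cluster wins; later duplicates are dropped.
        if cluster not in kept:
            kept[cluster] = (genome_id, source)
    return kept


# Toy input with hypothetical IDs.
example = [
    ("g1", "SPIRE", "sp_A"),
    ("g2", "GTDB", "sp_A"),
    ("g3", "GEM", "sp_B"),
    ("g4", "SMAG", "sp_C"),
    ("g5", "SMAG", "sp_B"),
]
reps = dereplicate(example)
# sp_A -> g2 (GTDB beats SPIRE), sp_B -> g3 (GEM beats SMAG), sp_C -> g4
```

Processing sources in priority order means a genome from a later source is only retained when it represents a species not already covered by an earlier source.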

See the Methods page for full details on data processing, and the Downloads page for a description of available files.

If you use the GlobDB, please cite the underlying data sources and methods as appropriate; see the How to cite page.

Updates, versions, maintenance

The current GlobDB version is 220. The GlobDB follows the GTDB update schedule, which is currently once per year. The version numbering is linked to GTDB, which in turn follows the NCBI RefSeq release numbering.

The GlobDB is maintained by Daan Speth, senior scientist at the Division of Microbial Ecology (DoME) of the Centre for Microbiology and Environmental Systems Science (CeMESS) at the University of Vienna, and is hosted on the Life Science Compute Cluster (LiSC). For any questions related to the GlobDB, please use the form on the Contact page.

License

The GlobDB propagates the licenses of the underlying data sources and is licensed under CC BY-SA 4.0.

This means you are free to:
Share — copy and redistribute the material in any medium or format for any purpose, even commercially.
Adapt — remix, transform, and build upon the material for any purpose, even commercially.

Under the following terms:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

See https://creativecommons.org/licenses/by-sa/4.0/ for full license details.