downloads | GlobDB

GlobDB release 232

This page describes all files comprising the current release of the GlobDB. Links to files from previous releases (including the web pages) can be found below the file descriptions. Third-party resources are listed at the bottom of the page.

GlobDB download location

All release 232 files are described below (in alphabetical order), and can be accessed here. The header above each description is also a download link for that file. Please note that some of the files are very large.

File descriptions (release 232)

Top level directory

globdb_r232_anvio_dbs.tar.gz
Anvi'o contigs databases for all GlobDB genomes, generated using the program anvi-gen-contigs-database from the fasta files provided in globdb_r232_genome_fasta.tar.gz. These anvi'o databases include the gene calling and annotation information of the genomes, and can be used to generate the protein fasta and gff files. Furthermore, the anvi'o databases can be used as a resource for phylogenomics, pangenomics, or other comparative analyses. Note that the anvi'o contigs databases were created with anvi'o v9 and therefore can only be used with that version.

globdb_r232_checkm2.tsv.gz
Tab-delimited file containing the output of CheckM2 for all GlobDB genomes.
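As an illustration of how this table might be used, the following minimal Python sketch selects genomes by estimated completeness and contamination. It assumes the file keeps the column names of a standard CheckM2 quality report (Name, Completeness, Contamination); this has not been confirmed for the GlobDB file, so check the header line first. The function name and the cutoffs of 90 and 5 are illustrative choices, not GlobDB recommendations.

```python
import csv
import gzip


def filter_by_quality(path, min_completeness=90.0, max_contamination=5.0,
                      id_column="Name"):
    """Return the IDs of genomes that pass both quality cutoffs.

    Assumes a gzipped, tab-delimited table with a header line containing
    "Completeness" and "Contamination" columns (an assumption based on the
    usual CheckM2 report layout).
    """
    passing = []
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            completeness = float(row["Completeness"])
            contamination = float(row["Contamination"])
            if completeness >= min_completeness and contamination <= max_contamination:
                passing.append(row[id_column])
    return passing
```

For example, `filter_by_quality("globdb_r232_checkm2.tsv.gz", min_completeness=50.0, max_contamination=10.0)` would return a list of genome IDs meeting more lenient cutoffs.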

globdb_r232_dataset_list.tsv
Tab-delimited file listing the datasets included in the GlobDB r232 and their contribution to the final dataset.

globdb_r232_dictionaries.tar.gz
Dictionaries mapping the genome identifiers used in the source datasets to the corresponding standardized GlobDB IDs. No dictionary is provided for datasets whose IDs were already sufficiently standardized.

globdb_r232_genome_fasta.tar.gz
FastA files for all genomes in the GlobDB. Fasta headers have been renamed using the anvi'o script anvi-script-reformat-fasta to simplify the contig header lines and include the GlobDB genome ID at the start of every contig.

globdb_r232_gff_cog.tar.gz
Files in GFF format exported from anvi'o for all GlobDB genomes, with COG annotations generated using the anvi'o program anvi-run-ncbi-cogs. See the methods page for more details.

globdb_r232_md5sum.txt
MD5 checksums of the downloadable files in the top level directory.
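Because some of the archives are very large, it can be worth checking downloads against this file. The sketch below is one way to do that in Python. It assumes the checksum file uses the usual md5sum layout (a hex digest, whitespace, then the file name on each line), which has not been verified here; the function names are hypothetical helpers, not part of any GlobDB tooling.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text):
    """Parse md5sum-style lines ("<hex digest>  <file name>") into a dict."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, name = parts
        # md5sum marks binary mode with a leading "*" on the file name
        expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify_downloads(checksum_file, directory="."):
    """Return {file name: True, False, or None}; None means not downloaded."""
    expected = parse_checksums(Path(checksum_file).read_text())
    results = {}
    for name, digest in expected.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == digest) if path.is_file() else None
    return results
```

Calling `verify_downloads("globdb_r232_md5sum.txt")` from the download directory returns a dictionary in which any `False` entry indicates a file that should be downloaded again.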

globdb_r232_protein_annotations.tar.gz
Tab-delimited files of KEGG/COG/Pfam/dbCAN2 annotations of all GlobDB genomes, exported from anvi'o.

globdb_r232_protein_fasta.tar.gz
FastA files for all proteins in the GlobDB genomes, predicted using Prodigal as integrated in anvi'o and exported using the program anvi-get-sequences-for-gene-calls. Note that the headers of all proteins were renamed after export to include the GlobDB genome ID and to be consistent with the IDs in the exported gff files.

globdb_r232_taxonomy.tsv.gz
Two-column tab-delimited file containing the GlobDB genome ID and a full 7-level taxonomy. Taxonomy assignment of non-GTDB genomes is described on the methods page.
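A minimal Python sketch for loading this file into a lookup table follows. It assumes the seven levels are written as a single semicolon-separated, GTDB-style lineage string (domain through species) in the second column; that layout is an assumption and should be checked against the actual file. The function names and rank labels are hypothetical.

```python
import csv
import gzip

# Assumed order of the seven taxonomic levels (GTDB convention).
RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")


def split_taxonomy(lineage, separator=";"):
    """Split a 7-level lineage string into a {rank: name} dict."""
    names = [name.strip() for name in lineage.split(separator)]
    if len(names) != len(RANKS):
        raise ValueError(f"expected {len(RANKS)} levels, got {len(names)}: {lineage!r}")
    return dict(zip(RANKS, names))


def read_taxonomy(path):
    """Map genome ID -> {rank: name} from a gzipped two-column TSV."""
    taxonomy = {}
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) != 2:
                continue  # skip blank or unexpected lines
            genome_id, lineage = row
            try:
                taxonomy[genome_id] = split_taxonomy(lineage)
            except ValueError:
                # Rows whose second column is not a 7-level lineage
                # (such as a header row, if present) are skipped.
                continue
    return taxonomy
```

With the table loaded, `read_taxonomy("globdb_r232_taxonomy.tsv.gz")[genome_id]["phylum"]` would give the phylum assigned to a given genome ID.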

taxonomic_profiling
Directory containing the sylph databases and SingleM metapackage that can be used to generate taxonomic profiles of metagenomes using the GlobDB taxonomy.

taxonomic_profiling/GlobDB_r232.metapackage_v4.smpkg.tar.gz
SingleM metapackage for taxonomic profiling of metagenome datasets using SingleM and the GlobDB taxonomy.

taxonomic_profiling/globdb_r232_sylph_c1000.syldb
Sylph database created with subsampling rate 1000 for taxonomic profiling (with lower RAM requirement) of metagenome datasets using sylph and the GlobDB taxonomy.

taxonomic_profiling/globdb_r232_sylph_c200.syldb
Sylph database created with subsampling rate 200 for taxonomic profiling (with higher RAM requirement) of metagenome datasets using sylph and the GlobDB taxonomy.

taxonomic_profiling/globdb_r232_taxonomy_sylph.tsv.gz
Two-column tab-delimited file required for assigning taxonomy (using sylph-tax) to the outputs of sylph profiling with the sylph databases in this directory.

taxonomic_profiling/globdb_r232_tax_profile_md5sums.txt
MD5 checksums of the downloadable files in the taxonomic_profiling directory.

taxonomic_profiling/globdb_r232_trees.tar.gz
Bacterial and Archaeal trees for the GlobDB genomes that were used to assign taxonomy to GlobDB genomes not sourced from the GTDB, as described on the methods page.

Previous releases

The current release of the GlobDB (GlobDB 232) is the third public release of this resource. Files from previous releases can be accessed below:

Release 226

All release 226 files can be accessed here.

Release 220

All release 220 files can be accessed here.

Third party download locations

In addition to the files available at the link above, Dr. Marco Gabrielli has generated GenBank files for all GlobDB release 226 genomes, which can be downloaded from his repository at Eawag.
Important: the gene calling for these GenBank files was done independently using Prokka 1.14.5. Although anvi'o and Prokka both use Prodigal 2.6.3 to call genes, Prokka removes gene calls overlapping rRNA/tRNA genes, so the locus tags in these GenBank files do not necessarily correspond to the GlobDB locus tags.