By globdbadmin, 11 March, 2026

We are very happy to announce the first release of the amino acid sequence toolkit (AASTK).

AASTK is a suite of tools designed to leverage the genomic diversity captured by the GlobDB to create and analyze datasets of homologous proteins. Current functionality includes building, curating, and maintaining datasets at tree-of-life scale, as well as clustering protein datasets, analyzing genomic context, and retrieving metadata.

The genomic context tool (CUGO) and the metadata retrieval tool (meta) use an SQL database that stores all GlobDB protein sequences, as well as their annotations, taxonomy, and environmental distribution. This database is available from the downloads page.

For more information about AASTK, check out the documentation or the codebase. We look forward to any feedback, from bug reports to ideas for future extensions of AASTK.
