GMGC: Global Microbial Gene Catalog

GMGC version 1

The Global Microbial Gene Catalog (GMGC) is an effort to catalog all the genes in the global microbiome. It is based on a collection of hand-curated metagenomes, all processed with the same computational pipeline.

Version 1 of the catalog (available at https://gmgc.embl.de) was derived from 13 thousand metagenomes. This yielded just over 2 billion ORFs (open reading frames, the individual input sequences), which were first clustered into 303 million unigenes (a species-level clustering at 95% nucleotide identity). The unigenes were then clustered into 32 million protein families (grouping sequences by distant homology).
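To give a sense of how much each clustering step condenses the data, the following Python sketch computes rough ratios from the rounded figures quoted above. The exact counts are not stated here, so "just over 2 billion" is taken as 2 billion and "13 thousand" as 13,000; both are assumptions, and the variable names are illustrative only.

```python
# Rounded counts quoted in the text (assumptions: "just over 2 billion"
# is taken as 2e9 and "13 thousand" as 13,000; exact values are not given).
N_METAGENOMES = 13_000
N_ORFS = 2_000_000_000
N_UNIGENES = 303_000_000
N_FAMILIES = 32_000_000

# Average number of input ORFs collapsed into each unigene (95% nucleotide identity).
orfs_per_unigene = N_ORFS / N_UNIGENES

# Average number of unigenes grouped into each protein family (distant homology).
unigenes_per_family = N_UNIGENES / N_FAMILIES

# Average number of ORFs contributed by each metagenome.
orfs_per_metagenome = N_ORFS / N_METAGENOMES

print(f"ORFs per unigene:       {orfs_per_unigene:.1f}")
print(f"Unigenes per family:    {unigenes_per_family:.1f}")
print(f"ORFs per metagenome:    {orfs_per_metagenome:,.0f}")
```

With these rounded inputs, each unigene represents roughly 6.6 ORFs on average and each protein family roughly 9.5 unigenes. These are averages only; they say nothing about how cluster sizes are distributed.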

Furthermore, the unigenes are functionally and taxonomically annotated, and 46,655 high-quality metagenome-assembled genomes (MAGs) built from the same contigs are also available.

This project is conducted in collaboration with the Bork Group at EMBL and the Huerta-Cepas group at CBGP.


Copyright (c) 2018–2024. Luis Pedro Coelho and other group members. All rights reserved.
