The main catalog was built at the species level (95% nucleotide identity) and contains 302,655,267 genes. We also make available a 100% non-redundant catalog (966,108,540 genes) and a catalog clustered at 90% amino-acid identity (210,478,083 genes). Note that the 90% catalog is a subcatalog of the main one, and gene identifiers are kept consistent between the two.
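To give a feel for how much each clustering level reduces the catalog, the following minimal sketch computes the relative sizes from the gene counts quoted above. The dictionary keys and the `size_ratio` helper are illustrative names only; they are not part of any released tooling.

```python
# Gene counts as stated in the catalog description.
# The labels used as keys are illustrative, not official catalog names.
CATALOG_SIZES = {
    "100% non-redundant": 966_108_540,
    "95% nucleotide (main)": 302_655_267,
    "90% amino-acid": 210_478_083,
}


def size_ratio(smaller: str, larger: str) -> float:
    """Return the size of one catalog as a fraction of another."""
    return CATALOG_SIZES[smaller] / CATALOG_SIZES[larger]


if __name__ == "__main__":
    reference = "100% non-redundant"
    for name, count in CATALOG_SIZES.items():
        fraction = size_ratio(name, reference)
        print(f"{name:>22}: {count:>13,} genes ({fraction:.1%} of {reference})")
```

By these counts, the main catalog holds roughly a third as many genes as the 100% non-redundant catalog, and the 90% amino-acid catalog holds about 70% as many genes as the main one.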
The 14 habitats considered in this version of the catalog give rise to 14 habitat-specific subcatalogs. For convenience, we also provide versions that exclude rare genes, which may be more appropriate for uses such as short-read mapping.