GMGC API (Version 1.0)

The webserver provides an API for advanced users, as described on this page.

The base URL for API calls is https://gmgc.embl.de/api/v1.0/ and API calls return JSON (except where noted).

Note also that all the resources can be downloaded for local processing. For large-scale analyses, that will be more efficient than repeatedly calling the API.
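As an illustration of how the base URL combines with the endpoint paths documented on this page, here is a minimal Python sketch using only the standard library. The helper names `api_url` and `get_json` are hypothetical and not part of the GMGC resource; `get_json` performs a real network request when called.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://gmgc.embl.de/api/v1.0/"  # base URL stated above


def api_url(*parts):
    """Join path components onto the base URL, percent-encoding each one."""
    return BASE_URL + "/".join(quote(str(part), safe="") for part in parts)


def get_json(*parts, timeout=30):
    """GET an endpoint and decode its JSON reply (performs a network call)."""
    with urlopen(api_url(*parts), timeout=timeout) as response:
        return json.load(response)


# Building URLs does not touch the network
print(api_url("version"))
print(api_url("unigene", "GMGC10.054_598_380.SCLAV_5304", "dna_sequence"))
```

Calling `get_json("version")` would then issue the same request as the first curl example on this page.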

Version

Returns the version of the resource.

Example
curl https://gmgc.embl.de/api/v1.0/version
Output
{
    "gmgc-version": "1.0.0",
    "last-updated": "Jun 1 2020"
}

Lookup

If the provided <identifier> matches a single unigene, the lookup returns the information documented below. Alternatively, if the provided <identifier> matches more than one unigene, the reply will include a list of matches.

Matching multiple unigenes

Example
# Using an eggNOG identifier
curl https://gmgc.embl.de/api/v1.0/unigene/4PQJ6
Output
{
  "matches": [
    {
      "source": "SAMN06172490",
      "biome": [
        "dog gut",
        "human skin",
        "cat gut"
      ],
      "taxonomy": "1262977",
      "id": "GMGC10.000_000_027.PEPT"
    },
    {
      "source": "SAMN06172460",
      "biome": [
        "dog gut"
      ],
      "taxonomy": "742823",
      "id": "GMGC10.000_186_864.PEPT"
    },
(...)
  ]
}

Matching a single unigene

Example
curl https://gmgc.embl.de/api/v1.0/unigene/GMGC10.054_598_380.SCLAV_5304
Output
{
  "gene_family": "GMGC10.205_457_183.UNKNOWN",
  "cluster": "GMGC10.146_435_694.SCLAV_5304",
  "query": "GMGC10.054_598_380.SCLAV_5304",
  "name": "GMGC10.054_598_380.SCLAV_5304",
  "taxonomy": [
    {
      "name": "environmental samples",
      "id": "59619",
      "rank": "genus"
    }
  ],
  "samples": 573,
  "length": 88062,
  "habitat": [
    "human skin",
    "human gut"
  ],
  "genome_bins": 25,
  "strand": "+",
  "complete": 1
}
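Because a lookup can return either of the two reply shapes shown above, client code has to check which one it received. The sketch below assumes, based only on the example outputs, that a multi-match reply carries a "matches" key and a single-unigene reply carries a "name" key; the function name `matched_unigene_ids` is hypothetical.

```python
def matched_unigene_ids(reply):
    """Return the unigene identifiers contained in a lookup reply.

    Assumption drawn from the examples: a reply with a "matches" key lists
    several unigenes (each with an "id"); otherwise the reply describes a
    single unigene under "name".
    """
    if "matches" in reply:
        return [match["id"] for match in reply["matches"]]
    return [reply["name"]]


# Abridged replies copied from the example outputs above
multiple = {"matches": [
    {"id": "GMGC10.000_000_027.PEPT", "taxonomy": "1262977"},
    {"id": "GMGC10.000_186_864.PEPT", "taxonomy": "742823"},
]}
single = {"name": "GMGC10.054_598_380.SCLAV_5304", "samples": 573}

print(matched_unigene_ids(multiple))
print(matched_unigene_ids(single))
```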
Example
curl https://gmgc.embl.de/api/v1.0/unigene/GMGC10.054_598_380.SCLAV_5304/dna_sequence
Output
{
  "query": "GMGC10.054_598_380.SCLAV_5304",
  "name": "GMGC10.054_598_380.SCLAV_5304",
  "dna_sequence": "ATGAAGTTAGGGGAGAAAATAA..."
}
Example
curl https://gmgc.embl.de/api/v1.0/unigene/GMGC10.054_598_380.SCLAV_5304/protein_sequence
Output
{
  "query": "GMGC10.054_598_380.SCLAV_5304",
  "name": "GMGC10.054_598_380.SCLAV_5304",
  "protein_sequence": "MKLGEKIMRLGKKTSRAISIALL..."
}
Example
curl https://gmgc.embl.de/api/v1.0/unigene/GMGC10.054_598_380.SCLAV_5304/features
Output
{
  "features": {
    "intrinsic": [
      {
        "feature": "COIL",
        "end": 351,
        "start": 321
      }
    ],
    "pfam": [
      {
        "domain": "Pfam:RCC1",
        "evalue": 3.4e-06,
        "bitscore": 24.2,
        "end": 891,
        "start": 834
      },
      (...)
    ],
    "smart": [
      {
        "domain": "PbH1",
        "evalue": 4989.07360092172,
        "bitscore": 2.3,
        "end": 782,
        "start": 755
      },
      (...)
    ],
    "eggnog": {
      "cog_functional_category": "M",
      "eggnog_ogs": [
        "2IDH9@201174",
        "4D0CR@85004",
        "COG5184@1",
        "COG5184@2"
      ],
      "seed_ortholog_score": 340.5,
      "go_terms": [],
      "kegg_pathway": [],
      "bigg_reaction": [],
      "ec_number": "-",
      "cazy": "-",
      "kegg_reaction": [],
      "seed_eggnog_ortholog": "1394175.AWUN01000002_gene844",
      "predicted_protein_name": "-",
      "seed_ortholog_evalue": 7.8e-89,
      "kegg_ko": [],
      "eggnog_free_text_description": "Listeria-Bacteroides repeat domain (List_Bact_rpt)",
      "brite": [],
      "kegg_module": []
    }
  },
  "query": "GMGC10.054_598_380.SCLAV_5304",
  "name": "GMGC10.054_598_380.SCLAV_5304"
}
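The features reply reports domain hits together with their e-values, and in the example above the SMART hit has a very large e-value. One possible way to keep only the more convincing domain hits is sketched below; the function name `significant_domains` and the default cutoff of 1e-3 are illustrative assumptions, not recommendations from the GMGC documentation.

```python
def significant_domains(features_reply, max_evalue=1e-3, sources=("pfam", "smart")):
    """Collect (source, domain, start, end) for hits at or below max_evalue."""
    features = features_reply["features"]
    kept = []
    for source in sources:
        for hit in features.get(source, []):
            if hit["evalue"] <= max_evalue:
                kept.append((source, hit["domain"], hit["start"], hit["end"]))
    return kept


# Abridged copy of the example reply above
reply = {
    "features": {
        "pfam": [
            {"domain": "Pfam:RCC1", "evalue": 3.4e-06, "bitscore": 24.2,
             "end": 891, "start": 834},
        ],
        "smart": [
            {"domain": "PbH1", "evalue": 4989.07360092172, "bitscore": 2.3,
             "end": 782, "start": 755},
        ],
    }
}

print(significant_domains(reply))
```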
Example
curl https://gmgc.embl.de/api/v1.0/unigene/GMGC10.054_598_380.SCLAV_5304/samples
Output
{
  "samples": [
    "SAMEA1906425",
    "SAMEA1906421",
    "SAMEA1906417",
    (...)
  ],
  "query": "GMGC10.054_598_380.SCLAV_5304",
  "name": "GMGC10.054_598_380.SCLAV_5304"
}
Example
curl https://gmgc.embl.de/api/v1.0/unigene/GMGC10.054_598_380.SCLAV_5304/genome_bins
Output
{
  "genome_bins": [
    "GMBC10.101_017",
    "GMBC10.160_410",
    "GMBC10.133_210",
    (...)
  ],
  "query": "GMGC10.054_598_380.SCLAV_5304",
  "name": "GMGC10.054_598_380.SCLAV_5304"
}

Antibiotics

Example
curl https://gmgc.embl.de/api/v1.0/unigene/GMGC10.000_001_095.UGD/antibiotics
Output
{
  "antiobiotics": [
    "actinomycin",
    "actinomycind",
    "arylomycin",
    (...)
  ],
  "query": "GMGC10.000_001_095.UGD",
  "name": "GMGC10.000_001_095.UGD"
}
Example
curl https://gmgc.embl.de/api/v1.0/unigene/GMGC10.000_001_095.UGD/aro_terms
Output
{
  "aro_terms": [
    "ARO:1000001",
    "ARO:3000000",
    "ARO:3002984",
    "ARO:3003577",
    "ARO:3003580",
    "ARO:3004112",
    "ARO:3004269"
  ],
  "query": "GMGC10.000_001_095.UGD",
  "name": "GMGC10.000_001_095.UGD"
}

Genome bins

Example
curl https://gmgc.embl.de/api/v1.0/genome_bin/GMBC10.001_023
Output
{
  "min_contig_size": 3323,
  "name": "GMBC10.001_023",
  "contamination": 0,
  "genome": "SAMEA3708885.bin.14",
  "N50": 2611971,
  "total_bp_size": 2611971,
  "nr_contigs": 72,
  "GTDB_tk": "d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospira;s__Lachnospira rogosae",
  "category": "high-quality",
  "quality": 99.33,
  "completeness": 99.33,
  "max_contig_size": 243991
}

Samples

Example
curl https://gmgc.embl.de/api/v1.0/sample/SAMEA3708885
Output
{
  "ena_link": "https://www.ebi.ac.uk/ena/data/view/SAMEA3708885",
  "longitude": 12.568337,
  "habitat": "human gut",
  "latitude": 55.676097,
  "name": "SAMEA3708885"
}

Batched queries

There are also plural versions of the lookups above, which work with POST.

These correspond to the calls above, except that they accept multiple inputs, passed in as JSON (with the Content-Type: application/json header), in the format: {"names": ["gene-A", "gene-B"]} (or sample names, or genome bin names).

For example:

curl --header 'Content-Type: application/json' \
     --request POST \
     --data '{"names": ["GMGC10.003_873_867.PHOA", "GMGC10.016_471_114.PHOA"]}' \
     'https://gmgc.embl.de/api/v1.0/unigenes/genome_bins'
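The same batched call can be prepared from Python. This is a minimal sketch using only the standard library; the helper names `batch_request` and `post_batch` are hypothetical, and only the `unigenes/genome_bins` endpoint from the curl example is used. Building the request does not contact the server; `post_batch` does.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "https://gmgc.embl.de/api/v1.0/"


def batch_request(endpoint, names):
    """Build a POST request carrying {"names": [...]} as a JSON body."""
    body = json.dumps({"names": list(names)}).encode("utf-8")
    return Request(
        BASE_URL + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_batch(endpoint, names, timeout=60):
    """Send the batched query and decode the JSON reply (network call)."""
    with urlopen(batch_request(endpoint, names), timeout=timeout) as response:
        return json.load(response)


request = batch_request(
    "unigenes/genome_bins",
    ["GMGC10.003_873_867.PHOA", "GMGC10.016_471_114.PHOA"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```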

Habitats

Example
curl https://gmgc.embl.de/api/v1.0/habitat/human_gut
Output
{
  "samples": [
    "SAMEA1906426",
    "SAMEA1906424",
    "SAMEA1906422",
    (...)
  ],
  "subcatalog_url": "https://gmgc.embl.de/downloads/v1.0/GMGC10.human-gut.95nr.fna.gz",
  "name": "human gut",
  "subcatalog_no_rare_url": "https://gmgc.embl.de/downloads/v1.0/GMGC10.human-gut.no-rare.95nr.fna.gz"
}

Query by sequence

The call is a POST request with up to 50 sequences in an attached FASTA file. The attachment should be called fasta.

For example, create test.fasta containing:

>MySeq
AALAMSALMALSJLAJLACAOSIJDAOSIJDALAASKJDASLKJALCEMALWPQRODASLKJALCKMALWPQRODASLKJ
ALCCKMALWPQRODASLKJALCKMALWPQROQUPJALSFAASLUFPASUFASFJA

and submit it with curl:

curl -X POST \
     -F 'fasta=@test.fasta' \
     -F 'mode=all' \
     -F 'return_seqs=true' \
     -F 'return_bins=true' \
     'https://gmgc.embl.de/api/v1.0/query/sequence'
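Since each request accepts at most 50 sequences, a larger FASTA file has to be split before submission. The sketch below shows one way to do that in Python; the function name `fasta_batches` is hypothetical, and each yielded chunk would be written to a file (or sent directly) as the `fasta` attachment of a separate request.

```python
def fasta_batches(fasta_text, max_records=50):
    """Split FASTA text into chunks of at most max_records sequences each."""
    records = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            records.append([line])        # header line starts a new record
        elif line.strip() and records:
            records[-1].append(line)      # sequence line of the current record
    for start in range(0, len(records), max_records):
        chunk = records[start:start + max_records]
        yield "\n".join("\n".join(record) for record in chunk) + "\n"


# 120 made-up records, used only to demonstrate the splitting
example = "".join(f">seq{i}\nMKLGEKIMRLGKK\n" for i in range(120))
batches = list(fasta_batches(example))
print([batch.count(">") for batch in batches])  # [50, 50, 20]
```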

Note that the algorithm will always return its best matches as hits, and it is the user’s responsibility to filter them appropriately (i.e., if no good matches exist in the catalog, the algorithm will still return something, but it will be returned with a high e-value).

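The request parameters used above are mode (all, or besthit to get only the best hit per query), return_seqs, and return_bins. The reply has the following structure, where the # comments are annotations rather than part of the JSON: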

    {
        "results": [
            {
                "query_name": "Q1",
                # hits is always a list, but if the request included `mode=besthit`, this will be a list of size one.
                "hits": [
                    {
                        "unigene_id": "GMGC10.000_000_000.NAME",
                        "evalue": 12e-23,
                        "bitscore": 232.2,
                        # sequences are only provided if the request included `return_seqs=true`
                        "dna_sequence": "ATTATACAA...",
                        "protein_sequence": "MEPATA...",
                        # genome bins are only provided if the request included `return_bins=true`
                        "genome_bins": [
                            "GMBC10.001_023",
                            "GMBC10.202_232"
                        ]
                    },
                    (...)
                ]
            },
            {
                "query_name": "Q2",
                "hits": (...)
            }
        ]
    }
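Because the service always returns its best matches, however weak, a client will usually want to discard hits above some e-value cutoff. The following sketch assumes the reply structure shown above; the function name `filter_hits`, the default cutoff of 1e-5, and the second hit in the sample data are all hypothetical choices made for illustration.

```python
def filter_hits(reply, max_evalue=1e-5):
    """Map each query name to the hits whose e-value is at most max_evalue."""
    return {
        result["query_name"]: [
            hit for hit in result["hits"] if hit["evalue"] <= max_evalue
        ]
        for result in reply["results"]
    }


# Sample reply following the structure above; the second hit is made up
reply = {
    "results": [
        {
            "query_name": "Q1",
            "hits": [
                {"unigene_id": "GMGC10.000_000_000.NAME",
                 "evalue": 12e-23, "bitscore": 232.2},
                {"unigene_id": "GMGC10.000_000_001.NAME",
                 "evalue": 3.5, "bitscore": 21.0},
            ],
        },
        {"query_name": "Q2", "hits": []},
    ]
}

filtered = filter_hits(reply)
print({name: [hit["unigene_id"] for hit in hits] for name, hits in filtered.items()})
```

An appropriate cutoff depends on the analysis, so `max_evalue` is left as a parameter rather than fixed.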