The Map of Life

integrating species distribution knowledge

New feature: better synonymy in Map of Life

Map of Life lets you search by species name across the 366 million records we’ve imported so far. These records come from fifty-four different datasets, some of which use different species names to refer to the same species. Last September, Rod Page pointed out one particular case:

An extract from Rod Page's blog post talking about a problem with how Map of Life treats names

The problem here is one of synonymy. A species may be referred to by a number of names over the course of its history: for example, the western hoolock gibbon, known today as Hoolock hoolock, was known as Hylobates hoolock until at least the 1980s, when the name Bunopithecus hoolock was gradually adopted as the correct one. The species was then renamed Hoolock hoolock in 2005. (If you have access to an academic library, you can find all the details in Mootnick and Groves, 2005.) Only one of these names is considered valid today; all the alternate names are known as synonyms.

Map of Life’s records come from a wide variety of sources, from century-old checklists to surveys carried out in the 1950s to expert range maps drawn in the 21st century. We have records for the western hoolock gibbon under all three names: Hoolock hoolock is used by the IUCN Red List, Bunopithecus hoolock by the WWF Ecoregion Species Checklists, and Hylobates hoolock by GBIF. This is a relatively simple case; by comparison, the World Register of Marine Species (WoRMS) lists six alternate names for the Giant Pacific octopus. When you search for a name on Map of Life, the search results ought to contain not just the searched name but all alternate, synonymous names as well.

We decided that the best way to implement this would be to develop an in-house, expert-curated list of synonyms and accepted names, and to supplement it with the new GBIF Species API. There is a wealth of options available today when picking web services for species name resolution, many of which make their entire list of synonyms available for download. We picked GBIF because of the large number of taxonomic checklists it incorporates: 270 separate checklists, including important taxonomic databases such as ITIS, WoRMS, the Catalogue of Life, Mammal Species of the World and others. These checklists cover every kingdom of life, giving us wide taxonomic and spatial coverage through a single JSONP query. Choosing GBIF also allowed us to reuse the code we wrote for accessing this API in our work on name validation in OpenRefine.

When you search for a name on Map of Life, this is what happens:

  1. Before anything else, we search for your query on Map of Life and present you with the results as quickly as possible, as we always have.

  2. As you look through the direct search results, we search for your query in our internal list of synonyms. This table contains only vertebrate synonyms for now. If we find a match, we add the alternate names to your search.

  3. If we do not find a match, we try to match your name against all of GBIF’s hundreds of checklists. If we find the queried name there, we add every valid name recognized for that species by any checklist to your search.
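Steps 2 and 3 above can be sketched roughly as follows. This is only an illustration of the fallback order, not Map of Life's actual code: the names `INTERNAL_SYNONYMS`, `expand_names` and `fake_checklist_lookup` are hypothetical, and the GBIF lookup is replaced by a stub so that the example runs without network access.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for the curated, vertebrate-only synonym table.
INTERNAL_SYNONYMS: Dict[str, List[str]] = {
    "Hylobates hoolock": ["Hoolock hoolock"],
}


def fake_checklist_lookup(name: str) -> List[str]:
    """Stub for the GBIF checklist query (assumption: the real lookup
    returns the valid names that any checklist recognizes for `name`)."""
    known = {"Octopus dofleini": ["Enteroctopus dofleini"]}
    return known.get(name, [])


def expand_names(
    query: str,
    internal: Dict[str, List[str]] = INTERNAL_SYNONYMS,
    checklist_lookup: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Return the queried name followed by any alternate names found."""
    names = [query]
    # Step 2: look in the internal synonym list first.
    alternates = internal.get(query)
    # Step 3: only if there is no internal match, fall back to the checklists.
    if alternates is None and checklist_lookup is not None:
        alternates = checklist_lookup(query)
    for name in alternates or []:
        if name not in names:
            names.append(name)
    return names
```

Calling `expand_names("Octopus dofleini", checklist_lookup=fake_checklist_lookup)` misses the internal table and falls through to the stub, while `expand_names("Hylobates hoolock")` is answered from the internal table without touching the checklist lookup at all.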

This turns out to be a “good enough” solution for an incredible variety of names. Try searching for Caminus osculosus (a sponge), Octopus dofleini (an outdated name for the Giant Pacific Octopus), Anser hutchinsii (an outdated name for the Cackling Goose), or even the rotifer Lecane kasumiensis.

Synonymy on Map of Life isn’t completely solved yet: for example, we currently return only one of the two possible synonyms for Hoolock hoolock. This is because we currently look up only the valid name of the species you search for: a search for Hylobates hoolock will display records stored under its valid name, Hoolock hoolock, but will not display records stored under its alternate synonym, Bunopithecus hoolock. The next step is to look up not just the valid name, but all synonyms ever used to refer to that species, so that we can find every record which might be relevant to your search. This should be achievable using GBIF’s /species/search API call, as long as we make sure that these searches can be made fast and responsive, and don’t add too many unhelpful names to your search.
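To make the gap concrete, here is a minimal sketch of the planned two-way lookup: resolve the query to its valid name, then collect every synonym that points at that valid name. It does not call GBIF; the `ACCEPTED_NAME` table and the `all_names_for` function are hypothetical and simply hard-code the hoolock example from this post.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical synonym -> valid name table, using the example above.
ACCEPTED_NAME: Dict[str, str] = {
    "Hylobates hoolock": "Hoolock hoolock",
    "Bunopithecus hoolock": "Hoolock hoolock",
}


def all_names_for(query: str, accepted_name: Dict[str, str] = ACCEPTED_NAME) -> List[str]:
    """Return the valid name first, then every synonym recorded for it."""
    # Resolve the query to its valid name (a name we don't know is kept as-is).
    valid = accepted_name.get(query, query)
    # Invert the table so that each valid name lists all of its synonyms.
    synonyms_of = defaultdict(list)
    for synonym, accepted in accepted_name.items():
        synonyms_of[accepted].append(synonym)
    return [valid] + sorted(synonyms_of[valid])
```

With this approach, a search for Hylobates hoolock would also pick up records stored under Bunopithecus hoolock, which the current valid-name-only lookup misses.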

What do you think of our new feature? Please let us know if you have any problems with it, or have suggestions on how we can improve it!


Written by Gaurav

February 4, 2014 at 5:44 pm

Posted in technology, updates
