Taxonomic name verification is necessary because, for most organisms, checking the validity of a taxonomic name is a difficult task that requires expert skills to ensure a correct diagnosis. When a new taxonomic name is detected in a historic monograph, the taxonomist may need to work through the older literature (perhaps 200 years old) and compare it with the sample descriptions associated with existing species and classifications in the taxonomic database. If there is no match, the name may refer to a new species.
However, this effort is complicated by term variation. The taxonomic name assigned to a particular species may have changed since the original description of the species was published. Furthermore, there may be orthographic and other variation in the names assigned to the same species. For example, Agelaus phœniceus, Agelæus phœnicei, Agelæus phœniceus, and Agelæus phœnicio could all be variants of the same name. In addition, errors are frequently introduced by imperfect OCR (Optical Character Recognition) at the scanning stage: recognizing an 'o' in place of a 'c', for instance, would propose the taxon Pioa, which is not a known name, rather than Pica (European magpie).
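To illustrate how such variants and OCR slips might be matched against known names, here is a minimal sketch. It is not the project's method: the function names, the ligature table, and the `max_distance` threshold are all assumptions made for illustration. It expands ligatures, strips accents, and ranks known names by Levenshtein edit distance.

```python
import unicodedata

# Ligatures common in older literature; NFKD does not decompose these,
# so they are expanded by hand. (Illustrative table, not exhaustive.)
LIGATURES = {"æ": "ae", "œ": "oe", "Æ": "Ae", "Œ": "Oe"}


def normalize(name: str) -> str:
    """Expand ligatures, strip accents, lowercase, and collapse whitespace."""
    for ligature, plain in LIGATURES.items():
        name = name.replace(ligature, plain)
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def closest_known_names(candidate, known_names, max_distance=2):
    """Return (distance, name) pairs within max_distance, nearest first."""
    target = normalize(candidate)
    scored = [(edit_distance(target, normalize(k)), k) for k in known_names]
    return sorted((d, k) for d, k in scored if d <= max_distance)
```

With a known-name list containing Pica, the OCR output Pioa would be returned as a one-character substitution away from it. A real system would probably also need to handle changes in rank, authorship strings, and abbreviated genus names, which this sketch ignores.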
Figure 1. Extracted contextual evidence to support taxon name verification
To support verification and to bring it into the working patterns of taxonomists, our project will identify terms in biodiversity documents that are possible taxonomic names. As shown in Figure 1 (see the sample page), the verification process will be combined with a simple recommender system, so that users are presented with a small number of text "snippets" containing the proposed names with some contextual information (Keywords in Context). This will enable the user to decide whether to "click through" some of the snippets to gain access to the complete paper.
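As a rough sketch of how Keywords in Context snippets could be produced, the function below collects a fixed window of characters on either side of each occurrence of a proposed name. The function name `kwic_snippets`, the character-based window, and the bracket highlighting are assumptions for illustration, not a description of the project's recommender.

```python
import re


def kwic_snippets(text, name, window=40, max_snippets=3):
    """Return up to max_snippets context strings around occurrences of name.

    The matched name is wrapped in square brackets so it stands out
    within its surrounding context.
    """
    snippets = []
    for match in re.finditer(re.escape(name), text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        left = text[start:match.start()]
        right = text[match.end():end]
        snippets.append(f"...{left}[{match.group(0)}]{right}...")
        if len(snippets) == max_snippets:
            break
    return snippets
```

Capping the number of snippets keeps the list short, in line with presenting users only a small number of snippets per name; matching here is exact and case-sensitive, so variant spellings would have to be searched for separately.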
Figure 2. A multi-option form for judgment collection
The recommender system provides a judgment-making form (see Figure 2) to the users. The form includes several options that allow the user to select the most likely name type for the target taxonomic name (e.g., a potential scientific name of a new taxon, or a synonym, name variant, misspelling, or common name of an existing taxon). Each taxonomic name will be presented to multiple curators, and their judgments will be collected when they submit the form. To cope with potential divergence between human judgments, as shown in Figure 3, we will develop an appropriate crowd voting mechanism to determine the validity of the taxonomic name based on the distribution of human judgments. Once a new name has been verified as a valid taxonomic name, it will be published, along with appropriate metadata, on the taxon curation Scratchpad website for further scrutiny by the expert research community.
Figure 3. The distribution of human judgments
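The crowd voting mechanism itself is not specified here. Purely as an illustration of one simple possibility, the sketch below accepts the most frequent name type only when enough curators have voted and a sufficient share of them agree. The function name and the `min_votes` and `min_agreement` thresholds are hypothetical.

```python
from collections import Counter


def decide_name_type(judgments, min_votes=3, min_agreement=0.6):
    """Return the agreed name type, or None if the judgments are inconclusive.

    judgments: list of labels submitted by curators, e.g. "synonym".
    """
    if len(judgments) < min_votes:
        return None  # too few judgments to decide
    ranked = Counter(judgments).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # tie between the leading options
    if top_count / len(judgments) < min_agreement:
        return None  # leading option lacks sufficient support
    return top_label
```

For example, two votes for "synonym" and one for "misspelling" would resolve to "synonym" under these thresholds, whereas an evenly split set of judgments would be left undecided, for instance to be shown to further curators.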
Sample cases:
Here we list a number of interesting ambiguous names that probably refer to the same taxon. Please determine the taxonomic property of these names with the help of the evidence found in the text.
For more verification instances, please visit our Taxonomic Name Curation web page.