The ComTax Project (A Community-driven Curation Process for Taxonomic Databases) is a JISC-funded research tool project of the UK Government. To motivate multiple research communities to help identify potentially new taxonomic names in the literature and link them to known hierarchies, the project will build curation web services within the Scratchpad framework. These services will draw on the extensive, flexible data models and the potentially wider biodiversity community network that Scratchpads provide, in order to broaden the applicability and use of the service.
Figure 1. A community-driven curation process for taxonomic databases
Figure 1 shows the proposed community-supported curation process. Given a set of biodiversity literature, a list of potential taxonomic names is identified using Named Entity Recognition (NER) tools. These candidate names are then matched against the names already recorded in a number of existing large-scale taxonomic databases, such as the Encyclopedia of Life (EoL), the Catalogue of Life (CoL), and uBio Name Bank. Names that cannot be found in these databases will be collected for validation by taxonomists. Names that do appear will be collated to build indices that support searching the literature.
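The matching step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the function names `normalise` and `triage_names` are hypothetical, the known names are assumed to have been exported to a local set rather than queried from EoL, CoL or uBio, and the NER output is assumed to arrive as (document id, name) pairs.

```python
from collections import defaultdict


def normalise(name: str) -> str:
    """Collapse runs of whitespace so trivially different spellings match."""
    return " ".join(name.split())


def triage_names(candidates, known_names):
    """Split NER candidates into a literature index and a validation queue.

    candidates  -- iterable of (doc_id, name) pairs (assumed NER output)
    known_names -- iterable of names assumed to come from existing databases
    Returns (index, for_validation), each mapping a name to the set of
    documents in which it was found.
    """
    # Compare case-insensitively; this is a simplification for the sketch.
    known = {normalise(n).lower() for n in known_names}
    index = defaultdict(set)
    for_validation = defaultdict(set)
    for doc_id, raw_name in candidates:
        name = normalise(raw_name)
        target = index if name.lower() in known else for_validation
        target[name].add(doc_id)
    return dict(index), dict(for_validation)


# "Examplea hypothetica" is a made-up placeholder, not a real taxon.
candidates = [
    ("doc1", "Homo sapiens"),
    ("doc2", "Homo  sapiens"),
    ("doc2", "Examplea hypothetica"),
]
index, for_validation = triage_names(candidates, {"Homo sapiens"})
```

Here `index` holds the names that can support literature search, while `for_validation` holds the names that would be passed on to taxonomists.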
Figure 2. A new taxonomic name with associated taxonomic descriptions in the literature
Potential taxonomic names will be presented together with their relevant context (as shown in Figure 2). They will be shown, via a plugin, to members of the research community who belong to the Scratchpads social network (e.g., professional taxonomists, experienced citizen scientists and other biodiversity specialists) to initiate a community-driven curation process. Once a new name is verified as a valid taxonomic name based on collective human judgments, the system will publish the name, its context and its bibliographic details to a Scratchpad developed for this purpose. There, the names will be available for further scrutiny and verification by the community.
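The text does not say how collective human judgments are combined, so the following Python sketch simply assumes one possible rule: a name counts as verified once enough community members have voted and a sufficient share of them agree. The function name `is_verified` and the thresholds `min_votes` and `min_agreement` are hypothetical choices made for illustration.

```python
def is_verified(votes, min_votes=3, min_agreement=0.75):
    """Return True if the community judgments accept the name.

    votes         -- list of booleans, one per reviewer (True = valid name)
    min_votes     -- assumed minimum number of judgments required
    min_agreement -- assumed minimum fraction of positive judgments
    """
    if len(votes) < min_votes:
        # Too few judgments so far; the name stays in the queue.
        return False
    return sum(votes) / len(votes) >= min_agreement


accepted = is_verified([True, True, True, False])
too_few = is_verified([True, True])
rejected = is_verified([True, False, False])
```

Under this assumed rule, a name with three positive judgments out of four would be published, while a name with only two judgments would wait for more reviewers.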