Using our taxonomic name identification and mapping tools, we created a semantically enhanced linked dataset built from the extracted taxonomic information and its linkages to different biodiversity resources. The dataset is currently available on the web as an open resource. In this dataset, each distinct taxonomic name detected in the literature has exactly one web page.

Figure 1. A sample web page showing how extracted semantic features link to the BHL and external taxonomic databases.

Figure 1 shows the web page for a particular species, in which the contexts surrounding each occurrence of the target mention are extracted from the text. Each piece of context evidence is assigned a bibliographic citation that links to the PDF copy of the referring page (here, in the BHL). Where available, unique database identifiers and hyperlinks to external taxonomic databases are also provided on the web page. The connections to external taxonomic databases support the understanding and analysis of the behaviour of the target species, and the bibliographic linkages allow the system to identify the raw data and trace it back across the range of remote databases.
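The content of such a page can be pictured as a small record type. The following is only a minimal sketch under stated assumptions, not the schema actually used by the dataset: the class names, field names, identifier value, citation string and `example.org` URLs are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContextEvidence:
    """One occurrence of the name, with its surrounding text and source."""
    snippet: str    # text surrounding the mention
    citation: str   # bibliographic citation for the referring page
    page_url: str   # link to the copy of the referring page (placeholder)


@dataclass
class TaxonPage:
    """Everything shown on the single web page for one taxonomic name."""
    name: str
    identifiers: Dict[str, str] = field(default_factory=dict)     # database -> ID
    external_links: Dict[str, str] = field(default_factory=dict)  # database -> URL
    evidence: List[ContextEvidence] = field(default_factory=list)

    def citations(self) -> List[str]:
        """Return the distinct citations in order of first appearance."""
        return list(dict.fromkeys(e.citation for e in self.evidence))


# Hypothetical example values; none of them come from the real dataset.
page = TaxonPage(
    name="Puma concolor",
    identifiers={"example-db": "X-0001"},
    external_links={"example-db": "https://example.org/taxon/X-0001"},
)
page.evidence.append(ContextEvidence(
    snippet="... text around the mention ...",
    citation="Hypothetical volume, p. 12",
    page_url="https://example.org/page/12",
))
page.evidence.append(ContextEvidence(
    snippet="... another mention on the same page ...",
    citation="Hypothetical volume, p. 12",
    page_url="https://example.org/page/12",
))
```

Keeping the citation and page link on each piece of evidence, rather than on the page as a whole, mirrors the description above: every extracted context can be traced back to its own source page.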

The metadata can potentially encode many semantic aspects of the data. Identified taxonomic names and the hyperlinks to repositories will improve species-specific document retrieval. Encoding the different names used for an organism will improve synonym detection; because one organism may appear under multiple names, reconciliation techniques are needed to interconnect them. The linkages to the unique identifiers of organisms facilitate this reconciliation process.
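To illustrate how unique identifiers can facilitate reconciliation, the sketch below groups names that resolve to the same identifier and treats the members of each group as candidate synonyms. This is a simplified assumption about how such linkages might be used, not the reconciliation method of our system; the function names and identifier values (`X-0001`, `X-0002`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Record = Tuple[str, Optional[str]]  # (taxonomic name, identifier or None)


def group_by_identifier(records: Iterable[Record]) -> Dict[str, Set[str]]:
    """Map each identifier to the set of names that resolve to it."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for name, identifier in records:
        if identifier is not None:  # names without an identifier stay ungrouped
            groups[identifier].add(name)
    return dict(groups)


def synonyms_of(name: str, records: Iterable[Record]) -> Set[str]:
    """Return the other names that share an identifier with `name`."""
    result: Set[str] = set()
    for names in group_by_identifier(records).values():
        if name in names:
            result |= names - {name}
    return result


# Hypothetical identifiers, used only to show the grouping step.
records = [
    ("Felis concolor", "X-0001"),
    ("Puma concolor", "X-0001"),
    ("Panthera onca", "X-0002"),
    ("Unresolved name", None),
]
candidates = synonyms_of("Felis concolor", records)
```

A name that could not be mapped to any identifier is simply left out of the grouping, so it is never reported as a synonym of anything on this evidence alone.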

The semantically enhanced linked data contain taxonomic names found in four digitized Biologia Centrali-Americana (BCA) volumes. They are:

