Nitta, Joel [1], Iwasaki, Wataru [1].

Resolving species names rapidly and accurately with the “taxastand” R package.

Recently, it has become possible to analyze biodiversity on previously unimaginable scales by leveraging large, public datasets such as GBIF and GenBank. Species names are key identifiers that enable data to be merged between such datasets. However, the same species often appears under different synonyms in different datasets, which prevents merging unless each synonym is resolved to the underlying species name. Analyses that synthesize data from multiple large datasets therefore require software that can resolve species names rapidly, automatically, and accurately.
Here, we present the “taxastand” R package for standardization of species names across datasets. The taxastand package builds on the “taxon-tools” (https://github.com/camwebb/taxon-tools) command-line tool to enable species name matching and resolution in R, the popular programming environment used by many ecologists and evolutionary biologists. Features of taxastand include 1) ability to use any user-specified reference database, 2) completely local usage (no calls to an online API), thereby facilitating reproducibility, 3) fuzzy matching, and 4) awareness of the rules of botanical nomenclature when resolving names.
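The combination of fuzzy matching and synonym resolution against a local reference can be illustrated with a minimal sketch. This is not the taxastand or taxon-tools algorithm and does not use their API; it is written in Python with only the standard library, and the reference table, the `resolve_name` function, and the similarity cutoff are all hypothetical, chosen for illustration. It also ignores author strings and the rules of botanical nomenclature, which taxastand takes into account.

```python
import difflib

# Hypothetical local reference table: every known name (accepted or synonym)
# maps to its accepted name. Entries are illustrative only.
REFERENCE = {
    "Crepidomanes minutum": "Crepidomanes minutum",
    "Trichomanes minutum": "Crepidomanes minutum",  # synonym
    "Hymenophyllum polyanthos": "Hymenophyllum polyanthos",
}


def resolve_name(query, reference=REFERENCE, cutoff=0.9):
    """Return the accepted name for `query`, or None if nothing is close enough.

    Fuzzy matching here uses difflib's similarity ratio, an assumption made
    for this sketch rather than the matching method used by taxastand.
    """
    matches = difflib.get_close_matches(query, list(reference), n=1, cutoff=cutoff)
    if not matches:
        return None
    # Map the best-matching reference name to its accepted name.
    return reference[matches[0]]


if __name__ == "__main__":
    for name in ["Trichomanes minutum", "Hymenophylum polyanthos", "Pinus densiflora"]:
        print(name, "->", resolve_name(name))
```

Because the lookup runs entirely against a table held in memory, the same query and reference always give the same result, which is the reproducibility benefit of local usage described above.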
As a case study, we use taxastand to join distribution data of Japanese ferns from GBIF to a dataset on the endangered status of Japanese ferns (the “Green List”). Of 1,092 species in GBIF, taxastand resolved 770 names to the Green List. As the Japanese pteridophyte flora only includes ca. 720 species (excluding hybrids), it is likely that many of the unresolved GBIF names were non-native taxa or artifacts. To verify the accuracy of name resolution, we generated maps of species richness and compared them to previously published maps. Except for a few outliers, the maps were nearly indistinguishable.
taxastand is freely available at https://github.com/joelnitta/taxastand.

1 - The University of Tokyo, Department of Integrated Biosciences, Graduate School of Frontier Sciences, Kashiwa, Chiba, 277-0882, Japan


