Biodiversity Informatics & Herbarium Digitization

Weaver, William [1], Laport, Robert [2], Jaime-Rivera, Jorge [3], Ng, Julienne [4], Smith, Stephen [5].

Modular Machine Learning Methods for End-to-End Automated Phenotypic Trait Extraction from Digitized Herbarium Vouchers.

Phenotypic trait data from digitized herbarium specimens are valuable for determining taxonomy, reconstructing phylogenetic relationships, and measuring phenological responses to climate change. Data aggregation portals such as GBIF, SEINet, and iDigBio make large-scale projects feasible, but locating appropriate specimens and extracting trait data remain laborious. We previously released and published LeafMachine (www.LeafMachine.org), a suite of convolutional neural networks and computer vision tools that automatically measures basic leaf traits by isolating individual leaves from digitized specimens. Here we build on that work and present LeafMachine2, a modular, Python-based workflow that provides a toolbox of pre-trained machine learning networks for measuring phenotypic traits. The LeafMachine2 workflow uses an object detection network to locate vegetative and reproductive structures, rulers, color correction cards, barcodes, maps, and text. Instance segmentation isolates individual leaf outlines, and scene detection scores the quality and completeness of each detected leaf. We also developed a point-detection method that locates and maps eight pseudo-landmarks on each detected leaf. Machine learning methods applied to biological datasets are prone to poor generalizability because specimens vary inherently and specimen preparation standards are inconsistent. To meet these challenges, LeafMachine2 provides a set of neural networks trained on more than 5,000 specimens from 100 herbaria, representing approximately 2,600 species sampled across the angiosperms. Users will be able to expand both the capability and scope of LeafMachine2 through transfer learning and integration with the Labelbox (www.labelbox.com) annotation platform.
Users can annotate new features and use transfer learning to adapt our pre-trained networks to new datasets or traits, expanding the supported library of traits, landmarks, and taxa. This modular application of machine learning has the potential to greatly increase the trait information available from herbarium collections while enriching the extended specimen.
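The abstract does not describe LeafMachine2's internals, so the following Python sketch is illustrative only and is not the LeafMachine2 code or API. It assumes the detection, segmentation, quality-scoring, and ruler stages have already produced, for each leaf, a closed outline polygon in pixels and a quality label, plus a pixels-per-centimeter scale for the sheet. It shows two ideas from the text: composing independent stages so modules can be swapped, and converting a pixel outline into leaf area (shoelace formula) and perimeter in centimeters. All names, including `Leaf`, `Specimen`, `keep_clean_leaves`, and the `"clean"` label, are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Leaf:
    outline: List[Point]  # closed polygon from a segmentation stage, in pixels
    quality: str          # hypothetical label from a quality-scoring stage
    traits: Dict[str, float] = field(default_factory=dict)


@dataclass
class Specimen:
    pixels_per_cm: float  # assumed to come from a ruler-conversion stage
    leaves: List[Leaf]


# A stage is any function that takes a specimen and returns it (possibly modified).
Stage = Callable[[Specimen], Specimen]


def shoelace_area(outline: List[Point]) -> float:
    """Area enclosed by a simple polygon (shoelace formula), in square pixels."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(outline, outline[1:] + outline[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(outline: List[Point]) -> float:
    """Sum of edge lengths of the closed polygon, in pixels."""
    return sum(math.dist(a, b) for a, b in zip(outline, outline[1:] + outline[:1]))


def keep_clean_leaves(specimen: Specimen) -> Specimen:
    """Drop leaves whose quality label is not 'clean' (hypothetical label)."""
    specimen.leaves = [leaf for leaf in specimen.leaves if leaf.quality == "clean"]
    return specimen


def measure_leaves(specimen: Specimen) -> Specimen:
    """Convert pixel outlines to centimeter-based traits using the ruler scale."""
    ppc = specimen.pixels_per_cm
    if ppc <= 0:
        raise ValueError("pixels_per_cm must be positive")
    for leaf in specimen.leaves:
        # Areas scale with the square of the linear scale factor; lengths scale linearly.
        leaf.traits["area_cm2"] = shoelace_area(leaf.outline) / ppc**2
        leaf.traits["perimeter_cm"] = polygon_perimeter(leaf.outline) / ppc
    return specimen


def run_pipeline(specimen: Specimen, stages: List[Stage]) -> Specimen:
    """Apply each stage in order; stages can be added, removed, or swapped."""
    for stage in stages:
        specimen = stage(specimen)
    return specimen


if __name__ == "__main__":
    square = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
    specimen = Specimen(
        pixels_per_cm=50.0,
        leaves=[Leaf(square, "clean"), Leaf(square, "partial")],
    )
    result = run_pipeline(specimen, [keep_clean_leaves, measure_leaves])
    for leaf in result.leaves:
        print(leaf.traits)  # {'area_cm2': 4.0, 'perimeter_cm': 8.0}
```

Because every stage shares the same signature, a further hypothetical stage (for example, one that attaches pseudo-landmarks) could be added to the `stages` list without changing the others. Area is divided by the square of the scale: at 50 px/cm, 1 cm² corresponds to 2,500 px², so the 10,000 px² square above measures 4 cm².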

Affiliations:

1 - University of Michigan, Ecology and Evolutionary Biology, MI, USA
2 - Rhodes College, Department of Biology, 2000 North Parkway, Memphis, TN, 38112, USA
3 - The Morton Arboretum
4 - University of Colorado
5 - University of Michigan

Herbarium Digitization
Machine Learning
Plant Traits
Convolutional Neural Networks
Specimen Processing

Presentation Type: Oral Paper
Number: BI&HD I010
Abstract ID: 239
Candidate for Awards: None

Copyright © 2000-2022, Botanical Society of America. All rights reserved