Supported by a Phenotype RCN collaboration grant, Grant Godden and Pier Luigi Buttigieg met in May 2015 at the Rancho Santa Ana Botanic Garden (RSABG) in Claremont, CA, with the aim of enhancing the ontological representation of plant environments. Grant and Pier processed label data from more than one million plant specimen records hosted by iDigBio, using a combination of natural language processing and text-mining techniques to identify well-represented terms and phrases in “habitat” descriptions. Their interactions with RSABG collections staff, who work actively on specimen digitization and shared insights into how the records that populate repositories like iDigBio are created, greatly enhanced the project and helped create a workable corpus. The preliminary results of the analyses were immediately informative, revealing gaps in the current coverage of the Environment Ontology (ENVO; Buttigieg et al., 2013).
Further work is planned to refine the computational pipeline and corpus, and to extend ENVO’s coverage of the environments that the botanical community frequently samples. A brief publication reporting the process and findings is in preparation.
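To make the idea of finding well-represented terms and phrases in “habitat” descriptions more concrete, here is a minimal Python sketch. It is not the pipeline used in the project, which is not described in detail here. The stopword list, the function names `tokenize` and `frequent_terms`, the `min_records` threshold, and the sample label strings are all assumptions made up for illustration. The sketch counts, for each word and two-word phrase, the number of records in which it appears.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real analysis would need a fuller one.
STOPWORDS = {"a", "an", "and", "at", "in", "near", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def frequent_terms(habitats, max_n=2, min_records=2):
    """Return (term, record_count) pairs for n-grams found in at least min_records records.

    Each n-gram is counted once per record, so the count is the number of
    records mentioning the term rather than the raw number of occurrences.
    Stopwords are dropped before n-grams are built, so a phrase may span
    a removed word.
    """
    record_counts = Counter()
    for text in habitats:
        tokens = tokenize(text)
        seen = set()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                seen.add(" ".join(tokens[i:i + n]))
        record_counts.update(seen)
    return [(term, count) for term, count in record_counts.most_common()
            if count >= min_records]


if __name__ == "__main__":
    # Invented example strings, not real iDigBio records.
    sample = [
        "Rocky slope in coastal sage scrub",
        "Coastal sage scrub on dry slope",
        "Oak woodland near stream",
    ]
    for term, count in frequent_terms(sample):
        print(count, term)
```

Terms that occur in many records but have no counterpart in an ontology such as ENVO would then be candidates for manual review; that comparison step is not shown here.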
GG is affiliated with the Rancho Santa Ana Botanic Garden, Claremont, CA, USA. PLB is affiliated with the Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung, Bremerhaven, Germany.