Tree of life and data integration challenges at the first FuturePhy workshop

Apr 06, 2016

What are the challenges in building, visualizing and using the Tree of Life? How can we best utilize and build on existing phylogenetic knowledge and look ahead to address the challenges of data integration? Recently, fellow Phenoscaper Jim Balhoff and I attended the first FuturePhy workshop in Gainesville, Florida (February 20-22, 2016). The workshop brought together three taxonomically-defined working groups (catfish, beetles, barnacles) to build megatrees from existing phylogenetic studies, and to identify and begin applying diverse data layers for their respective groups. Open Tree and Arbor personnel were on hand to discuss and help solve issues in data integration.

The catfish team (John Lundberg, Mariangeles Arce, Jim Balhoff, Brian Sidlauskas, Ricardo Betancur, Laura Jackson, Kole Kubicek, Kyle Luckenbill, and myself, Wasila Dahdul) included participants with expertise in catfish anatomy, phylogenetics (molecular and morphological), development, bioinformatics, and digital imaging. We were motivated to build on the work of the All Catfish Species Inventory to achieve a more complete understanding of catfish diversification by integrating published phylogenies, 2D and 3D images in various online repositories, and thousands of computable phenotypes for catfishes in Phenoscape.

We held several hands-on sessions on tree grafting (using Mesquite, R, and Arbor), data annotation (using Phenex), and tree submission to Open Tree.  We also examined an automatically generated supermatrix for 18 published catfish matrices in the Phenoscape KB (generated using the OntoTrace tool), and prototype data visualizations for supermatrices developed by Curt Lisle in Arbor. We used Mesquite to manually create a draft megatree, and in parallel, uploaded trees to Open Tree, which automatically synthesized a megatree for catfishes. Our plan is to compare the output of manual tree-building in Mesquite with the automated tree from Open Tree.

Among the issues and priorities that emerged during the workshop were the need to include the authoritative Catalog of Fishes taxonomy in Open Tree and the need to allow the addition of unnamed or uncertainly identified taxa, which are commonly used in matrices. We also discussed challenges in automated character consolidation across multiple studies, and the reuse of images across multiple online archives.

We left with a plan to continue tree building and data layer integration post-workshop, with the aim of publishing the catfish megatree (including the methods and remaining challenges) and the integration of data layers via interactions between Arbor, Open Tree, and Phenoscape.

Ontology-based text markup tools

Jan 14, 2016

Efficiently extracting knowledge from the published literature is a challenge faced by many database projects in biology, and many of us are interested in tools that can assist and speed up the task of identifying concepts in free text. I’ve recently used two text markup tools that are helpful in keeping up with the literature and rapidly developing ontologies. As a participant in the Fifth BioCreative Challenge, in which biocurators test and evaluate text mining systems, I evaluated the EXTRACT bookmarklet tool. EXTRACT was developed for metagenomics data and provides full-page tagging of mapped terms from environment, disease, taxonomy, and tissue ontologies, and can also mark up shorter selections of text on an HTML page. The tool is immediately useful, particularly during the first stages of the curation process, when a curator is surveying the literature for relevant articles.

Annotating long, descriptive text has also been a challenge for Phenoscape. To assist curators in this task, we recently added a text annotator tool to the Phenoscape Knowledgebase that tags selected text passages copied in from a source with matched terms from anatomy (Uberon), taxon (VTO), and quality (PATO) ontologies. Viewing the annotated results, with color-coded text, has aided curators in the process of applying large, complex ontologies to equally complex text.
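To make the idea of ontology-based tagging concrete, here is a minimal Python sketch of dictionary matching against term labels. It is only an illustration and not the implementation behind the Phenoscape text annotator or EXTRACT; the lexicon, the `EX:` identifiers, and the `annotate` function are all hypothetical.

```python
import re

# Hypothetical mini-lexicon: lowercase label -> (ontology, made-up ID).
# These are NOT real Uberon, PATO, or VTO identifiers.
LEXICON = {
    "pectoral fin": ("Uberon", "EX:0001"),
    "fin": ("Uberon", "EX:0002"),
    "elongated": ("PATO", "EX:0003"),
    "siluriformes": ("VTO", "EX:0004"),
}


def annotate(text, lexicon=LEXICON):
    """Return (start, end, matched_text, ontology, term_id) for each label found."""
    # Try longer labels first so that 'pectoral fin' wins over 'fin'.
    labels = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for match in pattern.finditer(text):
        ontology, term_id = lexicon[match.group(1).lower()]
        hits.append((match.start(), match.end(), match.group(1), ontology, term_id))
    return hits


hits = annotate("Pectoral fin elongated in Siluriformes")
for start, end, matched, ontology, term_id in hits:
    print(f"{matched!r} [{start}:{end}] -> {ontology} {term_id}")
```

A real annotator would also need to handle synonyms, plurals, and ambiguous labels, which plain string matching like this does not attempt.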

Phenoscape poster at Evolution 2014

Aug 27, 2014

I attended the Evolution 2014 meeting a few months ago in Raleigh, NC, and presented a poster on Phenoscape’s curation effort: “Moving the mountain: How to transform comparative anatomy into computable anatomy?”, with coauthors A. Dececchi, N. Ibrahim, H. Lapp, and P. Mabee. In this work, we assessed the efficiency of our workflow for the curation of evolutionary phenotypes from the matrix-based phylogenetic literature. We identified the bottlenecks and areas of improvement in data preparation, phenotype annotation, and ontology development. Gains in efficiency, such as through improved community data practices and development of text-mining tools, are critical if we are to translate evolutionary phenotypes from an ever-growing literature. The poster was well received and several researchers at the meeting were interested in learning more about open source tools for phenotype annotation.

Teleost Anatomy Ontology adds French terms and synonyms

Feb 03, 2012

With the help of Phenoscape and DeepFin intern Ben Frable, I recently finished adding 117 French anatomical terms and synonyms from Chanet & Desoutter’s glossary publication [1] to the Teleost Anatomy Ontology (TAO). These authors spent many years defining and translating Paul Chabanaud’s anatomical analyses of flatfishes into modern French and English to help researchers understand his important publications. Adding these terms to the TAO takes their translation one step further, enabling computers to link Chabanaud’s unusual terms to an ontology ID for each anatomical ‘concept’, which in turn enables connections among all phenotypic and related data that reference this ID.

These synonyms can now be used in searches of the Phenoscape Knowledgebase. For example, you can see the French synonyms for ‘paired fin’. One can imagine ultimately being able to select a preferred language or term label when browsing the ontology in the Knowledgebase.

This was the first set of foreign terms to be added to the teleost ontology, and we had to tweak the Phenoscape Knowledgebase interface to display the diacritical marks correctly. We are ready to accept more! Please send me anything you’d like added or changed to the TAO term tracker.

[1] Chanet, B., & Desoutter-Meniger, M. (2008). French-English glossary of terms found in Chabanaud’s published works on Pleuronectiformes. Cybium, Electronic Publication no 1:1-23. PDF download

Phenoscape visits Xenbase for Anatomy Ontology Update

Sep 23, 2011

Last month I visited Xenbase and Aaron Zorn’s lab at the Cincinnati Children’s Hospital for a couple of days (August 21-23, 2011) to work with Xenbase curators in preparing the Xenopus Anatomy Ontology (XAO) for its next big release.  Xenbase curators Christina James Zorn and VG Ponferrada have been leading the effort, and Erik Segerdell, the ontology development coordinator for the Phenotype RCN and former Xenbase curator, was also visiting for the week and helping with the update. Erik and I provided training in ontology editing and synchronization tools.

We used the Synchronization Tool (an OBO-Edit plugin) to compare XAO to several external ontologies. The tool made it efficient to find and add missing cross-references and terms, and to resolve conflicting data (e.g., differing definitions). By the end of the week, we had updated XAO with all relevant terms from the Amphibian Anatomy Ontology (AAO), Vertebrate Anatomy Ontology (VAO), Uber Ontology (UBERON), and Common Anatomy Reference Ontology (CARO).
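The kinds of checks involved in such a comparison can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how the Synchronization Tool itself works: terms are matched by identical name only, and the `compare_ontologies` function, the data layout, and every `EX_` identifier are hypothetical.

```python
def compare_ontologies(local, external):
    """Compare two toy ontologies given as {term_id: {"name", "def", "xrefs"}}.

    Returns three lists:
      missing_xrefs - (local_id, external_id) pairs that share a name but
                      where the local term does not cross-reference the external one
      conflicts     - (local_id, external_id) pairs that share a name but
                      have differing definitions
      missing_terms - external IDs whose name is absent from the local ontology
    """
    external_by_name = {term["name"]: (tid, term) for tid, term in external.items()}
    missing_xrefs, conflicts = [], []
    for local_id, term in local.items():
        match = external_by_name.get(term["name"])
        if match is None:
            continue  # no same-named external term to compare against
        external_id, external_term = match
        if external_id not in term["xrefs"]:
            missing_xrefs.append((local_id, external_id))
        if term["def"] != external_term["def"]:
            conflicts.append((local_id, external_id))
    local_names = {term["name"] for term in local.values()}
    missing_terms = sorted(
        tid for tid, term in external.items() if term["name"] not in local_names
    )
    return missing_xrefs, conflicts, missing_terms


# Made-up example data; names, definitions, and IDs are illustrative only.
local_terms = {
    "EX_XAO:1": {"name": "heart", "def": "A pumping organ.", "xrefs": {"EX_UBERON:1"}},
    "EX_XAO:2": {"name": "kidney", "def": "An excretory organ.", "xrefs": set()},
}
external_terms = {
    "EX_UBERON:1": {"name": "heart", "def": "A muscular pumping organ.", "xrefs": set()},
    "EX_UBERON:2": {"name": "kidney", "def": "An excretory organ.", "xrefs": set()},
    "EX_UBERON:3": {"name": "notochord", "def": "An axial rod.", "xrefs": set()},
}

missing_xrefs, conflicts, missing_terms = compare_ontologies(local_terms, external_terms)
print("Missing cross-references:", missing_xrefs)
print("Definition conflicts:", conflicts)
print("Terms absent locally:", missing_terms)
```

In practice, matching would also draw on synonyms and existing cross-references, and a curator would still decide how each reported difference should be resolved.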

Erik, VG, and Christina worked hard the rest of the week to make XAO is_a complete and to update definitions, relationships, and synonyms for existing terms. With XAO’s new release, the ontology is now 25% larger than before. The newest version of XAO will be available for download soon from the OBO Foundry.

ICBO 2011

Aug 12, 2011

Jim Balhoff and I recently attended the International Conference on Biomedical Ontology (ICBO) held 26-30 July in Buffalo, NY. The conference focused on the use and development of ontologies in the biological and biomedical domains. Of particular interest to Phenoscape were the workshops and tutorials held during the two days before the main conference. Topics included ontology integration, Common Logic, ontology development tools, and using OBO and OWL formats for ontology development and reasoning.

We presented talks at the Facilitating Anatomy Ontology Interoperability workshop. Jim’s talk was on representing taxa as individuals in OWL, an alternative to the common representation of taxa as classes, which facilitates annotation of phenotypic data involving polymorphism and evolutionary reversals.  I presented a lightning talk on the anatomy ontology synchronization requirements for linking evolutionary and model organism phenotypes.  Other presentations from the workshop are available here. We also presented a poster describing the reasoning used in the Phenoscape Knowledgebase.

The main conference included interesting talks on a broad range of topics including the application of ontologies to proteins, diseases, biological mechanisms, and electronic health records. Presentations can be downloaded here.

Introducing the Vertebrate Anatomy Ontology

Jan 12, 2011

The Vertebrate Anatomy Ontology (VAO) was recently developed as a high-level, bridging ontology for existing and future single species (e.g., zebrafish, mouse, Xenopus) and multispecies (teleosts, amphibians) vertebrate ontologies. We initiated VAO at a Phenoscape workshop held at NESCent in April 2010. VAO was developed to accommodate the various ways that biologists classify bones and cartilages, as distinct elements and tissue types, and based on developmental and locational criteria. After substantial review by experts in comparative anatomy, paleontology, systematics, and anatomy ontologies, VAO was submitted to the Open Biological and Biomedical Ontologies (OBO) Foundry and committed in December 2010. The ontology currently contains 127 defined terms and 63 synonyms for cells, tissues, skeletal elements, skeletal system parts, and biological processes. Cross references to several existing ontologies (Cell Type Ontology, Common Anatomy Reference Ontology, GO Biological Process) are included, thus connecting vertebrate ‘sub’ ontologies to a wealth of additional data. A manuscript detailing the VAO and evaluating the benefits of its use is in preparation.

New article on the Teleost Anatomy Ontology published in Systematic Biology

Mar 30, 2010

We are pleased to announce the publication of the article “The Teleost Anatomy Ontology: Anatomical Representation for the Genomics Age” in Systematic Biology.  The paper describes how we developed this multispecies anatomy ontology for the annotation of systematic characters, and general solutions to various challenges in representing anatomical structures across a diverse clade of fishes.

Open access links to online versions of the paper are given below:

Wasila M. Dahdul; John G. Lundberg; Peter E. Midford; James P. Balhoff; Hilmar Lapp; Todd J. Vision; Melissa A. Haendel; Monte Westerfield; Paula M. Mabee.  2010.  The Teleost Anatomy Ontology: Anatomical Representation for the Genomics Age.  Systematic Biology.  View full text or download PDF.

