Jan 20, 2016
 

NBO-ABO Merger Workshop, Smithsonian, DC, 25 Oct 2015

This post is a followup to our previous post about integrating the Animal Behavior Ontology (ABO) and the NeuroBehavior Ontology (NBO). This covers the second workshop, a conference call held in early December and the poster one of us (PM) presented at SICB 2016 on January 6.

With additional funding from the Phenotype RCN, on October 24–25, 2015 we held the second workshop to begin the process of merging the ABO and the NBO based on the first workshop’s recommendations. This workshop was held at the Smithsonian Museum in Washington. Attendees included Elissa Chesler, George Gkoutos, David Osumi-Sutherland, and Reid Rumelt (Cornell undergraduate working on media tagging-based research); and workshop organizers Anne Clark, Sue Margulis, Peter Midford, Cynthia Parr, and Katja Schultz (our Local Host). Melissa Haendel participated remotely.

We made good progress getting started on a use-case based paper for applications of a behavior ontology. We also have a real home for the ABO – we deposited the OWL rendering Peter Midford generated in 2006 as the initial commit in a GitHub repository (note that this is the same repository where NBO is maintained).

We started the process of merging the ABO and NBO, our central objective. One of ABO’s strengths is a clear division between observable behavior (acts, events, and processes) and functional interpretations (for example, running vs. fleeing from a predator). The NBO is organized rather differently, and we would like the division in ABO to appear at least somewhere in NBO. NBO contains a sizable number of terms not relevant to the behavioral ecology community, just as ABO has terms that are not of current use to the model organism community. We identified a number of stakeholder projects that would be affected by, and could potentially benefit from, the merger, including Virtual Fly Brain, the Rat Genome Database, the International Mouse Phenotyping Consortium, and probably others.

Since the workshop we have had several conference calls with the NBO developers (George Gkoutos and Robert Hoehndorf) to refine the concerns of other stakeholders. Discussion made it clear that NBO is focussed on behavior phenotypes, rather than behavior processes. However, there was some interest in incorporating the ABO functional terms. The thought was that the remaining ABO terms (those referring to events, acts, and processes) should wind up in the Gene Ontology (GO). Several of us are working on the process of merging the functional terms into NBO, and separately, looking through the existing process terms in the GO. We may want to propose a behavior process ontology, at least as a parking place for terms that eventually are added to the GO.

Finally, we presented a poster at the SICB 2016 meeting in Portland, OR on January 6. We will continue to use opportunities like this to discuss the process and implications of this merger with the broader animal behavior and neuroscience communities. We are developing a set of case studies and have outlined a followup paper to highlight both the applications of the outcome of the merging process and lessons learned during that process.

 Posted on January 20, 2016 at 4:40 pm

Ontology-based text markup tools

Jan 14, 2016
 

Efficiently extracting knowledge from the published literature is a challenge faced by many database projects in biology, and many of us are interested in tools that can assist and speed up the task of identifying concepts in free text. I’ve recently used two text markup tools that are helpful in keeping up with the literature and rapidly developing ontologies. As a participant in the Fifth BioCreative Challenge, in which biocurators test and evaluate text mining systems, I evaluated the EXTRACT bookmarklet tool. EXTRACT was developed for metagenomics data and provides full-page tagging of mapped terms from environment, disease, taxonomy, and tissue ontologies, and can also mark up shorter selections of text on an HTML page. The tool is immediately useful, particularly during the first stages of the curation process, as a curator is surveying the literature for relevant articles.

Annotating long, descriptive text has also been a challenge for Phenoscape. To assist curators in this task, we recently added a text annotator tool to the Phenoscape Knowledgebase that tags selected text passages copied in from a source with matched terms from anatomy (Uberon), taxon (VTO), and quality (PATO) ontologies. Viewing the annotated results, with color-coded text, has aided curators in the process of applying large, complex ontologies to equally complex text.
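
For readers curious about what this kind of tagging involves, here is a minimal sketch of dictionary-based term matching. It is not the EXTRACT or Phenoscape implementation; the tiny lexicon, the example sentence, and the `tag_terms` function are hypothetical stand-ins, and a real annotator would load labels and synonyms from the ontologies themselves.

```python
import re

# Hypothetical mini-lexicon: surface term -> ontology it is assumed to come from.
LEXICON = {
    "pectoral fin": "Uberon",
    "fin": "Uberon",
    "elongated": "PATO",
    "danio rerio": "VTO",
}


def tag_terms(text, lexicon=LEXICON):
    """Return (start, end, matched_text, ontology) tuples for lexicon terms in text."""
    # Try longer terms first so "pectoral fin" wins over "fin" at the same position.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), lexicon[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


hits = tag_terms("Pectoral fin elongated in Danio rerio.")
for start, end, matched, ontology in hits:
    print(f"{start}-{end}: {matched!r} -> {ontology}")
```

The character offsets are what a viewer would need to color-code each match by ontology.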


Dec 30, 2015
 

Biosphere 2, the site of the final Phenotype RCN Summit meeting (February 2016). Photo (CC BY-NC 2.0) by pinkgranite. See original at https://flic.kr/p/52bMzk.

The Fifth Annual Summit of the Phenotype Ontology Research Coordination Network will be held at the University of Arizona’s Biosphere 2, about 40 miles north of Tucson, AZ, from February 26-28, 2016 (Friday through Sunday noon).

The theme of this meeting will be ‘Complex data integration with phenotypes’, with a focus on the integration of phenotype data with other data sets. We will summarize where our phenotype community stands with respect to integration with other data types, and we will highlight active projects. We will also look to the future: what projects should be priorities? Joining us this year will be folks from the newly funded ‘FuturePhy’ (futurephy.org), who are interested in how to integrate multiple data types, including phenotype, with phylogenetic trees.

We estimate that the costs for this meeting (transportation to the meeting from the airport, lodging, food) will be approximately $500, though we will be able to cover expenses for a small number of participants, particularly students and postdocs who have specific interests in using phenotypic data associated with environment in their research. Please contact one of us if you are interested in attending. It should be a great meeting!

Paula Mabee; pmabee@usd.edu
Eva Huala; huala@acoma.stanford.edu
Andy Deans; adeans@psu.edu
Suzanna Lewis; suzi@berkeleybop.org

The Phenotype Ontology RCN (http://phenotypercn.org) was funded by the U.S. NSF to establish a network of scientists who are interested in comparing phenotypes across species and in developing the tools and methods needed to enable comparisons. In contrast to the many well-established efforts in the molecular community, the representation of phenotypic traits using ontologies is in its infancy. Phenotype ontologies, however, have the potential to integrate these data across all levels of the biological hierarchy and to the environment. This RCN is building a community that, because of its expertise, fosters communications across disciplines to enable co-development of interoperable community standards and best practices for phenotype.

 Posted on December 30, 2015 at 1:50 am
Dec 21, 2015
 

The following post is from Peter Midford. – Andy Deans

As you may recall, at a Spring 2013 meeting of the Phenotype RCN in Durham, NC, the Behavior Breakout group discussed the existence of multiple behavioral ontologies, including the gaps in existing ontologies (such as the Neuro Behavior Ontology, or NBO) that preclude their widespread use in behavioral ecology and other sub-disciplines in animal behavior. The group felt it could be possible to merge two existing behavioral ontologies – the NBO, developed to serve studies of animal models of human behavioral dysfunction, and the Animal Behavior Ontology or ABO, developed to serve the field of comparative animal behavior, including behavioral ecology and other sub-disciplines. If successful, the merger would facilitate the broader integration of behavioral studies: applied with basic, model organism with comparative investigations, mechanistic with evolutionary, and human with non-human animal questions. At the same time, it would also need to continue to serve the specialized needs of subfields.

In late summer 2014, a small group of animal behaviorists who were present at the 2013 meeting in Durham (Anne Clark, Sue Margulis, Peter Midford, Cynthia Parr) received NSF funding to hold two workshops to accomplish these goals.

Our first workshop, held August 2014 at Princeton University, convened over a dozen animal behaviorists with a broad range of expertise in comparative behavior to develop specific recommendations on how to integrate the basic terms and concepts of the two ontologies. Key outcomes included a list of proposed changes in parent-child relations in the NBO to emphasize function, and ABO term definition improvements that together could serve as the basis of integrating the two ontologies.

Our second workshop, supported in part by additional funding from the Phenotype RCN, was held at the Smithsonian’s National Museum of Natural History, Washington, DC, on October 24-25, 2015. Its specific goal was to start the process of merging the ABO and the NBO based on the first workshop’s recommendations. Attendees, in addition to the four organizers, were our local host Katja Schultz (Encyclopedia of Life), Elissa Chesler (The Jackson Laboratory), George Gkoutos (NBO developer, University of Birmingham), David Osumi-Sutherland (European Bioinformatics Institute, Virtual Fly Brain), Melissa Haendel (Oregon Health and Science University), and Reid Rumelt (Cornell University undergraduate working with Macaulay Library and Encyclopedia of Life).

The workshop began with presentations about the histories of NBO and ABO. NBO had its roots in a phenotype vocabulary supporting the EUMORPHIA project (see http://empress.har.mrc.ac.uk/ and http://www.europhenome.org/). Behavior terms were initially included in the Gene Ontology, but were also mapped to phenotype ontologies, such as the Mammalian Phenotype Ontology (MP) and the Human Phenotype Ontology, so as to enable the integration of data. The Neuro Behavior Ontology was created to concentrate effort specifically on behavior.


ABO was one of the first accomplishments of the EthoSource project¹, begun with an NSF-sponsored workshop in 2000 with the goal of developing integrated online resources for the discipline of Animal Behavior. Two NSF-sponsored Ontology Workshops followed in 2004-2005, at which an international group of animal behaviorists developed a basic metadata standard for the discipline, the ABO. The primary use of the ABO subsequent to 2005 was indexing an online ethogram repository, EthoSearch.org.

In our second blog post, we will summarize the progress we made in the October workshop, and outline our next steps.

¹ Martins, E. P. 2004. EthoSource: Storing, Sharing, and Combining Behavioral Data. BioScience 54 (10): 886. doi:10.1641/0006-3568(2004)054[0886:ESSACB]2.0.CO;2

 Posted on December 21, 2015 at 3:49 pm
Dec 18, 2015
 

The following post is from Anne Thessen, who originally published this news on her blog, The Data Detektiv. – Andy Deans

One of the fundamental goals of biology is understanding the interactions of environment and phenotype, but this is a surprisingly difficult topic to study – not because of the concepts, but because of the data. Observations about environment and phenotype occur in separate data sets, and the terms used are far too idiosyncratic for automated integration. Several biological domains, including conservation and phylogenetics, could be advanced if these two data types could be easily merged on a large scale.

I led a recent paper, published in PeerJ, which suggests that the use of ontologies to standardize and link data about phenotypes and environments can enable scientific breakthroughs by increasing the scale and flexibility of research. This paper was a product of a workshop facilitated by the Phenotype RCN and supported by the National Science Foundation. My co-authors and I give several domain-specific use cases describing how an ontology can help advance science in four biological sciences. We then discuss the challenges to be addressed, present some proof-of-concept analyses, and discuss existing ontologies. The summary contains three suggestions for increasing interoperability between phenotype and environment data.


Graphical abstract for Thessen et al. (2015) DOI: 10.7717/peerj.1470.

We hope this paper provides you with an overview of the landscape of ontologies available for integrating environmental data, and inspires you to use them in relation to your own data. For more information about ontologies and semantics, a good first read is Semantic Web for the Working Ontologist by Dean Allemang and Jim Hendler.

 Posted on December 18, 2015 at 2:27 am
Jul 27, 2015
 

Succulent plant with interesting adaptive phenotypes. Photo taken at the Rancho Santa Ana Botanic Garden (RSABG) by Manicosity (CC BY-ND 2.0).

Supported by a Phenotype RCN collaboration grant, Grant Godden and Pier Luigi Buttigieg met during May 2015 at the Rancho Santa Ana Botanic Garden (RSABG) in Claremont, CA, with the aim of enhancing the ontological representation of plant environments. Grant and Pier processed label data from more than one million plant specimen records hosted by iDigBio, using a combination of natural language processing and text-mining techniques to identify well-represented terms and phrases in “habitat” descriptions. Their interactions with RSABG collections staff, who are actively engaged in specimen digitization and have insights into the creation of records that populate repositories like iDigBio, greatly enhanced the project and helped create a workable corpus. The preliminary results of the analyses were immediately informative, revealing gaps in the current coverage of the Environment Ontology (ENVO; Buttigieg et al., 2013).
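
To give a flavor of what identifying well-represented phrases can look like, here is a minimal sketch that counts two-word phrases across habitat strings. It is not the pipeline Grant and Pier built; the habitat strings are made up, and the `ngram_counts` helper is a hypothetical name used only for illustration.

```python
import re
from collections import Counter

# Made-up habitat strings standing in for specimen label text.
HABITATS = [
    "Dry rocky slope in chaparral",
    "Rocky slope above creek",
    "Sandy wash, desert scrub",
    "Open chaparral on rocky slope",
]


def ngram_counts(texts, n=2):
    """Count word n-grams across all texts, ignoring case and punctuation."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counts


bigrams = ngram_counts(HABITATS)
print(bigrams.most_common(3))
```

Frequent phrases that have no matching ontology term are candidates for the kind of coverage gap described above.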

Further work is planned to refine their computational pipeline and corpus, and to extend ENVO’s coverage of environments that the botanical community frequently samples. A brief publication reporting the process, findings, and results is in preparation.

GG is affiliated with the Rancho Santa Ana Botanic Garden, Claremont, CA, USA. PLB is affiliated with the Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung, Bremerhaven, Germany.

 Posted on July 27, 2015 at 5:33 pm
Jun 10, 2015
 

Visualizing one or more trees/taxonomies with a non-trivial number of characters and taxa is a challenge that a number of projects are facing. The ETC project organized a workshop with information visualization experts, data providers (trees and characters), and end users to tackle the challenge together.
The meeting was organized by Hong Cui and hosted by Bertram Ludäscher at the National Center for Supercomputing Applications (NCSA), Urbana, IL, on May 11-13. Phenotype RCN participants Matt Yoder, Nico Franz and Martín Ramírez attended the meeting and posed visualization challenges. Much of the workshop was devoted to brainstorming on the challenge of representing a large dataset together with some kind of mapping on a tree, and often on two trees simultaneously. This is a familiar challenge for anatomy ontologists, who are trying to represent the interaction of phylogenetic trees, matrices and ontologies:

Right, the level of anatomical data available for different parts of the fin and limb can be visualized for taxa along the fin to limb transition (figure from Dececchi et al. in press 2015, Systematic Biology, doi: 10.1093/sysbio/syv031). Left, a phylogeny of spiders colored according to anatomical complexity, derived from the ontology (figure from Ramírez & Michalik 2014, doi: 10.1111/cla.12075).



The beautiful and clever examples presented by the vis experts were inspiring. How can these gorgeous examples help us represent our complex data in intuitive visualizations? Filters, sort controls, heat maps, zoom panes, collapsing, expanding, and more tools all had two effects on us: they made some of our challenges look feasible, and they refined our vague ideas into more precise challenges.
Hierarchical Circular Layouts, or HCL (Dang, Murray, and Forbes 2015), uses circular layouts on a hierarchical structure.



PathwayMatrix: Visualizing Binary Relationships between Proteins in Biological Pathways (Dang, Murray, and Forbes, 2015). PathwayMatrix can be used not only for biological pathway visualization, but also for character and taxon data.  See: http://biovis.net/year/2015/papers/pathwaymatrix-visualizing-binary-relationships-between-proteins-biological-pathways



The ETC project will implement a few promising techniques as part of the ETC toolkit to invite comments and suggestions from broader communities. Stay tuned, and crunch your data into nice visualizations!
(post by Martín Ramírez)

Jun 08, 2015
 

CARO/PCO Oregon Summit 2014

Figure 1. SubClass hierarchy of upper-level classes in  the Common Anatomy Reference Ontology (CARO, orange boxes), plus relations to critical external ontology terms from Basic Formal Ontology (BFO, grey boxes), Population and Community Ontology (PCO, yellow box), and Gene Ontology (GO, pink boxes). Terms in light purple boxes are found in multiple ontologies (CL=Cell Ontology). The term ‘Organism’ in the green box is not an ontology term but a class from the Darwin Core Vocabulary that is a subclass of the new CARO term ‘biological entity’. ‘Biological entity’ is a catch-all term for any material entity that is, is part of, or derived from an organism, virus, or viroid, or a collection of them.


Before the post-Thanksgiving haze had lifted, a small group of ontologists (Melissa Haendel, Chris Mungall, David Osumi-Sutherland, and Ramona Walls) converged on the lovely small town of Brownsville, Oregon to work on the Common Anatomy Reference Ontology (CARO), the Population and Community Ontology (PCO), and PATO, an ontology of biological qualities. This work was done within the context of the larger group of ontologies that make use of or are used by CARO (UBERON, GO, CL).

CARO is a relatively small upper ontology with ~165 classes and a few core relations that is used to link taxon-specific anatomy ontologies ranging from fruit flies to vertebrates to plants. The 1.0 release of CARO has been widely used, but usage has been quite inconsistent and sometimes incorrect. This is partly due to lack of clarity in some definitions, but also because it was written at a time when we lacked the tools to provide automated reports of incorrect usage.

PCO is a recently developed ontology focussing on populations, communities and the relationships between organisms. The definitions of organism types in CARO are critically important for this ontology, as are the biological qualities applying to groups of organisms in PATO.

PATO, an ontology of biological qualities, has been very widely used by the community brought together by the Phenotype RCN as well as in defining classes in a wide range of other ontologies used by this community (covering phenotypes, anatomy, cell types and populations). So far, PATO has had limited axiomatisation, but there are many obvious cases where axiomatisation could improve its integration with ontologies that use it, including the PCO and anatomy ontologies.

A major aim of our work on CARO at this meeting was to redraft textual definitions so that they could be understood by any competent biologist and to redraft logical definitions so that they could be used for automated classification and error checking. For both logical and textual definitions, we aimed to focus on distinctions that are important to biologists – either directly, or indirectly by making biologically useful queries possible.  We also aimed to take into account new use cases that have arisen since CARO 1.0 was released, as a result of work on the PCO as well as on anatomy ontologies and the ontologies and tools that use them.  In parallel with this work, we aimed to improve related axiomatisation of PATO.

Over two and a half days of leftover turkey, home-fermented vegetables, and farm-fresh eggs, we took care of operational issues such as repository maintenance, as well as more hard-core ontologizing. A highlight of the meeting was an informal gathering on Monday night when we were joined by Laurel Cooper and John Campbell from Oregon State University and Joe Fontaine from Murdoch University to discuss the intersections of ontologies, ecology, plant traits, and biodiversity.

Key outputs of the meeting were:

  • A fresh github repo (https://github.com/obophenotype/caro/) for CARO, with cleaned up imports.
  • New CARO terms, including terms for multicellular anatomical structure and expression pattern, and a general term for organs.
  • Revised text and logical definitions for most CARO terms, including anatomical structure, cellular organism, and organ (figure 1, Vue file that shows the key classes and which files they live in).
  • Draft ontology design patterns (ODPs) for expression patterns and for anatomical structures with internal spaces (lumens).
  • Further development of PCO, including updating import files, testing ODPs for defining collections of organisms and species/organism interactions.
  • A pending beta release of CARO2.0 and plans for how to announce it.
  • Better formalization of PATO through general class axioms (GCIs) necessary for CARO and PCO.
  • A Jenkins job that reports on and verifies ontologies that use CARO (FBBT, PO, XAO, and ZFA).
  • A draft paper on CARO2.0.

One of the key use cases for anatomy ontologies is annotation of gene expression, and we wanted a way to help curators avoid the pitfall of annotating expression to the (immaterial) space that is part of a structure rather than the (material) structure that surrounds it. We propose a design pattern in which any structure that has an interior space (such as stomach) would be modeled using four classes: one for the entire structure (which includes both the surrounding structure and the space that is part of it), one for the space, one for the wall (which is just the surrounding structure without the space) and one for “wall region”. A wall region is any portion of the wall that spans the full thickness of the wall for its entire lateral extent, whereas the wall is the mereotopological sum of all wall regions. Following this pattern, an ontology that wished to include a stomach would have classes for “stomach”, “stomach lumen”, “stomach wall”, and “region of stomach wall”. We opted against including very general classes such as “wall” or “wall region” in CARO, and instead plan to document the pattern and provide a template for its use in anatomy ontologies.
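
The four-class pattern is regular enough to be generated from a template. The sketch below is only an illustration of that idea, not the CARO template itself; the `lumen_pattern` function, the role keys, and the plain-text relation triples are hypothetical, and the relations follow our reading of the paragraph above.

```python
def lumen_pattern(structure):
    """Return the four class labels the pattern calls for, keyed by role."""
    return {
        "whole": structure,                          # wall plus enclosed space
        "space": f"{structure} lumen",               # immaterial interior space
        "wall": f"{structure} wall",                 # surrounding material structure
        "wall_region": f"region of {structure} wall",
    }


def lumen_pattern_relations(structure):
    """Illustrative (child, relation, parent) triples linking the four classes."""
    labels = lumen_pattern(structure)
    return [
        (labels["space"], "part of", labels["whole"]),
        (labels["wall"], "part of", labels["whole"]),
        (labels["wall_region"], "part of", labels["wall"]),
    ]


stomach = lumen_pattern("stomach")
for child, relation, parent in lumen_pattern_relations("stomach"):
    print(f"{child!r} {relation} {parent!r}")
```

A curator annotating gene expression would then pick the wall or a wall region, never the lumen.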

One way of specifying structures, such as a stomach, that have a geometric component is through the use of GCIs in PATO. PATO includes a number of classes for qualities describing shape. Of these, lumenized, tubular, and saccular are the most relevant to CARO. We began adding GCIs to PATO of the form:

  • bearer_of some lumenized subClassOf ‘has part’ some lumen
  • bearer_of some unlumenized subClassOf not (‘has part’ some lumen)
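
To make the effect of these two axioms concrete, here is a toy sketch in plain Python. It is not an OWL reasoner and does not use PATO itself; the `BEARS` table, its entries, and the `has_lumen` function are made-up illustrations of what the axioms would let a reasoner conclude.

```python
# Each entity maps to the set of PATO-style qualities it bears (illustrative data).
BEARS = {
    "stomach": {"lumenized", "saccular"},
    "tendon": {"unlumenized"},
    "femur": set(),
}


def has_lumen(entity, bears=BEARS):
    """Return True or False when one of the two axioms applies, else None."""
    qualities = bears[entity]
    if "lumenized" in qualities:
        return True   # bearer_of some lumenized -> 'has part' some lumen
    if "unlumenized" in qualities:
        return False  # bearer_of some unlumenized -> not ('has part' some lumen)
    return None       # neither axiom applies, so nothing can be concluded


for entity in BEARS:
    print(entity, has_lumen(entity))
```

The `None` case mirrors the open-world reading of an ontology: the absence of a quality is not evidence that a lumen is absent.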

An open question remains on how to document these patterns (in CARO or as separate patterns). One possibility is for CARO to include abstract geometrical classes such as “anatomical tube” or “anatomical tube wall” and “tube lumen”.

Stay tuned for another post soon, with the upcoming release of CARO!

May 27, 2015
 

How do phenotypic data factor into the issues relating to integrating complex data? Three frequent phenotypers (Ramona Walls, Chris Mungall, and Maryann Martone) were supported by this RCN to participate with sixteen others in an ‘Integrating Complex Data’ workshop organized by the American Institute of Biological Sciences (AIBS) with NSF funding (EF-1450894), on March 30-31 at the Hyatt Regency Crystal City in Arlington, Virginia. The workshop was co-chaired by Paula Mabee, Corinna Gries, and Robert Gropp, facilitated by Kathy Joyce, and observed by various program officers and staff from NSF.

Complex data integration, defined as ‘bringing together data from two or more fields’, is required to address many fundamental scientific questions as well as to understand how to mitigate the challenges facing the planet. Participants (whose research interests ranged from genetics, genomics, metagenomics, systematics, taxonomy, and ecology, to bio/eco-informatics and cyberinfrastructure development) initially discussed specific use cases in which complex data integration was required. They then focused on the barriers that impede integration, recognizing domain silos as a major problem at this scale. They illustrated with examples that data discovery and integration are currently hampered by a lack of common standards, including those for IDs, representation, ontologies, data formats, data collection, and communication protocols. Phenotype RCN participants described the usefulness of ontologies in connecting phenotypic data to other data types across domains.

Suggestions and next steps required to achieve better data integration were the focus of the second day of the workshop. Community coalescence around shared standards, rather than more standards, was considered key.  Participants advocated for interagency discussions about how to provide linkages across their data systems, thus making data from all sites more readily discoverable and distributing the financial burden.  Participants further recognized that the technical expertise required for complex data integration is high; they promoted cross-training in informatics for graduate students and a higher level of specialist ‘data scientist’ training.  They also felt that funding mechanisms to enable scientists to employ technical specialists for specific data integration tasks would enable complex data integration.  Particularly at this juncture, where cross-domain data analysis is required to address societal problems, participants stressed that it is important to try to solve the immediate problems while working toward long-range solutions.  A full report from this workshop is in preparation and a link will be posted when it is available.

 Posted on May 27, 2015 at 11:55 pm
Apr 21, 2015
 

Photo used under Creative Commons from Kirt Edblom. https://www.flickr.com/photos/27190564@N02/15496544935

The Alfred P. Sloan Foundation’s Digital Information Technology program has awarded $499K to Phoenix Bioinformatics to catalyze development of creative new user-based funding strategies for research databases.   Phoenix was founded in 2013 by the staff of the Arabidopsis Information Resource (TAIR) to provide new support mechanisms as TAIR transitioned away from grant-based funding. Following its success with TAIR, Phoenix will be assisting other databases with their funding challenges and helping them find new ways to sustain their projects for the long term with community help. For more information about TAIR’s transition to sustainable funding or the newly funded Phoenix project please contact Phoenix Bioinformatics.

Phoenix Bioinformatics is a nonprofit 501(c)(3) organization dedicated to finding innovative ways to sustain critical scientific resources.

 Posted on April 21, 2015 at 7:35 pm