Apr 29, 2013
 

Are you interested in describing and linking biological data? Apply by June 1, 2013, for the Ontologies for Evolutionary Biology course at the National Evolutionary Synthesis Center, July 29 – August 3 in Durham, NC:

https://academy.nescent.org/wiki/Ontologies_for_evolutionary_biology

Evolutionary research has been revolutionized by the explosion of genetic information available, and ontologies must play a central role in relating this knowledge to observable diversity. Ontologies provide scaffolding that interconnects many kinds of observations; across species, they provide evolutionary, developmental, and mechanistic insights. The theme for this year’s course is “enrichment”. We aim to help participants enrich their research through the use of ontologies, to enrich existing ontologies with new content, and to bring new domain expertise to the ontology community.

Mar 30, 2013
 

There is a wealth of phenotypic information in the evolutionary literature that comes in the form of semi-structured character state descriptions. Getting that information into computable form is, right now, an awfully slow process. In Phenoscape I, we estimated that it took about five person-years in total to curate semantic phenotype annotations from 47 papers. If we are to get computable evolutionary phenotypes from a larger slice of the literature, we really need to figure out ways to speed this up.

One promising approach is to use text-mining.  This could contribute in a few different ways.  First, one could efficiently identify all the terms in the text that are not currently represented in ontologies and add them en masse, so that data curation does not have to stop and resume whenever such terms are encountered. Second, one could present a human curator with suggestions for what terms to use and what relations those terms have to one another, speeding the process of composing an annotation.
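
The first idea, flagging terms that the ontologies do not yet cover, can be illustrated with a minimal Python sketch. This is not CharaParser's method: it assumes a naive word-level comparison against a small, made-up list of ontology labels, and the function name `find_uncovered_terms` is hypothetical.

```python
import re


def find_uncovered_terms(text, ontology_labels):
    """Return words in `text` that occur in no ontology label, in order of first appearance."""
    # Treat every word of every label as "known" (a deliberate simplification).
    known_words = set()
    for label in ontology_labels:
        known_words.update(label.lower().split())

    uncovered = []
    for token in re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()):
        if token not in known_words and token not in uncovered:
            uncovered.append(token)
    return uncovered


# Made-up labels and character description, for illustration only.
labels = ["dorsal fin", "fin", "present", "absent"]
description = "Dorsal fin present; adipose fin absent"
print(find_uncovered_terms(description, labels))
```

A real workflow would also need to handle plurals, synonyms, and multi-word phrases before the leftover terms could be proposed for addition en masse.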

CharaParser, developed by Hong Cui at the University of Arizona, is an expert-based system that decomposes character descriptions into recognizable grammatical components, and it is now being used in several different biodiversity informatics projects. Baseline evaluation results from BioCreative III showed that a naive workflow combining CharaParser and Phenex, the software curators use to compose ontological annotations and relate them to character states, was capable of identifying candidate entity and quality phrases (it outperformed biocurators by 20% in recall on average) but had difficulty translating those into ontological annotations.  This first iteration workflow also was not yet reducing curation time.

In March, a small contingent from NESCent (Jim Balhoff, Hilmar Lapp and Todd Vision) visited Hong Cui’s group in Tucson. We talked through improvements to CharaParser and the curation workflow, brainstormed plans for a more thorough set of evaluation tests, began refactoring of the code so that it can be more easily shared across projects, and gained a better understanding of what features make a character difficult to curate for humans vs. text-mining.  We made substantial progress on all fronts, and are looking forward to seeing how much improvement in the accuracy and efficiency of curation will be achieved in the next round of testing.

We are also pleased to report that the CharaParser codebase will now be available from GitHub under an open source (MIT) license.


Filed under: Data Curation, NLP, Ontology, Phenex, Software
Mar 15, 2013
 

By Peter Midford and George Gkoutos

We held a one-day behavior ontologies workshop on Sunday, February 24, immediately prior to this
year’s RCN summit. Our goals were to bring ontology developers and behavioral biologists together
to review the NBO (NeuroBehavior Ontology) as well as discuss its use and interoperability with other
ontologies. We started the day with a series of short talks: George Gkoutos and Robert Hoehndorf
explaining the development and initial applications of the NBO, followed by six speakers who
volunteered to discuss related topics.

Beorn Brembs presented a data workflow that captured Drosophila movements in the course of
a ‘choice’ experiment. The flow went from raw video to depositing data in figshare, via R, and finished
by showing the role of NBO annotations in the final deposit. Melissa Haendel raised several issues
related to capturing behavior observations using ontologies: What does behavior inhere in? How to
relate observations across species? How do measurements and observations relate to phenotypes
or conditions? David Osumi-Sutherland discussed the application of behavior terms in annotations
within the Virtual Fly Brain (http://www.virtualflybrain.org). Janna Hastings discussed two new
ontologies for Emotions (https://code.google.com/p/emotion-ontology/) and Mental Functioning
(https://code.google.com/p/mental-functioning-ontology/) and both their relationships to the NBO and
their application to mental disease. Christine Wall introduced an ontology of processes involved in
mammalian feeding, which looked like a good candidate for inclusion in NBO and raised important
questions of representation of sequential behavior events and behaviors existing on a continuum.
Finally, Allan Kalueff introduced a community developed catalog of zebrafish behavior.

We followed this with a morning breakout session with groups selected by areas of taxonomic focus:
arthropods, non-mammalian vertebrates, non-human mammals, and humans. When the breakout
groups reported out, there were some common concerns about taxon specificity of terms, both in text
definitions and in their placement in the hierarchy – the latter potentially leading to incorrect inferences
for taxa not considered during development of the ontology. There were questions about behaviors,
social and otherwise, involving more than one organism, and the role of abnormal and ‘clinic’ behavior
phenotypes. Finally one group looked at several previous efforts to construct behavior ontologies (e.g.,
the ABO constructed at a series of workshops, and David Shotton’s SABO project).

After lunch, we proposed and discussed topics for a new set of breakouts, and settled on Application
to Behavioral Ecology, Representing Affective Behavior, and a group reviewing the behavior process
branch, with NBO developers George Gkoutos and Robert Hoehndorf soliciting suggestions for high
priority changes.

The Behavioral Ecology session brought a group of behavioral ecologists together with Chris Mungall
to discuss the ABO ontology and how it might be integrated with the NBO. Anne Clark and Sue
Margulis discussed how the ABO had been used in the development of the Ethosearch tool, an online
collection of text ethograms indexed with terms from the ABO. They also offered
to contribute a collection of text definitions they had developed during the Ethosearch effort. The
consensus was that the ontologies were fairly compatible and that it would be desirable to graft portions
of the NBO into the ABO. The group also agreed that the learning and cognition sections of the NBO
should be a priority area for review as both structure and definitions suffered from species specificity.

The review group wound up focussing on terms for voluntary and involuntary movement, an issue that
came up in the invertebrate morning breakout as well. There was discussion of reflexes, of which the
NBO has a large number, many of which are human or mammal specific, but of significant clinical
interest.

The report-out from the affective behavior group generated a lively discussion that started by
addressing the conflation between observable behavior (e.g., smiling) and an inferred diagnosis
(emotional happiness). Although this distinction between observable behavior and inferred emotion
(which might belong in the emotion-ontology) is straightforward, other behavior terms (‘agoraphobic
behavior’) conflate behavior and diagnosis. There was also discussion of fear-related terms in general
and whether these might be too human-centric and what the scope of the NBO was; in particular
would the NBO apply to plants or even paramecia, which have been the subject of multiple ethograms
in the past 15 years. The consensus appeared to be that NBO should apply to animals with nervous
systems, and that other types of behavior ought to be welcome additions to the Biological Process branch
of the Gene Ontology. There was also discussion of whether terms of the form ‘behavioral control of x’, where
x was a process such as defecation or lacrimation, were meaningfully different from the underlying
physiological process.

The discussion of affective terms provided a nice transition to Barry Smith’s presentation ‘On the
Future of the NeuroBehavior Ontology and Its Relation to the Mental Functioning Ontology.’ After
reviewing the partitioning of domains of biological knowledge by various OBO ontologies, Barry
made the case that the portion of Biological Process that applied to whole organisms needed to be
split between the NBO for observable behavior and the complementary Mental Functioning Ontology
(MFO). The MFO will cover terms related to mental states and processes, for example sensory
perception. Perception is not an observable behavior, though there are behaviors associated with
perception (e.g., head turning, flehmen response). He recommended that NBO retain the prefix NBO,
but be considered the (narrow) Behavior Ontology. He also recommended that the feeding ontology
developed by the FEED project be incorporated into NBO, that merging the ABO ontology should
be explored, perhaps scoping behavioral terms taxonomically (e.g., with ‘occurs-in-taxon’) when
appropriate, and that a separate version of the NBO be created that marks the human-specific terms. He also
thought we shouldn’t be spending a lot of time discussing what is and isn’t behavior.

We finished the day with a discussion of next steps and deciding what are the best routes for providing
feedback to the NBO developers. In regard to feedback routes, we looked at several options: the OBO-
behavior list, the tracker associated with the NBO repository on google-code, as well as the notes
mechanism within the NCBO Bioportal and Ontobee. We decided that the OBO-behavior list and the
google tracker were adequate at this time. George also said he would add some new committers to
facilitate additions from the FEED ontology, the ABO, as well as terms from the community-developed
list of zebrafish terms in collaboration with ZFIN (this has been done).

We discussed next steps, in terms of ontology work, publications and funding. There was interest
in proposing a behavior ontology focussed RCN to fund workshops. There is interest among the
behavioral ecology attendees in proposing two followups, the first being a hackathon for ontology
developers (perhaps 3 days) to clear up the ontology issues in the NBO (such as the relation between
the process and phenotype branches) which could be followed up by a workshop for a review from
the perspective of behavioral biology, perhaps in the space between ISBE and ABS in summer 2014.
There is a sense that prior to seeking major funding, we should generate more publications, and that the
data and use cases are there to demonstrate the value of behavior ontologies, as several presentations
during the day had already demonstrated. One suggestion was to look at disease terms in the NBO
and look for clusters of behavior phenotypes associated with those terms. Given the importance of
behavior in model organism communities, the group expected that both the NIH and EU funding
agencies would be interested in supporting further work with the NBO.

We broke up shortly before 6 PM, though the behavior thread continued throughout the RCN summit. The application of ontologies to behavior still lags behind the use of anatomy ontologies, but combining the opportunities to benefit from the experiences with anatomical ontologies and the enthusiasm expressed at the workshop, there is reason to be optimistic about the future development and application of behavior ontologies.

Feb 26, 2013
 

At the end of October 2012, the working groups of the Phenotype Research Coordination Network (RCN) all met at the Asilomar Conference Center, in Pacific Grove, CA. One of the groups, the Vertebrate working group, made it their goal to discuss methods of representing phylogenetic and serial homology in anatomy ontologies, an issue that is central to Phenoscape as well. Though common ancestry is implicit in the semantics of many classes and subclass relationships (see for example the ‘homology_notes’ for digit in Uberon), most multispecies anatomy ontologies, including Uberon, VSAO, and TAO, do not assert homology relationships between anatomical entities.  Nonetheless, homology is central to comparative biology, and therefore to enriching computations across data types, species, and evolutionary change.


The working group used ontological relationships, phenotypes, and homology assertions across a small set of skeletal elements from vertebrate fins and limbs as a test case to identify requirements for making and reasoning over homology assertions. These included both positive (data expected to be returned) and negative (data expected not to be returned) results for particular queries involving phylogenetic and serial homology.  The group developed a number of such queries across subtype (is_a) and partonomy (part_of) relationships.  One example is that without homology assertions a query for phenotypes involving the ‘humerus’ would not retrieve phenotypes for ‘femur’.  Asserting that the ‘forelimb skeleton’ is serially homologous to the ‘hindlimb skeleton’ would not remedy this, because doing so would not imply that their parts (humerus and femur, respectively) would be homologous as well.  Instead, serial homology must be directly asserted between entities, even when they are parts of other already homologous structures (i.e., in this case humerus and femur have to also be directly asserted to be serial homologues).  Conversely, it was determined that homology relations, both serial and phylogenetic, should propagate to subclasses. For example, to return phenotypes for types of both the ‘paired fin skeleton’ and the ‘skeleton of limb’ in a query for either requires asserting phylogenetic homology only for these high-level classes. With this assertion propagating to all their subclasses, such as  ‘pectoral fin skeleton’, ‘hindlimb skeleton’, or ‘autopodial skeleton’, phenotypes for any of their subtypes would then also be returned.  The group also discussed how to define the identity of elements of a series consistently and ideally, universally.  The consensus was to specify subsets of digits for different taxa with different conventions, e.g., a basal tetrapod subset and a bird subset.
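
The two requirements described above (homology propagates down subclass links, but not across part_of links) can be sketched as a toy model in plain Python. This is not an OWL reasoner and not the working group's implementation; the small hierarchy, the function names, and the assertion sets are assumptions made for illustration.

```python
# Toy subclass (is_a) and parthood (part_of) links; illustrative only.
IS_A = {
    "pectoral fin skeleton": "paired fin skeleton",
    "forelimb skeleton": "skeleton of limb",
    "hindlimb skeleton": "skeleton of limb",
}
# Recorded only to show that the homology check below never consults it.
PART_OF = {
    "humerus": "forelimb skeleton",
    "femur": "hindlimb skeleton",
}


def self_and_superclasses(cls):
    """Return cls followed by its is_a ancestors (part_of is deliberately ignored)."""
    lineage = [cls]
    while cls in IS_A:
        cls = IS_A[cls]
        lineage.append(cls)
    return lineage


def are_homologous(a, b, assertions):
    """True if an asserted homology links a (or a superclass) with b (or a superclass)."""
    return any(
        frozenset({x, y}) in assertions
        for x in self_and_superclasses(a)
        for y in self_and_superclasses(b)
    )


phylogenetic = {frozenset({"paired fin skeleton", "skeleton of limb"})}
serial = {frozenset({"forelimb skeleton", "hindlimb skeleton"})}
serial_with_parts = serial | {frozenset({"humerus", "femur"})}

# Propagates to subclasses of the asserted high-level classes.
print(are_homologous("pectoral fin skeleton", "hindlimb skeleton", phylogenetic))
# Does not propagate to parts of serially homologous structures.
print(are_homologous("humerus", "femur", serial))
# Holds once humerus and femur are asserted directly.
print(are_homologous("humerus", "femur", serial_with_parts))
```

In this sketch a query for ‘humerus’ phenotypes would pick up ‘femur’ phenotypes only in the third case, mirroring the positive and negative test results the group defined.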

In summary, as identified at the workshop the requirements for reasoning over both phylogenetic and serial homology turned out to be fully consistent with standard OWL property semantics. Furthermore, the recommendations that emerged from the workshop for defining elements in a repeated series are fully in line with the goal of defining classes in anatomy ontologies such that they can be applied unambiguously, including in a manner that is not inconsistent with knowledge of developmental and evolutionary origin.

Aside from several Phenoscape personnel (Jim Balhoff, David Blackburn, Alex Dececchi, Hilmar Lapp, Paula Mabee, Chris Mungall), participants in the meeting included Eric Kansa, Hans Larsson and Karen Sears, who were new to the RCN (and Phenoscape). We are grateful to them for helping us work through the questions in a way that kept it grounded in enabling science.


Filed under: Anatomy Ontology, Homology, Ontology, Vertebrates, Workshops
Feb 24, 2013
 

A chick embryo anatomical ontology was recently created by members of the eChickAtlas
project (www.echickatlas.org, funded by BBSRC) and the group at AgBase is developing an
adult chick anatomical ontology. The RCN Phenotype project provided funding for Frances
Wong (eChickAtlas) to visit Fiona McCarthy at the University of Arizona to establish a
collaboration to merge these two anatomical ontologies.

The two anatomical ontologies had been set up independently and thus were quite different
in structure. The first step was to establish the best way forward for the integration
process. We called upon members of the RCN Phenotype (Erik Segerdell, Chris Mungall,
and Melissa Haendel) for advice. During this call, we discussed fundamental aspects of
ontology development and strategies on how to proceed to create a single chick anatomical
ontology with both our files. We also went on to discuss the access and availability of a
single working file to allow us to continue our collaborative work at different locations
once this visit has ended. Based upon this discussion we came up with a general plan for
integrating the two ontologies. We then plan to share the integrated file with members of
the RCN Phenotype and Chris has indicated that he and Erik will be able to review this file
in comparison to the Uberon Chick terms. In addition, we also spoke with Terry Hayamizu
(Mouse Genome Informatics) about proposed changes to the mouse anatomy ontology.

Following the advice from the RCN Phenotype call, our plan for ontology development was
to re-structure higher level terms of the chick anatomy ontology to make it consistent with
other anatomy ontologies (notably, Uberon, mouse and Xenopus). During our remaining
time together we specifically focused on organizing these higher order terms so that we
would have a standardized framework. We did this by identifying general chick terms which
exist in both embryo and adult and using these terms for the higher order of the standardized
framework. We then plan to organize the embryological and adult terms into this ontology
structure. We are also working to improve the existing adult anatomy, removing many terms
that are not anatomical, reviewing definitions and relationships and fixing cross-references
to Uberon. Once we have this first review re-structured and completed, with both the
embryological and adult terms incorporated to create a single file, we will ask Chris
and Erik to review and bring in Uberon terms, where required.

In addition, Frances reported back to the eChickAtlas group and we are expecting to hear
from them regarding sharing the single ontology file in a way that enables subversioning. We
are recommending the implementation of a subversioning system that allows (1) our two
groups to continue developing this ontology together and (2) other anatomy ontology
experts from the RCN Phenotype to review this file.

We are grateful to the RCN Phenotype for their funding and support with this collaborative
project.

Aug 04, 2012
 

We started the last day of our course with presentation and discussion of group ontology exercises and issues facing the different domains, such as nomenclature and a nice example of linking taxon-specific anatomies with mineralized properties of tissues. We discussed an evolutionary time-analogy to developmental time for paleontology, which resulted in the question “Is the big bang the first temporal boundary?”

We then moved on to discussion of what it takes to release an ontology in different flavors, and decided that a flavor is a slice of a version (!). Was it too early to think about discussion of ontology release processes? We felt not, because these are end goals to keep in mind as we go about our work in developing anatomy ontologies.

We dealt with a number of additional technical issues, such as choosing a primary axis of classification and how this relates to properties (a nice realization). You CAN, in fact, have structural and partonomy hierarchies exist in harmony. Next, we tackled the previously taboo topic of homology. The intentional delay in discussing homology until the last day paid off, as the presentation of how the community is currently representing homology hypotheses external to the anatomy ontologies seemed to go over without too much controversy.

We concluded the week with a gong show of 5 minute lightning talks of students discussing some problem, perspective, or plan. These turned out to be a great highlight of the week, and it was suggested that they should be emphasized in future years of the course (which the instructors think should be taught in Borneo).

An extra special thank you to the NESCent staff, in particular Jeff and Karen, who were flawless in helping make the week go super smoothly. Further thanks go to the Phenotype RCN for suggesting and co-sponsoring the course. We are happy to announce a new graduating class of peer anatomy ontologists. We welcome your participation in our community in the years to come, and thank you for your enthusiasm.

Melissa Haendel, Matt Yoder, Carlo Torniai, Jim Balhoff, and Erik Segerdell

Ontologies can be released in multiple formats using the OBO Oort tool.

Aug 03, 2012
 

Whew, just made it out of OWL hell (not really, we didn’t actually manually edit an OWL file, but we scared everyone enough not to do it) in time for a combined Wednesday/Thursday blog post.  Wednesday was completely devoted to an introduction to the Web Ontology Language, WOL, OWL.  After some morning theory, participants dug into (and finished!) a 41 page tutorial (not including 5 additional exercises) on the use of Protégé.  This was perhaps the trickiest day of the workshop with a lot of potential for confusion, but the relative silence and lack of gnashing of teeth foreshadowed a successful day.

Thursday, with OWL basics in hand, we moved on to a tutorial about importing other ontologies. Where to get them, how to get them, how to tell Protégé what to do with them. But of course, it is not always convenient to import a whole other ontology, so we learned the principles of “MIREOT”, or the minimum information required to reference an external ontology term. This required the complex navigation of the Ontofox tool to obtain bits and pieces of external ontologies. We then learned how to put them back together again (in Protégé).
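
To make the “minimum information” idea concrete, here is a small Python sketch of a record holding what the MIREOT guideline is usually described as requiring: the URI of the source ontology, the URI of the source term, and the URI of the class it should sit under in the importing ontology. The class name `MireotReference`, the `is_complete` helper, and the `example.org` target URI are hypothetical; this is not what Ontofox produces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MireotReference:
    source_ontology_uri: str    # ontology the term is borrowed from
    source_term_uri: str        # the external term itself
    target_superclass_uri: str  # where the term is placed in the importing ontology

    def is_complete(self):
        """True when none of the three required pieces is empty."""
        return all(
            [self.source_ontology_uri, self.source_term_uri, self.target_superclass_uri]
        )


ref = MireotReference(
    source_ontology_uri="http://purl.obolibrary.org/obo/uberon.owl",
    source_term_uri="http://purl.obolibrary.org/obo/UBERON_0001597",
    # Hypothetical parent class in a hypothetical importing ontology.
    target_superclass_uri="http://example.org/my-anatomy#ImportedAnatomicalStructure",
)
print(ref.is_complete())
```

Labels, definitions, and other annotations can be fetched later from the source ontology; the point of the guideline is that these three pieces are enough to reference the term without importing the whole ontology.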

We also had a lot of fun classifying the Balhoff family tree, as well as blue jeeps, as we learned to classify instances and use them for testing our anonymous classes. We learned to install Protégé plugins and link images within our ontology. Including pictures of Jim’s baby, hymenoptera heads, and blue jeeps.

At the end of the day, we broke into groups to brainstorm what would be needed to represent spatial relations and classes for the postcomposition of anatomy classes. This was done without peeking at the Spatial Ontology (we think!), in order to gain unbiased requirements for this ontology and get the participants thinking about spatial relationships.

On Thursday night some of the participants gathered in Robert’s room to listen to his daughter Leah’s music on the fabulous MTV show, Snooki and Jwoww. To everyone’s disappointment, her music was only on for around 10 seconds (although the show was fabulous). However, you can listen to her music here.

Matt Yoder, Melissa Haendel, Carlo Torniai, Jim Balhoff, and Erik Segerdell


Aug 01, 2012
 

The second day began with participants a little more rested (if not a little more confused).  We started the day with an impromptu follow up to the SVN tutorial where we illustrated how repository conflicts can be resolved, and files can be moved/deleted and reverted using SVN.  And there was much understanding.  We followed up this discussion with a review of the previous day’s ontology design documents that were developed in CMAP.  This led to some interesting give-and-take about some specific anatomical problems (nervous system), and ultimately helped to illustrate the point that use-cases and a-priori planning are critical to the developmental goals of an anatomical ontology.

The primary topic of the day was an introduction to the OBO world.  We started with the ‘evilness’ of an OBO file format (an inside joke, see the corresponding powerpoint), and then moved into an OBO tutorial focused on search, queries, and navigation.  This allowed course participants to have an introduction to some basic reasoning, as well as to learn some of the visualization tools available in OBOEdit that are not available in Protégé. Unfortunately we were using the latest version of OBOEdit, and were thus acting as beta testers; this resulted in a completely unstable platform for a few of our Windows-using participants.

Visualization in OBO-Edit

Following this healthy mix of relatively abstract thinking and pragmatic application, we moved to a lecture and demonstration led by Jim Balhoff on phenotype annotation using the “EQ” syntax, and post-composition. Jim demonstrated the basics of Phenex configuration and use and also some examples derived from the HAO’s approach to instance and phenotype modelling with respect to semantic phenotypes.
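
For readers new to the idea, an “EQ” annotation pairs an Entity (typically an anatomy term) with a Quality term, and post-composition builds the entity from existing terms at annotation time instead of requiring a new ontology class. The Python sketch below is only an illustration: the `EX:` identifiers are made up, the class names are hypothetical, and the `genus^relation(differentia)` rendering is an assumed notation, not output copied from Phenex.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class PostComposedEntity:
    genus: str        # the base term, e.g. a kind of bone
    relation: str     # how it relates to the differentiating term
    differentia: str  # the term that narrows the genus

    def render(self):
        return f"{self.genus}^{self.relation}({self.differentia})"


@dataclass(frozen=True)
class EQAnnotation:
    entity: Union[str, PostComposedEntity]
    quality: str

    def render(self):
        if isinstance(self.entity, PostComposedEntity):
            entity_text = self.entity.render()
        else:
            entity_text = self.entity
        return f"E={entity_text} Q={self.quality}"


# Made-up identifiers standing in for anatomy and quality terms.
entity = PostComposedEntity(genus="EX:0000001", relation="part_of", differentia="EX:0000002")
annotation = EQAnnotation(entity=entity, quality="EX:0000003")
print(annotation.render())
```

A pre-composed entity is simply passed as a plain identifier string, so the same annotation type covers both cases.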

For supper we returned to the previous evening’s locale (it was that good) for beverages and elevated-debate.

Matt Yoder, Melissa Haendel, Jim Balhoff, and Erik Segerdell

Jul 31, 2012
 

Today was the first day of the RCN co-sponsored Anatomy Ontology Course. The course includes an introduction to ontology principles, including basic logical reasoning, version control, use of ontology editors, and community resources, standards and activities. The course has twelve students in attendance, ranging from experts in paleobiology and the fin to limb transition, to comparative vertebrate embryogenesis, cartilaginous fishes, weevils, gastropod and cephalopod molluscs, arthropod circulatory systems, taxonomic nomenclature, plant disease, electric fish morphology, mammalian feeding muscle systems, food web representation, and anatomy of model organisms such as zebrafish and mouse.

The morning started with a bit of a grind, with lots of lecturing to broadly cover the basic principles of ontology logic and anatomical ontology development.  In between lectures a breakout exercise focused on answering the question “Are ontologies for me?”.  This led to a nice round of discussion as to the specific problems associated with instantiating an anatomy ontology.

In the afternoon participants learned the basics of SVN (http://subversion.tigris.org/), an important tool used throughout the anatomical ontology community for managing the ontology-related metadata.  As part of the exercise participants collaboratively built a “pass-it-on-story”: “ It was a dark and stormy nite. And I forgot an umbrella. Melissa’s cat Peanut is sleeping. The rain falling on your face…The roar of the tyrannosaur woke the cat up and it jumped off the sofa. The cat scratched my face on the way off the coutch, it hurt me. Your mother. That was a strange dream, must stop eating so late at night!” To conclude the workday we learned to develop design documents for ontologies using CMAP (http://cmap.ihmc.us/), with parts of the nervous system as the example. It was amazing to see how the classification schemes compared to each other and to existing anatomy ontologies.

A collage of cmaps designed by Anatomy Course participants.

To polish off the evening, participants gathered at Geer Street Garden for some excellent food and a nice round of one-to-one conversations.  Not surprisingly, the topic of homology was almost instantly raised during the day (and somewhat quickly squashed; it’s a dangerous path), but over beverages became a welcome topic.

Matt Yoder, Melissa Haendel, and Erik Segerdell

Jul 18, 2012
 

I received funding for a Collaborative Exchange Opportunity through the Phenotype RCN and I visited Chris Mungall at Berkeley two weeks ago. I am a member of the group working on the FEED Database project (http://www.feedexp.org/wiki/Public:Feeding_Experiments_End-User_Database), which began as a NESCent working group and is now funded directly by NSF. The aim of the FEED project is to create an online database that will act as a repository for physiologic and kinematic data on feeding and other behaviors of mammals. As part of the project we are working to create definitions for musculo-skeletal structures of the head and neck and for behaviors of the oro-pharyngeal complex that will work for all eutherians and metatherians. At a meeting in May, we created the spreadsheet of definitions that some of you may have seen on the Phenoscape curators listserve and another one for oro-pharyngeal behaviors. As a newcomer to the field of ontology, I will be attending the course at NESCent at the end of the month, but I am anxious to continue the work we began in May so I met with Chris to figure out how to proceed.  I was very happy with what we accomplished in a very short time.  Here is a brief summary of our work:
* We looked at Phenote as an alternative to using an Excel spreadsheet for curating muscle properties, and decided this might work quite well. The Phenote file should be checked into phenoscape svn somewhere, but it is probably a bit premature to do this. The resulting properties can then be automatically merged into an ontology.
* We added the current spreadsheet FEED definitions into Uberon using the external_definition annotation property. See for example – http://purl.obolibrary.org/obo/UBERON_0001597 (it doesn’t look perfect in OntoBee yet, but soon it will have a link to feedexp.org).
* When the FEED group met and created the spreadsheet of definitions we realized very quickly that any definition of a muscle that is applicable to all mammals must be very generic. This takes a lot of “getting used to” for anatomists (for an example, take a look at our recent exchange regarding the genioglossus muscle on the Phenoscape curators listserve). We have found that the definitions of the head and neck muscles that we have looked at in Uberon were taken from dbpedia and are specific to human anatomy. These definitions must be revised to be useful for comparative anatomical studies.
* As a first try, we looked at some of the attachment sites already in Uberon taken from dbpedia and improved some of them (e.g. buccinator).
* I would like to be able to create character state matrices from our annotated database, so we attempted to use Phenex to annotate the character state matrix in my paper from a recent FEED symposium (Druzinsky et al. 2011), but we didn’t get very far. We thought it would be useful to cover Phenex in the upcoming course.
* We did a whirlwind Protege tour including some fairly complex features such as axiom annotations (i.e. per-statement attribution).
* We looked briefly at NBO and at the behavior spreadsheet. I am going to contact some of the FEED collaborators to investigate the possibility of getting these into NBO.

Robert Druzinsky (U. of Illinois, Chicago) with lots of help from Chris Mungall (U.C. Berkeley) and thanks to Heiko Dietze