Computable evolutionary phenotype knowledge: a hands-on workshop

Sep 30, 2017

Call for Participation:

Computable evolutionary phenotype knowledge: a hands-on workshop

The Phenoscape project is hosting a hands-on workshop on Dec 11-14, 2017, at Duke University in Durham, North Carolina.

Evolutionary phenotype data that is amenable to computational data science, including computation-driven discovery, remains relatively new to science. Therefore, use cases and applications that effectively exploit these new capabilities are only beginning to emerge. If you are interested in discovering, linking to, recombining, or computing with machine-interpretable evolutionary phenotypes, this is the workshop for you!

The event will bring together a diverse group of people to collaboratively design and work hands-on on targets of their interest that take advantage of, and promote reuse of, Phenoscape’s online evolutionary data resources and services. The event is designed as a hands-on, unconference-style workshop: participants will break into subgroups to collaboratively tackle self-selected work targets.

The full Call for Participation, including motivation and scope, is posted here: https://hackmd.io/s/Sk6Xa7Eq-#

To apply to participate in the event, please fill out the application form by Oct 9, 2017. Travel sponsorship is available but limited, as is space.

Report from Tucson: from characters to annotations with text mining

Mar 30, 2013

There is a wealth of phenotypic information in the evolutionary literature that comes in the form of semi-structured character state descriptions. Getting that information into computable form is, right now, an awfully slow process. In Phenoscape I, we estimated that it took about five person-years in total to curate semantic phenotype annotations from 47 papers. If we are to get computable evolutionary phenotypes from a larger slice of the literature, we really need to find ways to speed this up.

One promising approach is to use text mining, which could contribute in a few different ways. First, one could efficiently identify all the terms in the text that are not yet represented in ontologies and add them en masse, so that data curation does not have to stop and resume whenever such a term is encountered. Second, one could present a human curator with suggestions for which terms to use and how those terms relate to one another, speeding up the composition of each annotation.
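The first idea can be illustrated with a deliberately naive Python sketch: at each position in a character description, consume the longest phrase found in an ontology vocabulary, and tally the leftover words as candidate new terms. The vocabulary, the stopword list, and the `uncovered_terms` function are hypothetical; this simple dictionary lookup is far cruder than the grammatical parsing a real text-mining system performs.

```python
import re
from collections import Counter

# Hypothetical vocabulary standing in for ontology labels and synonyms.
VOCABULARY = {"dorsal fin", "pectoral fin", "fin", "ray", "elongate"}
# Hypothetical stopwords: words that should never be reported as missing terms.
STOPWORDS = {"a", "an", "the", "of", "with", "and", "or", "to", "in", "present", "absent"}


def uncovered_terms(descriptions, vocabulary, max_len=3):
    """Count words in character descriptions not covered by any vocabulary phrase.

    At each position the longest matching phrase (up to max_len words) is
    consumed; unmatched words that are not stopwords are tallied.
    """
    missing = Counter()
    for text in descriptions:
        words = re.findall(r"[a-z]+", text.lower())
        i = 0
        while i < len(words):
            for n in range(min(max_len, len(words) - i), 0, -1):
                if " ".join(words[i:i + n]) in vocabulary:
                    i += n  # consume the longest known phrase
                    break
            else:  # no known phrase starts at words[i]
                if words[i] not in STOPWORDS:
                    missing[words[i]] += 1
                i += 1
    return missing


descriptions = ["Dorsal fin spine serrated", "Pectoral fin ray elongate"]
print(uncovered_terms(descriptions, VOCABULARY).most_common())
# [('spine', 1), ('serrated', 1)]
```

Ranking the gaps by frequency suggests which missing terms to add first. Exact matching also misses simple variants such as plurals (‘rays’ would not match ‘ray’), one reason real systems need more linguistic processing.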

CharaParser, developed by Hong Cui at the University of Arizona, is an expert-based system that decomposes character descriptions into recognizable grammatical components, and it is now being used in several different biodiversity informatics projects. Baseline evaluation results from BioCreative III showed that a naive workflow combining CharaParser and Phenex, the software curators use to compose ontological annotations and relate them to character states, was capable of identifying candidate entity and quality phrases (it outperformed biocurators by 20% in recall on average) but had difficulty translating those phrases into ontological annotations. This first-iteration workflow also did not yet reduce curation time.

In March, a small contingent from NESCent (Jim Balhoff, Hilmar Lapp, and Todd Vision) visited Hong Cui’s group in Tucson. We talked through improvements to CharaParser and the curation workflow, brainstormed plans for a more thorough set of evaluation tests, began refactoring the code so that it can be more easily shared across projects, and gained a better understanding of which features make a character difficult to curate for humans versus for text mining. We made substantial progress on all fronts, and we look forward to seeing how much the accuracy and efficiency of curation improve in the next round of testing.

We are also pleased to report that the CharaParser codebase will now be available from GitHub under an open source (MIT) license.

Homology in anatomy ontologies: Report from a Phenotype RCN meeting

Feb 26, 2013

At the end of October 2012, the working groups of the Phenotype Research Coordination Network (RCN) all met at the Asilomar Conference Center in Pacific Grove, CA. One of these, the Vertebrate working group, set out to discuss methods of representing phylogenetic and serial homology in anatomy ontologies, an issue that is also central to Phenoscape. Though common ancestry is implicit in the semantics of many classes and subclass relationships (see, for example, the ‘homology_notes’ for digit in Uberon), most multispecies anatomy ontologies, including Uberon, VSAO, and TAO, do not assert homology relationships between anatomical entities. Nonetheless, homology is central to comparative biology, and therefore to enriching computations across data types, species, and evolutionary change.

The working group used ontological relationships, phenotypes, and homology assertions across a small set of skeletal elements from vertebrate fins and limbs as a test case to identify requirements for making and reasoning over homology assertions. The test cases specified both positive (data expected to be returned) and negative (data expected not to be returned) results for particular queries involving phylogenetic and serial homology. The group developed a number of such queries across subtype (is_a) and partonomy (part_of) relationships. For example, without homology assertions, a query for phenotypes involving the ‘humerus’ would not retrieve phenotypes for the ‘femur’. Asserting that the ‘forelimb skeleton’ is serially homologous to the ‘hindlimb skeleton’ would not remedy this, because that assertion does not imply that their parts (the humerus and femur, respectively) are homologous as well. Instead, serial homology must be asserted directly between entities, even when they are parts of structures already asserted to be homologous (in this case, the humerus and femur must also be directly asserted to be serial homologues). Conversely, the group determined that homology relations, both serial and phylogenetic, should propagate to subclasses. For example, returning phenotypes for subtypes of both the ‘paired fin skeleton’ and the ‘skeleton of limb’ in a query for either requires asserting phylogenetic homology only between these high-level classes. Because this assertion propagates to all their subclasses, such as ‘pectoral fin skeleton’, ‘hindlimb skeleton’, or ‘autopodial skeleton’, phenotypes for any of those subtypes would then also be returned. The group also discussed how to define the identity of elements of a repeated series consistently and, ideally, universally. The consensus was to specify subsets of digits for different taxa with different conventions, e.g., a basal tetrapod subset and a bird subset.
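To make the two query rules concrete (homology does not flow along part_of, but does propagate to subclasses), here is a minimal Python sketch over a hypothetical, highly simplified toy ontology. The `IS_A` and `PART_OF` tables and the `is_a_star` and `matches` functions are illustrative assumptions, not Uberon’s actual axioms or Phenoscape’s implementation; a real system would use an OWL reasoner. Each homology assertion `(a, b)` is read as “a is homologous to some b”: it is inherited by every subclass of `a` and satisfies queries for `b` or any superclass of `b`. As a further simplification, each assertion is also applied in reverse.

```python
# Hypothetical toy ontology, heavily simplified; NOT Uberon's actual axioms.
IS_A = {
    "pectoral fin skeleton": "paired fin skeleton",
    "pelvic fin skeleton": "paired fin skeleton",
    "forelimb skeleton": "skeleton of limb",
    "hindlimb skeleton": "skeleton of limb",
}
# Partonomy is recorded here but deliberately never used by matches():
# homology asserted on a whole does not carry over to its parts.
PART_OF = {
    "humerus": "forelimb skeleton",
    "femur": "hindlimb skeleton",
}


def is_a_star(term):
    """Return the term together with all of its is_a ancestors."""
    ancestors = {term}
    while term in IS_A:
        term = IS_A[term]
        ancestors.add(term)
    return ancestors


def matches(annotated, query, homology):
    """Would a phenotype annotated to `annotated` be returned by a query for `query`?

    `homology` is a set of (a, b) pairs, each read as "a is homologous to some b"
    and, as a simplifying assumption, also applied in reverse. An assertion is
    inherited by every subclass of a and satisfies queries for b or any of b's
    superclasses.
    """
    ancestors = is_a_star(annotated)
    if query in ancestors:
        return True
    symmetric = homology | {(b, a) for a, b in homology}
    return any(a in ancestors and query in is_a_star(b) for a, b in symmetric)


serial = {("forelimb skeleton", "hindlimb skeleton")}
print(matches("humerus", "femur", serial))  # False: nothing flows along part_of
print(matches("humerus", "femur", serial | {("humerus", "femur")}))  # True

phylo = {("paired fin skeleton", "skeleton of limb")}
print(matches("pectoral fin skeleton", "skeleton of limb", phylo))  # True
print(matches("hindlimb skeleton", "paired fin skeleton", phylo))  # True
```

Under this reading, a query for the narrower ‘hindlimb skeleton’ does not return ‘pectoral fin skeleton’ phenotypes, since the assertion only states that the fin skeleton is homologous to some skeleton of limb.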

In summary, the requirements identified at the workshop for reasoning over both phylogenetic and serial homology turned out to be fully consistent with standard OWL property semantics. Furthermore, the workshop’s recommendations for defining elements in a repeated series are fully in line with the goal of defining classes in anatomy ontologies so that they can be applied unambiguously, and in a manner that is not inconsistent with knowledge of developmental and evolutionary origin.

Aside from several Phenoscape personnel (Jim Balhoff, David Blackburn, Alex Dececchi, Hilmar Lapp, Paula Mabee, Chris Mungall), participants in the meeting included Eric Kansa, Hans Larsson, and Karen Sears, who were new to the RCN (and to Phenoscape). We are grateful to them for helping us work through the questions in a way that kept the discussion grounded in enabling science.

