Computable evolutionary phenotype knowledge: a hands-on workshop

Sep 30, 2017

Call for Participation:

Computable evolutionary phenotype knowledge: a hands-on workshop

The Phenoscape project is hosting a hands-on workshop on Dec 11-14, 2017, at Duke University in Durham, North Carolina.

Evolutionary phenotype data that is amenable to computational data science, including computation-driven discovery, remains relatively new to science. Therefore, use cases and applications that effectively exploit these new capabilities are only beginning to emerge. If you are interested in discovering, linking to, recombining, or computing with machine-interpretable evolutionary phenotypes, this is the workshop for you!

The event will bring together a diverse group of people to collaboratively design and work hands-on on targets of their interest that take advantage of, and promote reuse of, Phenoscape’s online evolutionary data resources and services. The event is designed as a hands-on unconference-style workshop. Participants will break into subgroups to collaboratively tackle self-selected work targets.

The full Call for Participation, including motivation and scope, is posted here:

To apply to participate in the event, please fill out the application form by Oct 9, 2017. Travel sponsorship is available but limited, as is space.

Filed under: Informatics, Knowledge Base, Outreach, Workshops Tagged: community, knowledgebase, Software, Workshops
Jul 21, 2014

Over the years, a number of different vertebrate anatomy ontologies have been developed. Some of these are dedicated to a single model species, or to human. Others have been developed to describe phenotypic variation across species, and these cover a broad range of species. In particular:

This led to considerable duplication of effort, as common anatomical structures such as ‘pectoral girdle’ were represented in all five ontologies (as well as their single-species counterparts):

Pectoral girdle and related concepts in Uberon, with cross-references to other ontologies shown (Fig 1, Haendel et al)

It was difficult for the Phenoscape group to integrate data across all these ontologies, because this required curators to keep mutual cross-references up to date, a time-consuming and error-prone task.

As a result, the maintainers of these ontologies agreed to join forces and build a common ontology. This work is described in a new paper in the ontologies special issue of the Journal of Biomedical Semantics:

Haendel MA, Balhoff JP, Bastian FB, et al. Unification of multi-species vertebrate anatomy ontologies for comparative biology in Uberon. Journal of Biomedical Semantics 2014, 5:21. doi:10.1186/2041-1480-5-21

The group selected Uberon as the core ontology, as it had the broadest coverage, was already well-integrated with the single-species ontologies, and was adapted for OWL reasoning. The curators of these ontologies worked long and hard to integrate their work, with input from anatomy experts and developers of single-species ontologies, revealing many interesting differences in the way structures are represented across species along the way. For example, the representation of teeth in the combined ontology had to be flexible enough to accommodate teeth that are in widely variable locations and configurations:

Figure 4. Diversity of tooth locations

The number of classes merged is shown in figure 2 of the paper:

Figure 2. Overlap and contributions from source ontologies. A) Venn diagram showing the extent of cross-referenced content between msAOs prior to the merge. B) Ontology evolution and integration into Uberon


As a result of this effort, we have a common anatomy ontology with broad and deep coverage for vertebrate anatomy. For a variety of viewing options, see the Uberon website. For examples of use for data integration see:

As with most ontologies, work is ongoing, and we are constantly striving to improve depth, coverage and quality. We are currently improving the representation of facial muscles in the ontology based on the FEED ontology. We are also working on a federated approach for bringing in invertebrate anatomy ontologies, many of which are developed under the auspices of the Phenotype RCN, including the Arthropod Anatomy Ontology, the Poriferan anatomy ontology [Thacker et al, accepted, JBMS], the cephalopod ontology and the ctenophore ontology. We welcome feedback from everyone!

Mar 1, 2013


We’re looking for a few good interns! Photo by Gilles San Martin. (CC BY-SA 2.0)

The Frost Entomological Museum at Penn State seeks undergraduate summer (2013) interns to assist with projects related to insect phenotype data, especially in the context of systematics and evolution. Interns will be exposed to a broad array of biodiversity informatics tools, including ontologies, and will learn aspects of specimen collection, handling, and curation. Applications are due March 20, 2013. More information is available at

Feb 28, 2013

As you may have heard, the Google Summer of Code [1] is on again for 2013! NESCent is again putting together an application as an organization. The NESCent co-admins are me and Jim Proctor, along with Rutger Vos as proto-admin. Participating as an organization is very competitive, and a good suite of project ideas is a critical part of our application.

This is the official call for project ideas. This year, instead of using a mailing list for discussion of project ideas, we are going to try using a G+ community. So, if you have an idea for a project, or fancy being a mentor or co-mentor, please join the Phyloinformatics community [2] and start thinking about project ideas. If you have an idea, post to the G+ page, and enter more detail about the idea on the NESCent GSoC page [3].


If you have any questions, let me know (or better yet, post to the G+ group, as someone else might have the same question).

Here’s to another fun and productive Phyloinformatics Summer of Code!

Karen Cranston

DILS 2012

Aug 28, 2012

In June I had the opportunity to attend DILS 2012 (Data Integration in the Life Sciences), at the University of Maryland in College Park. I presented a poster on Phenoscape, “The Phenoscape Knowledgebase: Integrating phenotypic data across taxonomy, from biodiversity to developmental genetics”. The poster highlighted some of the new directions in which the Phenoscape project is heading, such as broadening taxonomic coverage and adopting semantic web technologies. DILS was a small conference but had several talks discussing the applications of ontologies to biological data. I’m looking forward to DILS 2013 in Montreal, in conjunction with ICBO and the Canadian Semantic Web conference.

Filed under: Conferences, Informatics
Mar 1, 2012

On the last day of a very successful Phenotype RCN meeting at NESCent last week, we held an impromptu session on OBO to OWL mappings. This was based on a recent workshop run for GO curators by myself (David Osumi-Sutherland), Chris Mungall, Simon Jupp and Jane Lomax. By popular demand, I’ve posted my slides on slideshare.

The original workshop also included an intro to Protégé 4 by Simon Jupp [warning: word doc] as well as a set of problem-solving exercises consisting of a set of folders, each featuring one or more test ontologies and a README with instructions. For best results, you should check out the whole repository of exercises using an svn client: svn checkout obo2owl_tut_read_only
Details of the software required for these exercises can be found at

Notes from ISWC 2011

Nov 3, 2011

Last week, I attended the 10th International Semantic Web Conference (ISWC) in Bonn, Germany. A tremendous variety of sophisticated work is going on both in academia and industry to improve the technology for, and take advantage of, the ever-growing network of data and concepts published, through open standards, on the web.

You might say it is the best of times and the worst of times for semantic web enthusiasts, in that reasoning and query engines that can be used on large collections of RDF have in the last few years become a reality (one of the Challenge Tracks provided contestants with a *billion* triples to work with). But some see clouds on the horizon. The web search titans (Bing, Google and Yahoo!) are now pushing a microformat and vocabulary standard for web content that some worry may threaten the development of richer semantic web technology. Still, most treated the news positively, happy to know that these organizations now seem to agree on the importance of semantics. In fact, Yahoo! described at the conference how they are trying to build a “Web of Objects” that takes advantage of this standard, together with more extensive internal vocabularies, to regroup knowledge pieces that are scattered around the Web.

Conference chair Natasha Noy showed a revealing pair of tag clouds comparing the abstracts from the first year of the conference in 2001 to today: the terms “semantic” and “web” have shrunk in importance and “data” is now king!

Ivan Herman’s blog gives a good sampling of the flavor of talks presented at the meeting. I especially enjoyed the Industry Track, since these applications are less familiar to me than the academic/scientific ones, and I was particularly impressed by the importance of semantic technologies to the news media and other content industries. These technologies are being deployed by news organizations with great enthusiasm (e.g. the BBC). I also came away with a strong sense that semantic technologies are helping to create demand for, and drive a revolution in the use of, Open Government Data; there were a number of demonstrations of useful real-world applications, particularly to environmental monitoring.

With my Phenoscape hat on, I attended a Linked Open Data for Science (LISC) satellite workshop prior to the main conference. The event included both presentations and discussions from a variety of perspectives about the opportunities and challenges of this new technology. A diversity of fields was represented (social science, linguistics, geosciences, biomedicine, etc.). But it is clear that uptake of linked open data as an alternative means of publication is still in its infancy within the sciences. This is despite the fact that the bioinformatics data centers account for nearly a quarter of the real estate in the famous linked data cloud diagram. Some of the most exciting opportunities, in my opinion, come from the ability to allow radically decentralized data publication, and this is something that we might wish to pilot in a modestly distributed data curation environment like Phenoscape. Another observation: I was surprised to discover at the meeting how much the utility of the linked data cloud (and, by extension, the semantic web) depends on the social convention by which everyone provides links into a relatively small number of large ‘concept repositories’ like DBPedia (which was originally a Master’s project, BTW).

The breakout discussion sessions at LISC  highlighted how scientific practice will place difficult demands on linked data with respect to provenance, context, granularity, distributed authority, etc.  This resonated with the message of our own contribution to the workshop, which outlined some of the particular challenges in making context-dependent links between scientific objects, when the descriptions of those objects are scattered across different resources, and when the similarities between objects are spread weakly over many properties [1].  Another important question that hit home for a number of us coming from the bioinformatics and biodiversity informatics world is how scientists are going to be able to take advantage of the innovations now going on in the commercial sector (including some of the exhibitors at the main conference) within the constraints and DIY culture of small individual university-based research grants.

There is no denying the explosion in linked data resources out there (comparisons of the growth in the cloud diagram are about as common as graphs showing the growth in sequence data at a biology conference). But another recurrent theme of the meeting was that, unfortunately, much of that content is missing semantics (i.e. a lack of use or availability of ontologies for many concepts, and lack of links between content at different endpoints), and generating semantically annotated triples needs to be easier than it currently is (a message certainly relevant to those of us developing curation tools).

One of the keynotes, from Frank van Harmelen, generated quite a bit of buzz.  He looked back on 10 years of the semantic web, asking what theoretical principles we can learn from the experience so far, and his annotated slides are well worth a look.

The conference was a great mix of different formats. In addition to the keynotes and regular talks, there were a host of workshops and tutorials, challenges, panel discussions (including one billed as a ‘Death Match’), and even a special competition for the best “Outrageous Ideas”. The winner of that one was a proposal to bring linked data to the non-networked portion of humanity. A particularly nice feature of the meeting was the ‘Minute Madness’ preceding the poster session, in which each of the poster presenters gave a short timed pitch to all the attendees; it was a very entertaining and informative way to ‘see’ every poster and allowed everyone to quickly pick out which ones to hit during the session.

For more, see the excellent day-by-day summary of the meeting from Juan Sequeda, where there are links to all the winning presentations and challenge entries.  [Ironically, the conference website is down temporarily while it is being moved, so come back later if the links to the papers hang].  The next ISWC will be November 11-15, 2012 in Boston.


[1] Vision T, Blake J, Lapp H, Mabee P, Westerfield M (2011) Similarity Between Semantic Description Sets: Addressing Needs Beyond Data Integration, in Proceedings of the First International Workshop on Linked Science, Bonn, Germany, October 24, 2011, Tomi Kauppinen, Line C. Pouchard, Carsten Kessler (eds), published in CEUR Workshop Proceedings, Volume 783.

Filed under: Conferences, Informatics, Semantic Web

Postdoctoral Opportunity: Semantic Reasoning for Biological Phenotypes

Jul 29, 2011

We seek a postdoctoral researcher in computational biology for Phenoscape.  This person will contribute to two important research strands within the project:

  1. Development of computational and statistical methodology for measuring semantic similarity between sets of phenotypes, in order to support searches within extremely large phenotype datasets.
  2. Development and testing of methods for automatically generating ontologically based phenotype expressions from structured excerpts of natural language.

The position is based in the informatics group at the National Evolutionary Synthesis Center (NESCent), and will be administered through the University of North Carolina at Chapel Hill (UNC-CH) under the supervision of Hilmar Lapp at NESCent and Dr. Todd Vision at UNC-CH. The research will be in collaboration with Dr. Chris Mungall at Lawrence Berkeley National Lab and Dr. Hong Cui at the University of Arizona. The project also includes biologists and bioinformaticists from the University of South Dakota, the University of Chicago, and the University of Kansas, in addition to the model organism databases for mouse (MGD), zebrafish (ZFIN), and Xenopus (Xenbase).

Applicants should have a PhD in bioinformatics, computational biology or a related field. Prior experience with machine reasoning using ontologies is strongly preferred. The position is for two years, pending satisfactory performance and availability of funds. To apply, please provide a cover letter, CV, and contact information for three references. Inquiries and applications may be sent to Hilmar Lapp. The post is open immediately and will remain open until filled.

Filed under: Informatics, Jobs, Postdoc, Reasoning


CSHALS 2011
Mar 9, 2011

I recently attended the Conference on Semantics in Healthcare and Life Sciences (CSHALS), in Cambridge, MA. The CSHALS meeting was a change for me in that it’s much more healthcare-oriented than other venues in which I’ve presented work from Phenoscape. This was a great opportunity to see how far the healthcare community has pushed semantic web technologies, and also to become more familiar with some of the more commercial packages which are available for storing and querying very large knowledgebases based on RDF (for example, AllegroGraph and Gruff from Franz, Inc., and Sentient Knowledge Explorer from IO Informatics). A particularly interesting talk was the keynote by Toby Segaran, of Metaweb Technologies, advocating semantic techniques as a more agile approach to data. Slideshows from the conference presentations are available for download here, including my own.

Filed under: Conferences, Informatics, Semantic Web

Matching Phenotypes

Dec 17, 2010

An important goal for the Phenoscape project is to be able to suggest candidate genes that may have contributed to evolutionary change.  The way that we have proposed to do this is to search for changes in phenotype that appear as the result of mutations in model organisms and also appear as phenotype changes on an evolutionary tree.  There are several challenges in designing this search, apart from simply recognizing similar phenotypes, that we have been working on during the past few months.

The first issue is that we are interested in changes in phenotype, not simply matching phenotypes. For phenotypes associated with mutants of model organisms, it is understood that they vary with respect to the wild type. For taxa, however, this means looking for taxonomic nodes where variation in a phenotype is observed among the children of the node. For example, there are nine species within the genus Aspidoras with annotations for the shape of the opercle bone. Of these, eight exhibit opercle bones with round shape, but the ninth (A. pauciradiatus) is annotated with a triangular opercle. In contrast, all three annotated species of the related Hoplosternum are annotated with a triangular opercle. Thus there is detectable variation in opercle shape within the children of Aspidoras, but not within Hoplosternum, suggesting that change in opercle shape has occurred somewhere among the descendants of Aspidoras. For our analysis, identifying variation among descendants is important.
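
The opercle example can be illustrated with a small Python sketch. This is not the Phenoscape implementation; it is only a minimal illustration of flagging a taxonomic node whose annotated children show more than one state for a character. The tuple layout, the function name varying_characters, and the placeholder species labels ("sp. 1", "sp. 2", ...) are assumptions made for the example; only the counts and the name A. pauciradiatus come from the text above.

```python
from collections import defaultdict

# Hypothetical annotations as (parent taxon, child taxon, character, state).
# Species labels other than Aspidoras pauciradiatus are placeholders.
ANNOTATIONS = (
    [("Aspidoras", f"Aspidoras sp. {i}", "opercle shape", "round")
     for i in range(1, 9)]
    + [("Aspidoras", "Aspidoras pauciradiatus", "opercle shape", "triangular")]
    + [("Hoplosternum", f"Hoplosternum sp. {i}", "opercle shape", "triangular")
       for i in range(1, 4)]
)


def varying_characters(annotations):
    """Map (parent taxon, character) to its observed states, keeping only
    those pairs where the children show more than one state."""
    states = defaultdict(set)
    for parent, _child, character, state in annotations:
        states[(parent, character)].add(state)
    return {key: found for key, found in states.items() if len(found) > 1}


if __name__ == "__main__":
    for (taxon, character), found in varying_characters(ANNOTATIONS).items():
        # Prints: Aspidoras opercle shape ['round', 'triangular']
        print(taxon, character, sorted(found))
```

Hoplosternum does not appear in the result because all of its annotated children share a single state. A real search would also have to walk the full taxonomy rather than a single parent/child level.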

Thus, our search for shared variation in phenotypes focuses on matching phenotypes associated with genes with phenotypes of taxa showing variation. However, we are looking for matches at a larger scale than single phenotypes; we are looking for matches across the set of phenotypes affected by a gene or the set of features that have changed among the descendants of a taxonomic node. We refer to these sets of phenotypes as the ‘phenotypic profile’ of a gene or taxon, following a seminal paper by Washington et al. 2009. Washington et al. propose four metrics (three based on ‘information content’) to score matches between the sets of phenotypes in a pair of profiles.
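
To give a feel for what an information-content score between two profiles looks like, here is a minimal Python sketch. It does not reproduce the four metrics of Washington et al.; it shows one common style of score, in which the information content of a term is the log of how rarely it (or a more specific term) is annotated, and two profiles are scored by the most informative ancestor shared by any pair of their phenotypes. The toy term hierarchy, the profile names, and all function names are hypothetical.

```python
import math

# Hypothetical toy hierarchy: child term -> parent term (None marks the root).
PARENT = {
    "phenotype": None,
    "opercle shape": "phenotype",
    "opercle round": "opercle shape",
    "opercle triangular": "opercle shape",
    "fin size": "phenotype",
    "fin reduced": "fin size",
}

# Hypothetical phenotypic profiles for two genes and two taxa.
PROFILES = {
    "gene A": {"opercle triangular", "fin reduced"},
    "gene B": {"opercle round"},
    "taxon X": {"opercle triangular"},
    "taxon Y": {"fin reduced"},
}


def ancestors(term):
    """Return the term together with all of its ancestors up to the root."""
    found = set()
    while term is not None:
        found.add(term)
        term = PARENT[term]
    return found


def information_content(profiles):
    """IC(t) = log2(N / n_t), where n_t of the N profiles are annotated
    with t or with one of its descendants."""
    counts = {}
    for profile in profiles.values():
        covered = set().union(*(ancestors(t) for t in profile))
        for term in covered:
            counts[term] = counts.get(term, 0) + 1
    total = len(profiles)
    return {term: math.log2(total / n) for term, n in counts.items()}


def best_shared_ic(term_a, term_b, ic):
    """IC of the most informative ancestor shared by two terms."""
    shared = ancestors(term_a) & ancestors(term_b)
    return max(ic.get(term, 0.0) for term in shared)


def max_ic(profile_a, profile_b, ic):
    """Score two profiles by their single best-matching pair of phenotypes."""
    return max(best_shared_ic(a, b, ic) for a in profile_a for b in profile_b)


if __name__ == "__main__":
    ic = information_content(PROFILES)
    for gene in ("gene A", "gene B"):
        for taxon in ("taxon X", "taxon Y"):
            score = max_ic(PROFILES[gene], PROFILES[taxon], ic)
            print(f"{gene} vs {taxon}: {score:.3f}")
```

In this toy data, gene A and taxon X share the exact term "opercle triangular", so they score higher than gene B and taxon X, which only share the more general "opercle shape". Scores that average over all best-matching pairs, rather than taking a single maximum, are a natural variation on the same idea.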

In the course of developing the search, we have encountered several important differences in curation approach between ZFIN and Phenoscape. In some cases there are different uses of PATO to model the same phenotype, for example the absence of an entity. In other cases ZFIN uses a quality ‘abnormal’ that applies to mutants, but not in a taxonomic, comparative sense, which means these phenotypes will be inaccessible to us. Thus, implementing this search is helping us to better understand our data and our choices in modeling the data and how it interoperates with other ontology-based data. Such reflection would have been difficult or impossible without the use of ontologies to represent the phenotypes.

Filed under: Informatics, Knowledge Base, Ontology, Science