Jan 10 2014

In an effort to expand the user community and to demonstrate what is possible using our infrastructure, members of the Phenoscape team gave multiple presentations across two continents on our recent developments. In late October, Paula Mabee gave an invited presentation on mapping phenotypes across phylogenies at the Muséum national d’Histoire naturelle in Paris. This was followed by presentations at the 73rd annual meeting of the Society of Vertebrate Paleontology (SVP) in Los Angeles and the 2013 meeting of the Taxonomic Database Working Group (TDWG) in Florence, Italy. Phenoscape had a significant presence at SVP, with a poster presented by Alex Dececchi demonstrating our progress in generating supermatrices from our annotations and a talk given by collaborator Karen Sears, who used EQ supermatrices from Phenoscape fin/limb data to examine integration patterns across the fin-to-limb transition. Karen’s talk marks the first of the collaborations coming out of our 2013 San Francisco workshop. It also showed how data from Phenoscape can drive independent projects and be easily integrated with existing phylogenetic and statistical tools such as Mesquite and various R modules. The talks and poster were well received, with numerous researchers asking how they could incorporate Phenoscape or use ontology-based annotations.

Filed under: Conferences
Nov 26 2013

A handful of new papers of interest are available at the Journal of Biomedical Semantics, which is publishing a collection of articles related to biomedical ontologies and ontology updates:

• P. E. Midford et al: The vertebrate taxonomy ontology: a framework for reasoning across model organism and species phenotypes

• R. Nigam et al: Rat Strain Ontology: structured controlled vocabulary designed to facilitate access to strain data at RGD

• P. Ciccarese et al: PAV ontology: provenance, authoring and versioning

• K. M. Livingston et al: Representing annotation compositionality and provenance for the Semantic Web

Nov 05 2013

via Robin Haw

On behalf of the Organizing Committee for the 7th International Biocuration Conference, I am delighted to announce that our website and registration are now open:


The conference will be held at Hart House in Toronto from 6th to 9th April 2014, and it would be wonderful to see you there. We have four keynote speakers confirmed:

Dr. Tim Hubbard, Wellcome Trust Sanger Institute

Dr. Suzanna Lewis, Lawrence Berkeley National Laboratory

Dr. Patricia Babbitt, California Institute for Quantitative Biosciences (QB3)

Dr. Lincoln Stein, Ontario Institute for Cancer Research

Early bird registration rates apply until 7th March 2014. We have secured discount rates at three hotels in Toronto; please see the Biocuration 2014 website for more information on booking.

Please note that the paper submission deadline is 15th November 2013, so there is limited time to put your paper together.

The deadline for the abstract submission to present at the conference is 10th February 2014.

Oct 29 2013

by Karen Eilbeck

One of our tasks at the SO-GENO phenotype workshop in Portland this fall was to formalize the description of phenotypic data in genomic annotation. Previously we had written instructions on the use of phenotype ontologies such as HPO when creating variant file annotations in Genome Variation Format (GVF). GVF is a tab-delimited file format for the detailed annotation of sequence variants, and the specification is managed as part of the Sequence Ontology. Our revised guidelines are split into human and non-human recommendations to reflect the diversity in phenotypic annotation resources. We address best practices for annotation, provide easy-to-follow examples, and discuss the process for requesting new terms from the phenotype resources. The recommendations are available here and have been registered with Biosharing as a reporting guideline. Biosharing is a website to register and track well-constituted efforts to develop standards for describing and sharing biosciences experiments; see more here.
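
As a rough illustration of what a phenotype-annotated variant line might look like to a script, the sketch below parses one tab-delimited record in the nine-column, GFF3-style layout that GVF builds on. The sample line, the `Phenotype` tag name, and the HP identifier are illustrative assumptions, not text taken from the guidelines; the recommendations themselves are the place to check the actual tags.

```python
# Minimal sketch: split one GVF-style record into its nine tab-delimited
# columns and unpack the key=value attributes in the last column.
# The sample line and the "Phenotype" tag are hypothetical.

COLUMNS = ["seqid", "source", "type", "start", "end",
           "score", "strand", "phase", "attributes"]


def parse_record(line):
    """Return a dict of the nine columns, with attributes as a dict of lists."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    record = dict(zip(COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Multiple values for one tag are comma-separated.
        attributes[key] = value.split(",")
    record["attributes"] = attributes
    return record


# Hypothetical record: a single-nucleotide variant with one phenotype term.
SAMPLE = "\t".join([
    "chr16", "example_caller", "SNV", "49291141", "49291141",
    ".", "+", ".",
    "ID=var_01;Reference_seq=C;Variant_seq=T;Phenotype=HP:0000001",
])

record = parse_record(SAMPLE)
print(record["type"], record["attributes"]["Phenotype"])
```

Keeping attribute values as lists means a record that carries several phenotype terms needs no special handling downstream.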

Oct 29 2013

by Matthew Brush

In September 2013, the Phenotype RCN sponsored a three-day workshop at Oregon Health & Science University to align sequence feature and genetic variation representation and thereby support phenotype data integration. Participants included developers of the Sequence Ontology (SO) [1] (Karen Eilbeck, Mike Bada, and Bret Heale), and members of the ontology team from the Monarch Initiative [2] who have been developing a genotype ontology called GENO (Matthew Brush, Melissa Haendel, and Chris Mungall).


One of the goals of the Phenotype RCN is to promote coordination and standardization of phenotype-related data. A standardized representation of genotype information is required for integrating genetically linked phenotype data from diverse sources, including model organism, human variation, livestock, and evolutionary databases. A particular challenge relates to harmonizing phenotype annotations where they are linked to genetic variations at different levels of granularity – from complete strain genotypes, to specific gene alleles, to single nucleotide polymorphisms.

Monarch and SO Projects

The Monarch Initiative is a new effort that aims to integrate genotype-to-phenotype and related data from numerous sources under a common semantic framework, and develop tools and services for user-guided exploration and analysis. Towards this end, Monarch required development of new modeling for genotypes (housed in GENO), which was lacking in the ontology landscape. The scope of GENO necessarily overlaps with that of the Sequence Ontology, but has a unique perspective on sequence features as they relate to linking different scales of genetic variation and to organismal phenotypes. The need to align modeling between SO and GENO motivated our collaboration, which was particularly timely as the SO had recently initiated a refactoring to accommodate use cases beyond its initial charge of genome annotation. This refactoring aimed to  define the context of the SO with respect to the Basic Formal Ontology (BFO) and other OBO ontologies, enhance representation of sequence variation, and develop a parallel representation of material sequence features (MSO) to complement the abstract feature representation in the existing SO. These goals were consistent with those of Monarch to support better phenotype data integration and therefore a workshop was funded by the Phenotype RCN.

Genetic Variation in GENO

The genotype information modeled in GENO is broadly conceived to include any variation in gene expression that is tied to an observed phenotypic effect. Two types of ‘genetic variation’ are explicitly distinguished in GENO: (1) ‘Sequence-variation’ describes changes in the sequence of an organism’s genome, which are captured in the traditional genotypes shared by biologists. In this context, ‘sequence variant genes’ are heritable changes in genomic DNA, and include things like point mutations, SNPs, or transgenic insertions that are represented in SO. (2) ‘Expression-variation’ relates to experimental alterations in the expression level of genes that are not due to changes in the sequence of the subject’s genome. Here, ‘expression variant genes’ are genes whose level of expression is altered as a result of some experimental intervention, such as targeted gene knock-down using reagents like morpholinos and RNAi, or transient expression from DNA constructs. Like sequence variants, these expression variants change what is expressed in an organism and can lead to measurable phenotypic outcomes. The GENO ontology aims to re-use and co-develop the SO sequence variation model, but the notion of expression variation was concluded to be outside the SO scope. Modeling in GENO will extend and be logically consistent with the SO approach and will leverage links to orthogonal ontologies to place variation in a broader biological context [3]. Additional information about the SO and GENO models and their interaction can be found in the presentation posted here [4].

Workshop Goals and Outcomes

One of the immediate goals of our workshop was to find consensus on high-level ontological issues that have yet to be resolved in the development of these and other OBO Foundry ontologies and document these decisions for the community.  Many such issues have been broadly debated for years, and our outcomes may be relevant for other domains or applications in biomedical research. Much progress was made in resolving key issues, and a plan was established for ongoing collaborative work.  Some outcomes are below, and more detailed notes can be found here [5].

  1. Terminological standardization of core terms.  Terms such as ‘sequence’, ‘gene’, ‘allele’, ‘variant’, ‘reference’, ‘mutant’, and ‘genetic’ are variably and ambiguously used across communities, and require precise definitions and consistent use.  Work is ongoing to craft such definitions, which will be reflected in our respective ontologies as they are refined and vetted.
  2. The ontological nature of sequences and sequence features (and their place in the BFO/IAO framework).  Specific topics included: (1) the merits and implications of modeling sequence features as generically dependent continuants, or more specifically as information content entities, (2) defining identity criteria for sequence features to include their sequence and their position (as opposed to sequence only), (3) how to model attributes of sequence features such as biological activity, experimental provenance, reference status, and zygosity, and (4) the ways in which sequence features are considered to vary with respect to each other (e.g. wild-type vs mutant sequences, reference vs alternate sequences).
  3. Gene representation, and modeling the central dogma. We debated strategies to provide an OWL-based ontological representation and identifiers for genes and their variants, that would serve SO, Monarch, and the broader phenotype community.  Related discussions focused on how to build from this gene representation to link to derived sequences at RNA and protein levels, and describe properties that emerge in this derivation.
  4. Variant representation.  A precise and explicit account of how the concept of ‘sequence variation’ should be defined across SO and GENO was established. In this model, a ‘variant’ is any sequence feature that varies_with some other instance of the same feature.  So sequence variants are considered to be ‘variant_with’ any other version of that feature, rather than ‘variants_of’ some reference. We will also represent more specific types of the ‘variant_with’ relation that describe the different ways biologists consider sequences to vary with each other, based on the roles that the variants in this relation hold (for example, where one is the reference and the other an alternate version, or one is wild-type and the other mutant). This is a critical facet of relating phenotypes to genotypes.
  5. Integration of expression-level variation modeling in GENO with sequence-variation modeling in SO.  Here, the high level approach for representing expression variation in terms of genetic sequences that are altered in their expression was reviewed and vetted by members of Monarch and SO teams.  Several approaches for conceptual integration of the expression and sequence variation models are under consideration.
  6. Technical approaches for coordinated development.  Topics included how to manage parallel construction and coordination of the abstract SO and physical MSO ontologies, where strategies for automated derivation of the SO from the MSO were reviewed.  In addition, we discussed how to manage community development of SO and GENO as integrated but separate ontologies, using existing platforms, tools, and standards for software development (Google projects, trackers, listservs, build and QA tools, etc.).
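
The variant representation in item 4 treats variation as a symmetric relation between versions of a feature rather than as a one-way link to a reference. A minimal sketch of that idea follows, assuming a toy data model: the class, function, and role names are hypothetical and are not taken from SO or GENO.

```python
# Minimal sketch of a symmetric "variant_with" relation between versions of
# one sequence feature. Class, function and role names are hypothetical and
# are not taken from SO or GENO.
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class FeatureVersion:
    feature_id: str   # which feature (locus) this is a version of
    sequence: str     # the sequence found at that feature
    role: str = ""    # optional role, e.g. "reference" or "wild-type"


def variant_with(a, b):
    """True when a and b are differing versions of the same feature."""
    return a.feature_id == b.feature_id and a.sequence != b.sequence


def variant_pairs(versions):
    """All unordered pairs that stand in the variant_with relation."""
    return [(a, b) for a, b in combinations(versions, 2) if variant_with(a, b)]


def is_reference_alternate(a, b):
    """More specific relation: the pair varies and exactly one is the reference."""
    return variant_with(a, b) and (a.role == "reference") != (b.role == "reference")


versions = [
    FeatureVersion("feature_1", "ACGT", role="reference"),
    FeatureVersion("feature_1", "ACTT"),
    FeatureVersion("feature_1", "AGGT"),
]

print(len(variant_pairs(versions)))
print(variant_with(versions[1], versions[2]),
      is_reference_alternate(versions[1], versions[2]))
```

In this toy model, two non-reference versions still vary with each other, which a strict "variant of the reference" model could not express directly.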

As noted above, more details on each of these topics, as well as many others, can be found in the document here [5].  Participation of the broader community is encouraged through feedback on this document or participation in ongoing coordination calls (contact brushm@ohsu.edu for info).


  1. http://www.sequenceontology.org/
  2. http://monarchinitiative.org/
  3. ICBO 2013 conference paper - http://www2.unb.ca/csas/data/ws/icbo2013/papers/ec/icbo2013_submission_60.pdf
  4. Presentation to the Phenotype RCN, October 2013 - http://www.slideshare.net/mhb120/phenotype-rcn-sogenoworkshopshared
  5. Google doc summarizing workshop outcomes - https://docs.google.com/document/d/1AUEVX0Sx_iy9mTI6F59Yo7ZCXu4zv5uSk28AHid5zhc/edit#
Oct 22 2013

The Journal of Biomedical Semantics is publishing a collection of articles related to biomedical ontologies and ontology updates, including some that address phenotype representation. As of today, the following have been published and provisional PDFs are available at the JBMS website:

• The Drosophila anatomy ontology

• The Drosophila phenotype ontology

• Enhanced XAO: the ontology of Xenopus anatomy and development underpins more accurate annotation of gene expression and queries on Xenbase

• Automatically transforming pre- to post-composed phenotypes: EQ-lising HPO and MP

• Function of dynamic models in systems biology: linking structure to behaviour

• Developing a semantically rich ontology for the biobank-administration domain

• Functional tissue units and their primary tissue motifs in multi-scale physiology

• Enrichment analysis applied to disease prognosis

• The Vertebrate Trait Ontology: a controlled vocabulary for the annotation of trait data across species

Oct 12 2013

On Monday, October 7, the Phenotype RCN hosted a cross-working group call featuring presentations by Pier Luigi Buttigieg on the Environment Ontology and by Matthew Brush on a workshop focused on genetic variation representation.

Dr. Buttigieg, a post-doctoral research associate at the Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, gave an overview of the structure and applications of the Environment Ontology (EnvO), a community ontology for the concise, controlled description of environments. It contains terms at the levels of biomes, environmental features, and environmental material. He talked about EnvO’s sister project, GAZ, an open-source gazetteer built on ontological principles, and EnvO’s adoption by the Encyclopedia of Life (EOL) to provide environmental context to taxon information. EnvO is of particular interest to the RCN because we would like to further explore capturing environments in relation to phenotype.

Matthew Brush, a member of the Ontology Development Group at Oregon Health & Science University, reported on an RCN-sponsored workshop to align genetic variation representation across the Sequence Ontology and Monarch Initiative, in support of efforts to link phenotypes to genotype data. There will be more posted on the RCN blog soon about the workshop and its outcomes.

The Phenotype RCN plans to host monthly calls the first Monday of every month at 8 a.m. Pacific / 11 a.m. Eastern time. If you would like to receive invitations to join via WebEx, please email Erik Segerdell. Suggestions for topics and volunteers for presenters are welcome!

An announcement for the next meeting, November 4, is coming soon.

Aug 06 2013

The final day of the course was presentation day, where we got to hear about everyone’s research questions and how they were going to use (or not!) ontologies for their work. As a group, we brainstormed solutions and next steps for everyone’s projects. The group was very synergistic, and we believe that we’ll see some nice contributions and connections being made in the ontology community in the upcoming months. Highlights from the presentations are below:


Chris Sheil:

Developmental comparative morphology of the chondrocranium in turtles


Elise Larsen:

Developing a knowledge base of North American butterfly monitoring data


Ashleigh Smythe:

Comparative morphology of nematode anatomy and behavior


Mariangeles Arce:

Connectivity and function of cranial and ventral musculature in catfishes


Peter Uetz:

Representing diversity of form in The Reptile Database


Corrine Blank:

Phenomics of Prokaryota for Tree of Life


Jing Liu:

Use of CharaParser and ontologies to generate plant character matrices from publications


Maria Christina Diaz:

Development of the Porifera ontology to represent sponge comparative anatomy and biodiversity


Eric Chen:

Characterizing gene retention after whole genome duplication, triplication


John Wieczorek:

Semanticizing the Darwin Core


Laura Jackson:

Gene regulation of fin positioning in fishes


Steve Elliott:

Representing the semantic types in the Embryo Project


Anne Thessen:

Phenomics of Eukaryotic microbes


Aug 03 2013

We spent the entire Day 3 learning Protege and exploring the use of various OWL2 capabilities. The students worked at their own pace, but all made it to the light at the end of the tunnel. They are all now enlightened :-) (bearers_of some instance_of PATO:0001291). We had also converted the students’ VUE files into OWL, and some students were able to start work on their OWL files. Day 4 of the course focused on developing skills around reuse of other ontologies and data interoperability. We learned techniques for performing OWL imports and the use of MIREOT (Minimum Information to Reference an External Ontology Term), which is basically a way to use a subset of another ontology in your own.
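
To make the MIREOT idea a little more concrete, here is a minimal sketch of the information it asks for when borrowing a single term: the IRI of the term, the IRI of the ontology it comes from, and where the term should sit in the importing ontology. The class name, the helper, the example.org IRIs, and the `importedFrom` predicate are all hypothetical placeholders, and the PATO ontology IRI is an assumption; this is not the output of any particular MIREOT tool.

```python
# Minimal sketch of the information MIREOT asks for when borrowing one term.
# ExternalTerm, to_triples and the example.org IRIs are hypothetical.
from dataclasses import dataclass

RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
# Placeholder predicate for recording where a term came from (an assumption).
IMPORTED_FROM = "http://example.org/my-ontology#importedFrom"


@dataclass(frozen=True)
class ExternalTerm:
    term_iri: str     # IRI of the borrowed class
    source_iri: str   # IRI of the ontology it comes from
    parent_iri: str   # where to place it in the importing ontology


def to_triples(term):
    """Return (subject, predicate, object) triples for one borrowed term."""
    return [
        (term.term_iri, RDFS_SUBCLASS_OF, term.parent_iri),
        (term.term_iri, IMPORTED_FROM, term.source_iri),
    ]


borrowed = ExternalTerm(
    term_iri="http://purl.obolibrary.org/obo/PATO_0001291",
    source_iri="http://purl.obolibrary.org/obo/pato.owl",  # assumed IRI
    parent_iri="http://example.org/my-ontology#Quality",
)

for triple in to_triples(borrowed):
    print(" ".join(triple))
```

The point of the design is that only these few statements are copied, so the importing ontology stays small while still pointing back to the original term.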

We saved a discussion of homology for the end of the last full day, knowing full well that a) the discussion would be vigorous, b) people would have to break for food at some time, and c) the conversation could continue into the evening. Kudos to our students, as they seemed to immediately understand our community’s general approach to this subject, to the degree that nobody even foamed at the mouth.

Jul 31 2013

Day two started with a “speed-dating” approach, with instructors pairing off for short periods with participants to strategize and work on individual projects. VUE files representing participants’ projects continued to be formalized; some now contain many nodes and some are even very pretty. These visual representations will be translated into OWL files shortly, and further refined in Protege. The morning progressed into a presentation on annotations, where tools like Phenex and Phenote were outlined.

In the afternoon we had a great overview from Karen Cranston, PI on the OpenTreeOfLife project, and we discussed how ontologies may or may not be useful for projects like OToL. We continued with a survey of web-based resources related to evolutionary biology, with participants auditing well-known websites for their use, or lack thereof, of ontologies. The day concluded with the last bit of preparation prior to our big practical exercise on Protege on Wednesday: a nice overview of OWL, with specific reference to the (very nice) primer. We’re looking forward to the first real taste of formalization with the Protege tutorial, and the creation of individuals’ own ontologies.


Matt Yoder, Melissa Haendel, Erik Segerdell and Jim Balhoff