Apr 04 2016
 
Participants at the fifth and final summit meeting of the Phenotype RCN. Photo by Andy Deans (CC BY 2.0).

The Phenotype Research Coordination Network hosted its fifth and final summit meeting at the end of February at Biosphere 2, with 66(!) people in attendance. The focus was on data integration, and we were fortunate to have the FuturePhy project join us. Our program was packed, with a mix of panels, talks (we have links to slideshows), and breakout sessions that focused on proposal ideas. One frequent topic for discussion was the need to keep this network going, as there remains a clear need for outreach and mechanisms that foster collaborations on phenotype data. Several working groups also focused on large, international collaborations that would make phenotype tools, like ontologies, and phenotype data more accessible and sustainable—imagine something like GenBank but for phenotypes.

Another successful and compelling component of this meeting was the inclusion of many early career researchers and graduate students, who formed a cohesive network themselves. Their discussions and reports to the larger group identified broad needs and informed our collective ideas for future outreach directions.

The Phenotype RCN has been productive, impactful, and incredibly rewarding. We thank all who have been involved, especially meeting participants and our advisory board. While this phase (i.e., our original NSF-funded schedule) may be winding down, the network is robust and active. Stay tuned for further developments!

 Posted by on April 4, 2016 at 4:45 pm
Oct 15 2014
 

Several members of the Plant Working Group got together at Phoenix Bioinformatics in lovely Redwood City, California, at the end of September to write up results of the long-running Plant Phenotype Pilot Project (or PPPP, as the cognoscenti call it). The first draft is affectionately known as the Plant Phenotype Pilot Project Preliminary Paper (or PPPPPP).

Left: Wild type Arabidopsis and several adherent leaf mutants, from Voisin et al. 2009, PLoS Genetics 5(10):e1000703 Fig. 1.  Right: Zea mays adherent dwarf. From MaizeGDB (http://images.maizegdb.org/db_images/Variation/mgn/5207_1613_1042_29.jpg).

Despite their obsession with the letter P, working group members Carolyn Lawrence, David Meinke, Ramona Walls, Lisa Harper and Eva Huala (with the help of Anika Oellrich, Laurel Cooper, Pankaj Jaiswal and George Gkoutos, who joined via Skype) made good progress over two and a half days. They drafted sections describing the assembly and analysis of the group’s phenotype dataset, which includes 6361 Entity-Quality statements describing mutant phenotypes associated with 2744 genes across six well-studied plant species (Arabidopsis, rice, maize, soybean, Medicago, and tomato).
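Entity-Quality (EQ) statements pair an entity term (for plants, typically an anatomy term) with a quality term, usually from PATO. As a rough illustration of how such a dataset can be tallied and compared across species, here is a minimal Python sketch. The gene names and term IDs are hypothetical placeholders, not drawn from the PPPP dataset, and real EQ statements can carry additional slots that this sketch leaves out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EQStatement:
    """One mutant phenotype: an entity term plus a PATO quality term."""
    gene: str      # hypothetical gene identifier
    species: str
    entity: str    # anatomy/entity term ID (placeholder)
    quality: str   # PATO quality term ID (placeholder)


def summarize(statements):
    """Count statements and distinct genes, overall and per species."""
    genes_by_species = defaultdict(set)
    for s in statements:
        genes_by_species[s.species].add(s.gene)
    return {
        "statements": len(statements),
        # Counting per species avoids collisions between gene IDs of different species.
        "genes": sum(len(g) for g in genes_by_species.values()),
        "genes_by_species": {sp: len(g) for sp, g in genes_by_species.items()},
    }


def shared_phenotypes(statements):
    """Map each (entity, quality) pair seen in more than one species to those species."""
    species_by_eq = defaultdict(set)
    for s in statements:
        species_by_eq[(s.entity, s.quality)].add(s.species)
    return {eq: sp for eq, sp in species_by_eq.items() if len(sp) > 1}


# Placeholder data, not taken from the PPPP dataset.
example = [
    EQStatement("at-gene-1", "Arabidopsis", "PO:leaf-placeholder", "PATO:fused-placeholder"),
    EQStatement("at-gene-1", "Arabidopsis", "PO:stem-placeholder", "PATO:short-placeholder"),
    EQStatement("zm-gene-1", "maize", "PO:leaf-placeholder", "PATO:fused-placeholder"),
]

print(summarize(example))
print(shared_phenotypes(example))
```

Grouping by the (entity, quality) pair is what makes a shared vocabulary useful here: once two species are annotated with the same terms, candidate cross-species phenotype matches fall out of a simple lookup.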

Other recent activities from the plant working group include submission of a grant proposal to NSF-ABI over the summer to fund the continuation of this work. The submission was made to “Advances in Biological Informatics” with funding sought from both the NSF and BBSRC under the “UK BBSRC-US NSF/BIO Lead Agency Pilot Opportunity” program.

The plant working group would like to thank the RCN for covering travel expenses for this and previous working group meetings; this funding has enabled the group to work together effectively despite being scattered over a wide range of institutions in the USA and UK.

 

 Posted by on October 15, 2014 at 10:08 pm
Jul 21 2014
 

Over the years, a number of different vertebrate anatomy ontologies have been developed. Some of these are dedicated to a single model species, or to human. Others have been developed to describe phenotypic variation across species, and these cover a broad range of species. In particular:

This led to considerable duplication of effort, as common anatomical structures such as ‘pectoral girdle’ were represented in all five ontologies (as well as their single-species counterparts):

Pectoral girdle and related concepts in Uberon, with cross-references to other ontologies shown (Fig. 1, Haendel et al.)

It was difficult for the Phenoscape group to integrate data across all these ontologies, as this required curators to keep mutual cross-references up to date, a time-consuming and error-prone task.
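A small sketch may help show why hand-maintained cross-references are fragile: every link between two ontologies should be reciprocated, and the number of pairwise mappings grows quickly with the number of ontologies. The ontology prefixes, term IDs, and the `xrefs` table below are hypothetical, and this is not how Phenoscape or Uberon actually tracked their mappings.

```python
def missing_reciprocals(xrefs):
    """Return (source, target) pairs where target does not cross-reference source back.

    `xrefs` maps each term ID to the set of term IDs it cross-references.
    """
    problems = []
    for source, targets in xrefs.items():
        for target in targets:
            if source not in xrefs.get(target, set()):
                problems.append((source, target))
    return sorted(problems)


def pairwise_mappings(n_ontologies):
    """Number of ontology pairs whose mappings must be kept in sync."""
    return n_ontologies * (n_ontologies - 1) // 2


# Hypothetical terms in three ontologies (AAA, BBB, CCC) for the same structure.
xrefs = {
    "AAA:0001": {"BBB:0010", "CCC:0100"},
    "BBB:0010": {"AAA:0001", "CCC:0100"},
    "CCC:0100": {"BBB:0010"},  # the link back to AAA:0001 was never added
}

print(missing_reciprocals(xrefs))  # [('AAA:0001', 'CCC:0100')]
print(pairwise_mappings(5))        # 10
```

With five multi-species ontologies there are ten pairwise mappings to keep in sync, whereas linking each one to a single shared core would need only five; this is one way to see the appeal of a common ontology.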

As a result, the maintainers of these ontologies agreed to join forces and build a common ontology. This work is described in a new paper in the ontologies special issue of the Journal of Biomedical Semantics:

Haendel MA, Balhoff JP, Bastian FB, et al. Unification of multi-species vertebrate anatomy ontologies for comparative biology in Uberon. Journal of Biomedical Semantics 2014, 5:21. doi:10.1186/2041-1480-5-21

The group selected Uberon as the core ontology, as it had the broadest coverage, was already well-integrated with the single-species ontologies, and was adapted for OWL reasoning. The curators of these ontologies worked long and hard to integrate their work, with input from anatomy experts and developers of single-species ontologies, revealing many interesting differences in the way structures are represented across species along the way. For example, the representation of teeth in the combined ontology had to be flexible enough to accommodate teeth that are in widely variable locations and configurations:

Figure 4. Diversity of tooth locations

The number of classes merged is shown in figure 2 of the paper:

Figure 2. Overlap and contributions from source ontologies. A) Venn diagram showing the extent of cross-referenced content between msAOs prior to the merge. B) Ontology evolution and integration into Uberon

 

As a result of this effort, we have a common anatomy ontology with broad and deep coverage for vertebrate anatomy. For a variety of viewing options, see the Uberon website. For examples of use for data integration see:

Like most ontologies, Uberon is a work in progress, and we are constantly striving to improve its depth, coverage and quality. We are currently improving the representation of facial muscles in the ontology based on the FEED ontology. We are also working on a federated approach for bringing in invertebrate anatomy ontologies, many of which are developed under the auspices of the Phenotype RCN, including the Arthropod Anatomy Ontology, the Poriferan anatomy ontology [Thacker et al, accepted, JBMS], the cephalopod ontology and the ctenophore ontology. We welcome feedback from everyone!

Jan 16 2014
 

Last week, the Phenotype RCN hosted a cross-working group call featuring presentations by Ramona Walls on the Plant Ontology and Cross-Species Reasoning [pdf] and Laurel Cooper on Common Reference Ontologies for Plants.

Dr. Walls (The iPlant Collaborative, University of Arizona, and New York Botanical Garden) demonstrated how the PO defines anatomical terms in a way that they can be used across all green plant species. After an overview of the ontology, which can be searched and browsed at http://plantontology.org/, she discussed its evolution, main branches (“plant anatomical entity” and “plant structure developmental stage”), and characteristics shared with CARO (the Common Anatomy Reference Ontology). She talked about specific changes to the ontology that make it work better for all green plants and presented use cases concerning comparison of gene expression, traits, and phenotypes across species.

Dr. Cooper (Oregon State University) followed with a talk about the PO and cROP, the Common Reference Ontologies for Plants. She identified problems arising from free-text phenotype descriptions and scattered data resources, and demonstrated how the PO fits into the centralized cROP platform, where reference ontologies for plants will be used to access data sources for plant traits, phenotypes, diseases, genomes linked to gene expression and genetic diversity data across a wide range of plant species. The cROP Ontology Database may be accessed via its web portal, http://crop.cgrb.oregonstate.edu/.

Many thanks to Ramona and Laurel for their outstanding talks!

The Phenotype RCN plans to host monthly calls the first Monday of every month at 8 a.m. Pacific / 11 a.m. Eastern time. If you would like to receive invitations to join via WebEx, please email Erik Segerdell. Suggestions for topics and volunteers for presenters are welcome!

Oct 29 2013
 

by Karen Eilbeck

One of our tasks at the SO-GENO phenotype workshop in Portland this fall was to formalize the description of phenotypic data in genomic annotation. We had previously written instructions on the use of phenotype ontologies such as HPO when creating variant file annotations in Genome Variation Format (GVF). GVF is a tab-delimited file format for the detailed annotation of sequence variants, and its specification is managed as part of the Sequence Ontology project. Our revised guidelines were split into human and non-human recommendations to reflect the diversity of phenotypic annotation resources. We address best practices for annotation, provide easy-to-follow examples, and discuss the process for requesting new terms from the phenotype resources. The recommendations are available here and have been registered with Biosharing as a reporting guideline. Biosharing is a website to register and track well-constituted efforts to develop standards for describing and sharing biosciences experiments; see more here.
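GVF builds on GFF3: each record has nine tab-delimited columns, and the ninth holds `tag=value` attributes separated by semicolons, with multiple values separated by commas. The sketch below parses one record and groups ontology IDs by prefix. The `Phenotype` tag name, the example record, and the choice of IDs are assumptions for illustration only; the published guidelines describe the attribute actually recommended.

```python
def parse_gvf_line(line):
    """Split one GVF (GFF3-style) record into its nine columns.

    Column 9 holds tag=value pairs separated by ';', with multiple values
    separated by ','. Percent-encoded characters are not decoded here.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-delimited columns, got {len(cols)}")
    seqid, source, so_type, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        tag, _, value = pair.partition("=")
        attributes[tag] = value.split(",")
    return {
        "seqid": seqid,
        "source": source,
        "type": so_type,  # a Sequence Ontology term such as SNV
        "start": int(start),
        "end": int(end),
        "attributes": attributes,
    }


def phenotype_terms(record, tag="Phenotype"):
    """Group ontology IDs stored under `tag` by prefix (HP, MP, ...).

    The default tag name is a hypothetical placeholder, not a documented GVF attribute.
    """
    terms = {}
    for value in record["attributes"].get(tag, []):
        prefix, sep, _ = value.partition(":")
        if sep:
            terms.setdefault(prefix, []).append(value)
    return terms


example_line = "\t".join([
    "chr1", "example_source", "SNV", "1000", "1000", ".", "+", ".",
    "ID=var1;Variant_seq=T;Reference_seq=C;Phenotype=HP:0000118,MP:0000001",
])

record = parse_gvf_line(example_line)
print(record["type"], record["start"], phenotype_terms(record))
```

Grouping by prefix mirrors the split between human and non-human recommendations: a human record would typically cite HPO terms, while model-organism records draw on other phenotype ontologies.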

Oct 29 2013
 

by Matthew Brush

In September 2013, the Phenotype RCN sponsored a three-day workshop at Oregon Health & Science University to align sequence feature and genetic variation representation and thereby support phenotype data integration. Participants included developers of the Sequence Ontology (SO) [1] (Karen Eilbeck, Mike Bada, and Bret Heale), and members of the ontology team from the Monarch Initiative [2] who have been developing a genotype ontology called GENO (Matthew Brush, Melissa Haendel, and Chris Mungall).

Background

One of the goals of the Phenotype RCN is to promote coordination and standardization of phenotype-related data. A standardized representation of genotype information is required for integrating genetically linked phenotype data from diverse sources, including model organism, human variation, livestock, and evolutionary databases. A particular challenge is harmonizing phenotype annotations that are linked to genetic variation at different levels of granularity, from complete strain genotypes, to specific gene alleles, to single nucleotide polymorphisms.

Monarch and SO Projects

The Monarch Initiative is a new effort that aims to integrate genotype-to-phenotype and related data from numerous sources under a common semantic framework, and to develop tools and services for user-guided exploration and analysis. Towards this end, Monarch required new modeling for genotypes (housed in GENO), which had been lacking in the ontology landscape. The scope of GENO necessarily overlaps with that of the Sequence Ontology, but it has a unique perspective on sequence features as they relate to linking different scales of genetic variation to organismal phenotypes. The need to align modeling between SO and GENO motivated our collaboration, which was particularly timely as the SO had recently initiated a refactoring to accommodate use cases beyond its initial charge of genome annotation. This refactoring aimed to define the context of the SO with respect to the Basic Formal Ontology (BFO) and other OBO ontologies, enhance the representation of sequence variation, and develop a parallel representation of material sequence features (MSO) to complement the abstract feature representation in the existing SO. These goals were consistent with Monarch’s aim of better supporting phenotype data integration, and the Phenotype RCN therefore funded a workshop.

Genetic Variation in GENO

The genotype information modeled in GENO is broadly conceived to include any variation in gene expression that is tied to an observed phenotypic effect. GENO explicitly distinguishes two types of ‘genetic variation’: (1) ‘Sequence-variation’ describes changes in the sequence of an organism’s genome, which are captured in the traditional genotypes shared by biologists. In this context, ‘sequence variant genes’ reflect heritable changes in genomic DNA, including point mutations, SNPs, or transgenic insertions that are represented in SO. (2) ‘Expression-variation’ relates to experimental alterations in the expression level of genes that are not due to changes in the sequence of the subject’s genome. Here, ‘expression variant genes’ are genes whose level of expression is altered by some experimental intervention, such as targeted gene knock-down using morpholinos or RNAi, or transient expression from DNA constructs. Like sequence variants, these expression variants change what is expressed in an organism and can lead to measurable phenotypic outcomes. The GENO ontology aims to re-use and co-develop the SO sequence variation model, but the notion of expression variation was judged to fall outside the scope of SO. Modeling in GENO will extend, and be logically consistent with, the SO approach and will leverage links to orthogonal ontologies to place variation in a broader biological context [3]. Additional information about the SO and GENO models and their interaction can be found in the presentation posted here [4].

Workshop Goals and Outcomes

One of the immediate goals of our workshop was to find consensus on high-level ontological issues that have yet to be resolved in the development of these and other OBO Foundry ontologies, and to document these decisions for the community. Many such issues have been debated for years, and our outcomes may be relevant to other domains or applications in biomedical research. Much progress was made in resolving key issues, and a plan was established for ongoing collaborative work. Some outcomes are summarized below, and more detailed notes can be found here [5].

  1. Terminological standardization of core terms. Terms such as ‘sequence’, ‘gene’, ‘allele’, ‘variant’, ‘reference’, ‘mutant’, and ‘genetic’ are used variably and ambiguously across communities, and require precise definitions and consistent use. Work is ongoing to craft such definitions, which will be reflected in our respective ontologies as they are refined and vetted.
  2. The ontological nature of sequences and sequence features (and their place in the BFO/IAO framework). Specific topics included: (1) the merits and implications of modeling sequence features as generically dependent continuants, or more specifically as information content entities, (2) defining identity criteria for sequence features to include their sequence and their position (as opposed to sequence only), (3) how to model attributes of sequence features such as biological activity, experimental provenance, reference status, and zygosity, and (4) the ways in which sequence features are considered to vary with respect to each other (e.g. wild-type vs mutant sequences, reference vs alternate sequences).
  3. Gene representation, and modeling the central dogma. We debated strategies for providing an OWL-based ontological representation and identifiers for genes and their variants that would serve SO, Monarch, and the broader phenotype community. Related discussions focused on how to build from this gene representation to link to derived sequences at the RNA and protein levels, and how to describe properties that emerge in this derivation.
  4. Variant representation. A precise and explicit account of how the concept of ‘sequence variation’ should be defined across SO and GENO was established. In this model, a ‘variant’ is any sequence feature that varies_with some other instance of the same feature. Sequence variants are thus considered to be ‘variant_with’ any other version of that feature, rather than ‘variants_of’ some reference. We will also represent more specific types of the ‘variant_with’ relation that describe the different ways biologists consider sequences to vary with each other, based on the roles the variants hold in the relation (for example, where one is the reference and the other an alternate version, or one is wild-type and the other mutant). This is a critical facet of relating phenotypes to genotypes.
  5. Integration of expression-level variation modeling in GENO with sequence-variation modeling in SO. Here, the high-level approach for representing expression variation in terms of genetic sequences that are altered in their expression was reviewed and vetted by members of the Monarch and SO teams. Several approaches for conceptual integration of the expression and sequence variation models are under consideration.
  6. Technical approaches for coordinated development. Topics included how to manage the parallel construction and coordination of the abstract SO and the physical MSO ontologies; strategies for automatically deriving the SO from the MSO were reviewed. In addition, we discussed how to manage community development of SO and GENO as integrated but separate ontologies, using existing platforms, tools, and standards for software development (Google projects, trackers, listservs, build and QA tools, etc.).
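To make items 2 and 4 concrete, here is a hypothetical Python model in which a feature’s identity is its position plus its residues, and `variant_with` holds symmetrically between two instances of the same feature that differ in sequence. Treating ‘same feature’ as ‘same span’ only covers substitutions, and the role-specific relation names are invented for this sketch; they are not SO or GENO identifiers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SequenceFeature:
    """Hypothetical model: identity is position plus residues (not residues alone).

    Coordinates are 1-based and inclusive, as in GFF3/GVF. `role` is excluded
    from equality and hashing, so it does not affect identity.
    """
    seqid: str
    start: int
    end: int
    residues: str
    role: Optional[str] = field(default=None, compare=False)

    def __post_init__(self):
        if self.end - self.start + 1 != len(self.residues):
            raise ValueError("span length must equal the number of residues")

    def same_span(self, other):
        return (self.seqid, self.start, self.end) == (other.seqid, other.start, other.end)


def variant_with(a, b):
    """Symmetric: two instances of the same feature (here, same span) that differ in sequence."""
    return a.same_span(b) and a.residues != b.residues


# Invented names for role-specific sub-relations; not SO or GENO identifiers.
ROLE_SPECIFIC = {
    frozenset({"reference", "alternate"}): "reference_alternate_variant_with",
    frozenset({"wild-type", "mutant"}): "wildtype_mutant_variant_with",
}


def classify(a, b):
    """Most specific relation between a and b, or None if they are not variants."""
    if not variant_with(a, b):
        return None
    return ROLE_SPECIFIC.get(frozenset({a.role, b.role}), "variant_with")


ref = SequenceFeature("chr1", 100, 100, "C", role="reference")
alt = SequenceFeature("chr1", 100, 100, "T", role="alternate")
mut = SequenceFeature("chr1", 100, 100, "G", role="mutant")
elsewhere = SequenceFeature("chr1", 500, 500, "C", role="reference")

print(ref == elsewhere)          # False: same residues, different position
print(classify(alt, ref))        # reference_alternate_variant_with
print(classify(alt, mut))        # variant_with (no role-specific pairing)
print(classify(ref, elsewhere))  # None
```

Because the base relation is symmetric and roles only refine it, neither feature has to be designated ‘the’ reference for the relation to hold, which matches the ‘variant_with’ rather than ‘variant_of’ framing above.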

As noted above, more details on each of these topics, as well as many others, can be found in the document here [5]. The broader community is encouraged to participate by commenting on this document or joining ongoing coordination calls (contact brushm@ohsu.edu for info).

References

  1. http://www.sequenceontology.org/
  2. http://monarchinitiative.org/
  3. ICBO 2013 conference paper – http://www2.unb.ca/csas/data/ws/icbo2013/papers/ec/icbo2013_submission_60.pdf
  4. Presentation to the Phenotype RCN, October 2013 – http://www.slideshare.net/mhb120/phenotype-rcn-sogenoworkshopshared
  5. Google doc summarizing workshop outcomes – https://docs.google.com/document/d/1AUEVX0Sx_iy9mTI6F59Yo7ZCXu4zv5uSk28AHid5zhc/edit#
Oct 12 2013
 

On Monday, October 7, the Phenotype RCN hosted a cross-working group call featuring presentations by Pier Luigi Buttigieg on the Environment Ontology and by Matthew Brush on a workshop focused on genetic variation representation.

Dr. Buttigieg, a post-doctoral research associate at the Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, gave an overview of the structure and applications of the Environment Ontology (EnvO), a community ontology for the concise, controlled description of environments. It contains terms at the levels of biomes, environmental features, and environmental material. He talked about EnvO’s sister project, GAZ, an open-source gazetteer built on ontological principles, and EnvO’s adoption by the Encyclopedia of Life (EOL) to provide environmental context to taxon information. EnvO is of particular interest to the RCN because we would like to further explore capturing environments in relation to phenotype.

Matthew Brush, a member of the Ontology Development Group at Oregon Health & Science University, reported on an RCN-sponsored workshop to align genetic variation representation across the Sequence Ontology and Monarch Initiative, in support of efforts to link phenotypes to genotype data. More about the workshop and its outcomes will be posted on the RCN blog soon.

The Phenotype RCN plans to host monthly calls the first Monday of every month at 8 a.m. Pacific / 11 a.m. Eastern time. If you would like to receive invitations to join via WebEx, please email Erik Segerdell. Suggestions for topics and volunteers for presenters are welcome!

An announcement for the next meeting, November 4, is coming soon.

Aug 06 2013
 

The final day of the course was presentation day, when we heard about everyone’s research questions and how they were going to use (or not!) ontologies in their work. As a group, we brainstormed solutions and next steps for everyone’s projects. The group was very synergistic, and we expect to see some nice contributions and connections being made in the ontology community in upcoming months. Highlights from the presentations are below:

 

Chris Sheil:

Developmental comparative morphology of the chondrocranium in turtles

Elise Larsen:

Developing a knowledge base of North American butterfly monitoring data

Ashleigh Smythe:

Comparative morphology of nematode anatomy and behavior

Mariangeles Arce:

Connectivity and function of cranial and ventral musculature in catfishes

Peter Uetz:

Representing diversity of form in The Reptile Database

Corrine Blank:

Phenomics of Prokaryota for Tree of Life

Jing Liu:

Use of CharaParser and ontologies to generate plant character matrices from publications

Maria Christina Diaz:

Development of the Porifera ontology to represent sponge comparative anatomy and biodiversity

Eric Chen:

Characterizing gene retention after whole genome duplication, triplication

John Wieczorek:

Semanticizing the Darwin Core

Laura Jackson:

Gene regulation of fin positioning in fishes

Steve Elliott:

Representing the semantic types in the Embryo Project

Anne Thessen:

Phenomics of Eukaryotic microbes

Aug 03 2013
 

We spent all of Day 3 learning Protege and exploring the use of various OWL 2 capabilities. The students worked at their own pace, but all made it to the light at the end of the tunnel. They are all now enlightened :-) (bearers_of some instance_of PATO:0001291). We had also converted the students’ VUE files into OWL, and some students were able to start work on their OWL files. Day 4 of the course focused on developing skills for reusing other ontologies and for data interoperability. We learned techniques for performing OWL imports and for using MIREOT (Minimum Information to Reference an External Ontology Term), which is essentially a way to use a subset of another ontology in your own.
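MIREOT records just enough about an external term to reuse it without importing the whole source ontology: the term’s IRI, the IRI of its source ontology, and where the term sits in the importing ontology. Here is a minimal sketch, assuming OBO PURL-style IRIs and a hypothetical course-ontology parent class, that emits the corresponding axioms in OWL 2 functional syntax. Recording the source with `rdfs:isDefinedBy` is our choice for illustration; projects may use a different annotation property, and real MIREOT workflows usually also copy labels and definitions.

```python
from dataclasses import dataclass

OBO_PURL = "http://purl.obolibrary.org/obo/"
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"


def curie_to_obo_iri(curie):
    """Expand an OBO-style CURIE such as PATO:0001291 to its PURL."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_PURL}{prefix}_{local_id}"


@dataclass(frozen=True)
class MireotRecord:
    """The minimal information MIREOT asks for when reusing an external term."""
    term_iri: str             # the external class being reused
    source_ontology_iri: str  # the ontology it comes from
    parent_iri: str           # where it sits in the importing ontology

    def to_functional_syntax(self):
        """Emit axioms (OWL 2 functional syntax) for the importing ontology.

        Full IRIs are used throughout, so no prefix declarations are needed.
        """
        term, parent, source, prop = (
            f"<{iri}>"
            for iri in (self.term_iri, self.parent_iri,
                        self.source_ontology_iri, RDFS_IS_DEFINED_BY)
        )
        return "\n".join([
            f"Declaration(Class({term}))",
            f"SubClassOf({term} {parent})",
            f"AnnotationAssertion({prop} {term} {source})",
        ])


record = MireotRecord(
    term_iri=curie_to_obo_iri("PATO:0001291"),
    source_ontology_iri=OBO_PURL + "pato.owl",
    parent_iri="http://example.org/course-ontology/Quality",  # hypothetical parent class
)
print(record.to_functional_syntax())
```

The trade-off versus a full `owl:imports` is that only the referenced class comes along, so reasoning over the external term’s own axioms is lost in exchange for a much smaller ontology.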

We saved a discussion of homology for the end of the last full day, knowing full well that a) the discussion would be vigorous, b) people would have to break for food at some time, and c) the conversation could continue into the evening. Kudos to our students, as they seemed to immediately understand our community’s general approach to this subject, to the degree that nobody even foamed at the mouth.

Jul 31 2013
 

Day two started with a “speed-dating” approach, with instructors pairing off with participants for short periods to strategize and work on individual projects. VUE files representing participants’ projects continued to be formalized; some now contain many nodes, and some are even very pretty. These visual representations will be translated into OWL files shortly and further refined in Protege. The morning progressed into a presentation on annotations, where tools like Phenex and Phenote were outlined.

In the afternoon we had a great overview from Karen Cranston, PI on the OpenTreeOfLife project, and we discussed how ontologies may or may not be useful for projects like OToL. We continued with a survey of web-based resources related to evolutionary biology, with participants auditing well-known websites for their use (or lack thereof) of ontologies. The day concluded with the last bit of preparation for our big practical exercise on Protege on Wednesday: a nice overview of OWL, with specific reference to the (very nice) primer. We’re looking forward to the first real taste of formalization with the Protege tutorial, and the creation of individuals’ own ontologies.

Matt Yoder, Melissa Haendel, Erik Segerdell and Jim Balhoff