Feb 13, 2014

Dear Phenotype Community,

We are heading into our annual meeting next week, where we will prioritize next year’s activities with the help of the Phenotype RCN Advisory Board. If you have an idea for a workshop, working group, or collaborative exchange, please send me an email and/or fill out a short application with your idea. See our blog for previous posts by folks who have been funded, and email me if you would like to discuss an idea before you propose it. Please get these to us by February 19th.

Thanks! Paula (pmabee@usd.edu)

Posted on February 13, 2014 at 2:32 am
Feb 10, 2014

Via http://phenoday2014.bio-lark.org/:

The Phenotype day [International Conference on Intelligent Systems for Molecular Biology] is an initiative developed jointly with the Bio-Ontologies and BioLINK Special Interest Groups.

The systematic description of phenotype variation has gained increasing importance since the discovery of the causal relationship between a genotype placed in a certain environment and a phenotype. It plays not only a role when accessing and mining medical records but also for the analysis of model organism data, genome sequence analysis and translation of knowledge across species. Accurate phenotyping has the potential to be the bridge between studies that aim to advance the science of medicine (such as a better understanding of the genomic basis of diseases), and studies that aim to advance the practice of medicine (such as phase IV surveillance of approved drugs).

Various research activities that attempt to understand the underlying domain knowledge exist, but they are rather restrictively applied and not very well synchronized. In this Phenotype Day we propose to trigger a comprehensive and coherent approach to studying (and ultimately facilitating) the process of knowledge acquisition and support for Deep Phenotyping by bringing together researchers and practitioners that include but are not limited to the following fields:

• biology as well as computational biology

• genomics, clinical genetics, pharmacogenomics, healthcare

• text/data mining and knowledge discovery

• knowledge representation and ontology engineering

For more information including paper submission deadlines and instructions, please go to http://phenoday2014.bio-lark.org/.

Jan 25, 2014

Our paper describing the Vertebrate Taxonomy Ontology (VTO) is published! See: http://www.jbiomedsem.com/content/4/1/34

One primary objective for Phenoscape and similar projects is to aggregate phenotypic data from multiple studies to named taxa, which in many phylogenetic studies are species but might also be at higher taxonomic levels such as genera or families. While there are many widely used taxonomies that include rich sampling of species and higher taxa, for example Bill Eschmeyer’s Catalog of Fishes, there are few vetted “bridging” taxonomies that allow for aggregating data across, say, fishes, amphibians, and mammals. This problem becomes even more acute when you consider integrating data for extinct taxa as well. As a first step towards addressing this issue for vertebrates, we created the Vertebrate Taxonomy Ontology (VTO), which brings together taxonomies from NCBI, AmphibiaWeb, the Catalog of Fishes (via the previously existing Teleost Taxonomy Ontology), and the Paleobiology Database. The resulting curated taxonomy contains more than 106,000 terms, more than 104,000 additional synonyms, and extensive cross-referencing to these existing taxonomies. The Phenoscape Knowledgebase will leverage this taxonomic ontology by allowing phenotype statistics to be displayed by taxon, including coarse measures of the extent of annotation coverage and phenotypic variation. Though phenotypes may be annotated to a species, the use of an ontological framework for the taxonomic hierarchy facilitates aggregating phenotypes to higher levels, such as genera or families. In the future, we hope to be able to integrate other excellent and rich sources of taxon-specific taxonomies, such as that in the Reptile Database or the International Ornithologists’ Union Bird List. This is a work in progress, and the Phenoscape team is certainly interested in integrating new taxonomic sources as well as exploring different ways that such a resource can be used and developed by the larger community.
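
To make the idea of aggregating species-level annotations up to genera and families concrete, here is a minimal Python sketch. It is not Phenoscape or VTO code: the parent links are a tiny hand-written stand-in for a taxonomy, and the phenotype labels (`P1`, `P2`, `P3`) are invented placeholders rather than real annotations.

```python
from collections import defaultdict

# Hypothetical child -> parent links; a real hierarchy would come from the
# taxonomy ontology itself rather than a hand-written dict.
PARENT = {
    "Danio rerio": "Danio",
    "Danio albolineatus": "Danio",
    "Danio": "Cyprinidae",
    "Cyprinus carpio": "Cyprinus",
    "Cyprinus": "Cyprinidae",
}

# Invented placeholder annotations made at the species level.
ANNOTATIONS = {
    "Danio rerio": {"P1", "P2"},
    "Danio albolineatus": {"P1"},
    "Cyprinus carpio": {"P3"},
}


def lineage(taxon):
    """Yield the taxon itself, then each ancestor up to the root."""
    while taxon is not None:
        yield taxon
        taxon = PARENT.get(taxon)


def aggregate(annotations):
    """Collect the distinct phenotypes annotated at or below each taxon."""
    by_taxon = defaultdict(set)
    for species, phenotypes in annotations.items():
        for taxon in lineage(species):
            by_taxon[taxon] |= phenotypes
    return dict(by_taxon)


rolled_up = aggregate(ANNOTATIONS)
for name in ("Danio", "Cyprinus", "Cyprinidae"):
    print(name, sorted(rolled_up[name]))
```

Because each species contributes to every taxon on its lineage, a count per genus or family gives the kind of coarse coverage measure described above.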


Filed under: Taxonomy Ontology, Vertebrates
Jan 16, 2014

Last week, the Phenotype RCN hosted a cross-working group call featuring presentations by Ramona Walls on the Plant Ontology and Cross-Species Reasoning [pdf] and Laurel Cooper on Common Reference Ontologies for Plants.

Dr. Walls (The iPlant Collaborative, University of Arizona, and New York Botanical Garden) demonstrated how the PO defines anatomical terms in a way that they can be used across all green plant species. After an overview of the ontology, which can be searched and browsed at http://plantontology.org/, she discussed its evolution, main branches (“plant anatomical entity” and “plant structure developmental stage”), and characteristics shared with CARO (the Common Anatomy Reference Ontology). She talked about specific changes to the ontology that make it work better for all green plants and presented use cases concerning comparison of gene expression, traits, and phenotypes across species.

Dr. Cooper (Oregon State University) followed with a talk about the PO and cROP, the Common Reference Ontologies for Plants. She identified problems arising from free-text phenotype descriptions and scattered data resources, and demonstrated how the PO fits into the centralized cROP platform, where reference ontologies for plants will be used to access data sources for plant traits, phenotypes, diseases, genomes linked to gene expression and genetic diversity data across a wide range of plant species. The cROP Ontology Database may be accessed via its web portal, http://crop.cgrb.oregonstate.edu/.

Many thanks to Ramona and Laurel for their outstanding talks!

The Phenotype RCN plans to host monthly calls on the first Monday of every month at 8 a.m. Pacific / 11 a.m. Eastern time. If you would like to receive invitations to join via WebEx, please email Erik Segerdell. Suggestions for topics and volunteers for presenters are welcome!

Jan 15, 2014

[posted on behalf of István Mikó, Penn State University]


A bat fly (Diptera: Nycteribiidae) poses for the camera. It’s barely recognizable as a relative of the familiar Drosophila melanogaster (Diptera: Drosophilidae) and is radically different from the fish tongue-eating arthropod in the photo below. This photo by Gilles San Martin (CC BY-SA 2.0).

The Phenotype RCN Arthropod working group has focused mostly on the development and characterization of the Common Arthropod Anatomy Ontology (CAAO). Despite difficulties defining basic classes due to the immense differences in basic anatomical concepts (there are, after all, more than a million known species of arthropods, with almost as many different forms; see photos above and below), we made progress in the development of certain portions of CAAO. During 2013 the group established the basis of classes and relationships referring to anatomical structures of the arthropod integument. Establishing this system is especially crucial for disciplines targeting world species diversity, such as arthropod taxonomy and phylogenetics, where more than 90% of the applied characters are related to the outer layer of the integument, the cuticle. Efforts have also been made to develop the arthropod nervous system portion by adopting the relatively recently published relation system of Richter et al. (2010).
The development of CAAO is ongoing, but the focus of the working group has shifted somewhat toward establishing outreach strategies that help domain experts make better use of available ontologies, and toward building tighter collaborations among research group members. We’re especially interested in seeding research that will become the basis of new funding. The main subjects of these new directions are:

  1. Cooperation between working group members who develop tools that enhance ontology development and usage for domain expert communities [e.g., mx (Yoder 2014) and Morph·D·Base (Vogt and Grobe 2010)].
  2. Making more widely accessible and understandable the ideas developed by individual working group members on applications of ontologies to different areas of arthropod research [e.g., an ontology-based measure of structural complexity for phylogenies (Ramirez 2013); demarcating and differentiating basic categories of anatomical entities (Vogt 2010); and application of semantic models in species descriptions (Balhoff et al. 2013)].
  3. Defining the role and place of homology concepts in arthropod ontologies [e.g., Szucsich and Wirkner 2007, Franz 2013].

During 2013, some members of the Arthropod working group were able to meet at two arthropod-specific meetings held in Germany: the Willi Hennig Society meeting in Rostock (August 3-7) and the 6th Dresden Meeting on Insect Phylogeny in Dresden (September 27-29). Although these meetings made it possible to start cooperating in the areas mentioned above and to outline future directions for the working group, participants also agreed that further meetings with more working group members are needed to establish the planned collaborations. The annual RCN summit meeting on 21-23 February 2014 will be attended by most of the key personnel and will hopefully help the working group establish a workflow for pursuing the new directions.

Fish tongue-eating isopod (Isopoda: Cymothoidae), with radically different anatomy and phenotypes than the bat fly above. Photo by Andy Heyward (CC BY-NC-SA 2.0).


  • Balhoff J, Mikó I, Yoder M, Mullins P, Deans AR (2013) A semantic model for species description, applied to the ensign wasps (Hymenoptera: Evaniidae) of New Caledonia. Systematic Biology 62 (5): 639–659 doi: 10.1093/sysbio/syt028.
  • Franz N (2013) Anatomy of cladistic analysis. Cladistics 2013: 1–28 doi: 10.1111/cla.12042
  • Ramirez (2013) An ontology-based measure of structural complexity for phylogenies. XXXII Meeting of the Willi Hennig Society. August 2013.
  • Richter S, Roesel R, Purschke G, Schmidt-Rhaesa A, Scholtz G, Stach T, Vogt L, Andreas W, Brenneis G, Döring C, Faller S, Fritsch M, Grobe P, Heuer CM, Kaul S, Møller OS, Müller CH, Rieger V, Rothe BH, Stegner ME, Harzsch S (2010) Invertebrate neurophylogeny: suggested terms and definitions for a neuroanatomical glossary. Frontiers in Zoology 7:1-49. doi: 10.1186/1742-9994-7-29
  • Szucsich MU, Wirkner C (2007) Homology: a synthetic concept of evolutionary robustness of patterns. Zoologica Scripta 36: 281–289. doi: 10.1111/j.1463-6409.2007.00275.x
  • Yoder MJ (2014) “mx” Web-based content management system for biodiversity informatics. http://mx.phenomix.org
  • Vogt L (2010) Spatio-structural granularity of biological material entities. BMC Bioinformatics 11: 289. doi: 10.1186/1471-2105-11-289
  • Vogt L. and Grobe P (2010) Morph·D·Base – Eine online Datenbank für morphologische Daten und Metadaten. GfBS Newsletter 24: 29–34. https://www.morphdbase.de/
Jan 10, 2014

In an effort to expand the user community and to demonstrate what is possible using our infrastructure, members of the Phenoscape team gave multiple presentations across two continents on our recent developments. In late October, Paula Mabee gave an invited presentation on mapping phenotypes across phylogenies at the Muséum national d’Histoire naturelle in Paris. This was followed by presentations at the 73rd annual meeting of the Society of Vertebrate Paleontology (SVP) in Los Angeles and the 2013 meeting of the Taxonomic Database Working Group (TDWG) in Florence, Italy. Phenoscape had a significant presence at SVP, with a poster presented by Alex Dececchi demonstrating our progress in generating supermatrices from our annotations and a talk given by collaborator Karen Sears, who used EQ supermatrices from Phenoscape fin/limb data to examine integration patterns across the fin-to-limb transition. Karen’s talk marks the first of the collaborations coming out of our 2013 San Francisco workshop. It also showed how data from Phenoscape can drive independent projects and be integrated easily with existing phylogenetic and statistical tools such as Mesquite and various R modules. The talks and poster were well received, with numerous researchers asking how they could incorporate Phenoscape or use ontology-based annotations.

Filed under: Conferences
Nov 26, 2013

A handful of new papers of interest are available at the Journal of Biomedical Semantics, which is publishing a collection of articles related to biomedical ontologies and ontology updates:

• P. E. Midford et al: The vertebrate taxonomy ontology: a framework for reasoning across model organism and species phenotypes

• R. Nigam et al: Rat Strain Ontology: structured controlled vocabulary designed to facilitate access to strain data at RGD

• P. Ciccarese et al: PAV ontology: provenance, authoring and versioning

• K. M. Livingston et al: Representing annotation compositionality and provenance for the Semantic Web

Nov 05, 2013

via Robin Haw

On behalf of the Organizing Committee for the 7th International Biocuration Conference, I am delighted to announce that our website and registration are now open:


The conference will be held at Hart House in Toronto from 6th to 9th April 2014, and it would be wonderful to see you there. We have four keynote speakers confirmed:

Dr. Tim Hubbard, Wellcome Trust Sanger Institute

Dr. Suzanna Lewis, Lawrence Berkeley National Laboratory

Dr. Patricia Babbitt, California Institute for Quantitative Biosciences (QB3)

Dr. Lincoln Stein, Ontario Institute for Cancer Research

Early bird registration rates apply until 7th March 2014. We have secured discount rates at three hotels in Toronto; please see the Biocuration 2014 website for more information on booking.

Please note that the paper submission deadline is 15th November 2013, so there is limited time to put your paper together.

The deadline for the abstract submission to present at the conference is 10th February 2014.

Oct 29, 2013

by Karen Eilbeck

One of our tasks at the SO-GENO phenotype workshop in Portland this fall was to formalize the description of phenotypic data in genomic annotation. Previously we had written instructions on the use of phenotype ontologies such as HPO when creating variant file annotations in Genome Variation Format (GVF). GVF is a tab-delimited variant file format for the detailed annotation of sequence variants, and the specification is managed as part of the Sequence Ontology. Our revised guidelines are split into human and non-human recommendations to reflect the diversity in phenotypic annotation resources. We address best practices for annotation, provide easy-to-follow examples, and discuss the process for requesting new terms from the phenotype resources. The recommendations are available here and have been registered with Biosharing as a reporting guideline. Biosharing is a website to register and track well-constituted efforts to develop standards for describing and sharing biosciences experiments; see more here.
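
For readers who have not worked with tab-delimited variant files, the Python sketch below shows roughly what reading one annotated feature line might look like. It assumes the nine-column layout that GVF inherits from GFF3, with the last column holding `key=value` pairs separated by semicolons. The sample line is invented, and the `Phenotype` attribute key is a placeholder of ours, not a tag taken from the recommendations; please consult the guidelines themselves for the actual conventions.

```python
# Column names assumed from the GFF3-style layout that GVF builds on.
GVF_COLUMNS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")


def parse_feature_line(line):
    """Split one tab-delimited feature line into a dict.

    The attributes column becomes a nested dict mapping each key to a
    list of comma-separated values.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GVF_COLUMNS):
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    record = dict(zip(GVF_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value.split(",")
    record["attributes"] = attributes
    return record


# Invented example line. "Phenotype" is a hypothetical attribute key used
# only for illustration; HP:0000118 is used as a sample HPO identifier.
EXAMPLE = "\t".join([
    "chr16", "example_source", "SNV", "49291141", "49291141", ".", "+", ".",
    "ID=var_01;Reference_seq=G;Variant_seq=A;Phenotype=HP:0000118",
])

record = parse_feature_line(EXAMPLE)
print(record["type"], record["start"], record["attributes"]["Phenotype"])
```

Keeping ontology identifiers as plain values in the attributes column is what lets downstream tools match a variant to a phenotype term without parsing free text.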

Oct 29, 2013

by Matthew Brush

In September 2013, the Phenotype RCN sponsored a three-day workshop at Oregon Health & Science University to align sequence feature and genetic variation representation and thereby support phenotype data integration. Participants included developers of the Sequence Ontology  (SO) [1] (Karen Eilbeck, Mike Bada, and Bret Heale), and members of the ontology team from the Monarch Initiative [2] who have been developing a genotype ontology called GENO (Matthew Brush, Melissa Haendel, and Chris Mungall).


One of the goals of the Phenotype RCN is to promote coordination and standardization of phenotype-related data. A standardized representation of genotype information is required for integrating genetically-linked phenotype data from diverse sources  including model organism, human variation, livestock, and evolutionary databases.  A particular challenge relates to harmonizing phenotype annotations where they are linked to genetic variations at different levels of granularity – from complete strain genotypes, to specific gene alleles, to single nucleotide polymorphisms.

Monarch and SO Projects

The Monarch Initiative is a new effort that aims to integrate genotype-to-phenotype and related data from numerous sources under a common semantic framework, and develop tools and services for user-guided exploration and analysis. Towards this end, Monarch required development of new modeling for genotypes (housed in GENO), which was lacking in the ontology landscape. The scope of GENO necessarily overlaps with that of the Sequence Ontology, but has a unique perspective on sequence features as they relate to linking different scales of genetic variation and to organismal phenotypes. The need to align modeling between SO and GENO motivated our collaboration, which was particularly timely as the SO had recently initiated a refactoring to accommodate use cases beyond its initial charge of genome annotation. This refactoring aimed to  define the context of the SO with respect to the Basic Formal Ontology (BFO) and other OBO ontologies, enhance representation of sequence variation, and develop a parallel representation of material sequence features (MSO) to complement the abstract feature representation in the existing SO. These goals were consistent with those of Monarch to support better phenotype data integration and therefore a workshop was funded by the Phenotype RCN.

Genetic Variation in GENO

The genotype information modeled in GENO is broadly conceived to include any variation in gene expression that is tied to an observed phenotypic effect. Two types of ‘genetic variation’ are explicitly distinguished in GENO: (1) ‘Sequence-variation’ describes changes in the sequence of an organism’s genome, which are captured in the traditional genotypes shared by biologists. In this context, ‘sequence variant genes’ carry heritable changes in genomic DNA, including things like point mutations, SNPs, or transgenic insertions that are represented in SO. (2) ‘Expression-variation’ relates to experimental alterations in the expression level of genes that are not due to changes in the sequence of the subject’s genome. Here, ‘expression variant genes’ are genes whose level of expression is altered as a result of some experimental intervention, such as targeted gene knock-down using reagents such as morpholinos and RNAi, or transient expression from DNA constructs. Like sequence variants, these expression variants change what is expressed in an organism and can lead to measurable phenotypic outcomes. The GENO ontology aims to re-use and co-develop the SO sequence variation model, but the notion of expression variation was concluded to be outside the SO scope. Modeling in GENO will extend and be logically consistent with the SO approach and will leverage links to orthogonal ontologies to place variation in a broader biological context [3]. Additional information about the SO and GENO models and their interaction can be found in the presentation posted here [4].
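
As an informal illustration of the distinction drawn above, the following Python sketch sorts a variation into one of the two kinds based on how it was produced. This is our own toy data structure, not the GENO model: the class names, the cause strings, and the gene names (`geneA`, `geneB`) are all hypothetical, and the cause lists simply reuse the examples named in the paragraph.

```python
from dataclasses import dataclass
from enum import Enum


class VariationKind(Enum):
    SEQUENCE = "sequence-variation"      # heritable change in genomic DNA
    EXPRESSION = "expression-variation"  # altered expression, genome unchanged


@dataclass(frozen=True)
class GeneVariation:
    gene: str
    kind: VariationKind
    cause: str


# Hypothetical lookup tables built from the examples in the text.
SEQUENCE_CAUSES = {"point mutation", "SNP", "transgenic insertion"}
EXPRESSION_CAUSES = {
    "morpholino knock-down",
    "RNAi knock-down",
    "transient expression from DNA construct",
}


def classify(gene, cause):
    """Return a GeneVariation tagged with the kind implied by its cause."""
    if cause in SEQUENCE_CAUSES:
        return GeneVariation(gene, VariationKind.SEQUENCE, cause)
    if cause in EXPRESSION_CAUSES:
        return GeneVariation(gene, VariationKind.EXPRESSION, cause)
    raise ValueError(f"unrecognized cause: {cause!r}")


for gene, cause in [("geneA", "SNP"), ("geneB", "morpholino knock-down")]:
    variation = classify(gene, cause)
    print(variation.gene, variation.kind.value)
```

Both kinds end up in the same record type, which mirrors the point that either one can be linked to a measurable phenotypic outcome.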

Workshop Goals and Outcomes

One of the immediate goals of our workshop was to find consensus on high-level ontological issues that have yet to be resolved in the development of these and other OBO Foundry ontologies and document these decisions for the community.  Many such issues have been broadly debated for years, and our outcomes may be relevant for other domains or applications in biomedical research. Much progress was made in resolving key issues, and a plan was established for ongoing collaborative work.  Some outcomes are below, and more detailed notes can be found here [5].

  1. Terminological standardization of core terms. Terms such as ‘sequence’, ‘gene’, ‘allele’, ‘variant’, ‘reference’, ‘mutant’, and ‘genetic’ are used variably and ambiguously across communities, and require precise definitions and consistent use. Work is ongoing to craft such definitions, which will be reflected in our respective ontologies as they are refined and vetted.
  2. The ontological nature of sequences and sequence features (and their place in the BFO/IAO framework). Specific topics included: (1) the merits and implications of modeling sequence features as generically dependent continuants, or more specifically as information content entities, (2) defining identity criteria for sequence features to include their sequence and their position (as opposed to sequence only), (3) how to model attributes of sequence features such as biological activity, experimental provenance, reference status, and zygosity, and (4) the ways in which sequence features are considered to vary with respect to each other (e.g. wild-type vs mutant sequences, reference vs alternate sequences).
  3. Gene representation, and modeling the central dogma. We debated strategies to provide an OWL-based ontological representation and identifiers for genes and their variants that would serve SO, Monarch, and the broader phenotype community. Related discussions focused on how to build from this gene representation to link to derived sequences at RNA and protein levels, and describe properties that emerge in this derivation.
  4. Variant representation. A precise and explicit account of how the concept of ‘sequence variation’ should be defined across SO and GENO was established. In this model, a ‘variant’ is any sequence feature that varies_with some other instance of the same feature. So sequence variants are considered to be ‘variant_with’ any other version of that feature, rather than ‘variants_of’ some reference. We will also represent more specific types of the ‘variant_with’ relation that describe the different ways biologists consider sequences to vary with each other, based on the roles that the variants in this relation hold (for example, where one is the reference and the other an alternate version, or one is wild-type and the other mutant). This is a critical facet of relating phenotypes to genotypes.
  5. Integration of expression-level variation modeling in GENO with sequence-variation modeling in SO.  Here, the high level approach for representing expression variation in terms of genetic sequences that are altered in their expression was reviewed and vetted by members of Monarch and SO teams.  Several approaches for conceptual integration of the expression and sequence variation models are under consideration.
  6. Technical approaches for coordinated development.  Topics included how to manage parallel construction and coordination of abstract SO and physical MSO ontologies – where strategies for automated derivation of the SO from the MSO were reviewed.  In addition, we discussed how to manage community development of SO and GENO as integrated but separate ontologies, using existing platforms, tools, and standards for software development (Google projects, trackers, list-serves, build and QA tools, etc).
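
Two of the outcomes above, identity based on sequence plus position (item 2) and the symmetric ‘variant_with’ relation (item 4), can be sketched informally in Python. This is only an illustration under our own simplifying assumptions, not the SO or GENO modeling: "the same feature" is approximated as "the same coordinates", roles are plain strings, and the names of the more specific relations are invented labels.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SequenceFeature:
    seqid: str
    start: int
    end: int
    sequence: str
    # The role is an attribute of the feature, not part of its identity,
    # so it is excluded from equality and hashing.
    role: str = field(default="", compare=False)


def variant_with(a, b):
    """Symmetric relation: same position, different sequence."""
    same_position = (a.seqid, a.start, a.end) == (b.seqid, b.start, b.end)
    return same_position and a.sequence != b.sequence


def specific_relation(a, b):
    """Pick a more specific relation name from the roles the features hold.

    The returned labels are hypothetical, chosen for this sketch only.
    """
    if not variant_with(a, b):
        return None
    roles = {a.role, b.role}
    if roles == {"reference", "alternate"}:
        return "reference_vs_alternate"
    if roles == {"wild-type", "mutant"}:
        return "wild_type_vs_mutant"
    return "variant_with"


ref = SequenceFeature("chr1", 100, 100, "G", role="reference")
alt = SequenceFeature("chr1", 100, 100, "A", role="alternate")
elsewhere = SequenceFeature("chr1", 200, 200, "G")

print(variant_with(ref, alt), variant_with(alt, ref))
print(specific_relation(ref, alt))
print(ref == elsewhere)
```

Neither argument to `variant_with` is privileged, which is the difference from a ‘variant_of some reference’ design; the reference/alternate distinction only appears once roles are taken into account.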

As noted above, more details on each of these topics, as well as many others, can be found in the document here [5].  Participation of the broader community is encouraged through feedback on this document or participation in ongoing coordination calls (contact brushm@ohsu.edu for info).


  1. http://www.sequenceontology.org/
  2. http://monarchinitiative.org/
  3. ICBO 2013 conference paper - http://www2.unb.ca/csas/data/ws/icbo2013/papers/ec/icbo2013_submission_60.pdf
  4. Presentation to the Phenotype RCN, October 2013 - http://www.slideshare.net/mhb120/phenotype-rcn-sogenoworkshopshared
  5. Google doc summarizing workshop outcomes - https://docs.google.com/document/d/1AUEVX0Sx_iy9mTI6F59Yo7ZCXu4zv5uSk28AHid5zhc/edit#