phenotypercn

Jun 10 2015
 

Visualizing one or more trees/taxonomies with a non-trivial number of characters and taxa is a challenge that a number of projects are facing. The ETC project organized a workshop with information visualization experts, data providers (trees and characters), and end users to tackle the challenge together.
The meeting was organized by Hong Cui and hosted by Bertram Ludäscher at the National Center for Supercomputing Applications (NCSA), Urbana, IL, on May 11-13. Phenotype RCN participants Matt Yoder, Nico Franz and Martín Ramírez attended the meeting and posed vis challenges. Much of the workshop was devoted to brainstorming on the challenge of representing a large dataset together with some kind of mapping on a tree, and often on two trees simultaneously. This is a familiar challenge for anatomy ontologists, who are trying to represent the interaction of phylogenetic trees, matrices and ontologies:

Right, the level of anatomical data available for different parts of the fin and limb can be visualized for taxa along the fin to limb transition (figure from Dececchi et al. in press 2015, Systematic Biology, doi: 10.1093/sysbio/syv031). Left, a phylogeny of spiders colored according to anatomical complexity, derived from the ontology (figure from Ramírez & Michalik 2014, doi: 10.1111/cla.12075).



The beautiful and clever examples presented by the vis experts were inspiring. How can these gorgeous examples help us represent our complex data in intuitive visualizations? Filters, sort controls, heat maps, zoom panes, collapsing, expanding, and more tools all had two effects on us: they made some of our challenges look feasible, and they refined our vague ideas into more precise challenges.
Hierarchical Circular Layouts, or HCL (Dang, Murray, and Forbes 2015), uses circular layouts on a hierarchical structure.



PathwayMatrix: Visualizing Binary Relationships between Proteins in Biological Pathways (Dang, Murray, and Forbes, 2015). PathwayMatrix can be used not only for biological pathway visualization, but also for character and taxon data.  See: http://biovis.net/year/2015/papers/pathwaymatrix-visualizing-binary-relationships-between-proteins-biological-pathways



The ETC project will implement a few promising techniques as part of the ETC toolkit to invite comments and suggestions from broader communities. Stay tuned, and crunch your data into nice visualizations!
(post by Martín Ramírez)

Jun 08 2015
 

CARO/PCO Oregon Summit 2014

Figure 1. SubClass hierarchy of upper-level classes in  the Common Anatomy Reference Ontology (CARO, orange boxes), plus relations to critical external ontology terms from Basic Formal Ontology (BFO, grey boxes), Population and Community Ontology (PCO, yellow box), and Gene Ontology (GO, pink boxes). Terms in light purple boxes are found in multiple ontologies (CL=Cell Ontology). The term ‘Organism’ in the green box is not an ontology term but a class from the Darwin Core Vocabulary that is a subclass of the new CARO term ‘biological entity’. ‘Biological entity’ is a catch-all term for any material entity that is, is part of, or derived from an organism, virus, or viroid, or a collection of them.


Before the post-Thanksgiving haze had lifted, a small group of ontologists (Melissa Haendel, Chris Mungall, David Osumi-Sutherland, and Ramona Walls) converged on the lovely small town of Brownsville, Oregon to work on the Common Anatomy Reference Ontology (CARO), the Population and Community Ontology (PCO), and PATO, an ontology of biological qualities. This work was done within the context of the larger group of ontologies that make use of or are used by CARO (UBERON, GO, CL).

CARO is a relatively small upper ontology with ~165 classes and a few core relations that is used to link taxon-specific anatomy ontologies ranging from fruit flies to vertebrates to plants. The 1.0 release of CARO has been widely used, but usage has been quite inconsistent and sometimes incorrect. This is partly due to lack of clarity in some definitions, but also because it was written at a time when we lacked the tools to provide automated reports of incorrect usage.

PCO is a recently developed ontology focussing on populations, communities and the relationships between organisms. The definitions of organism types in CARO are critically important for this ontology, as are the biological qualities applying to groups of organisms in PATO.

PATO, an ontology of biological qualities, has been very widely used by the community brought together by the Phenotype RCN as well as in defining classes in a wide range of other ontologies used by this community (covering phenotypes, anatomy, cell types and populations). So far, PATO has had limited axiomatisation, but there are many obvious cases where axiomatisation could improve its integration with ontologies that use it – including the PCO and anatomy ontologies.

A major aim of our work on CARO at this meeting was to redraft textual definitions so that they could be understood by any competent biologist and to redraft logical definitions so that they could be used for automated classification and error checking. For both logical and textual definitions, we aimed to focus on distinctions that are important to biologists – either directly, or indirectly by making biologically useful queries possible.  We also aimed to take into account new use cases that have arisen since CARO 1.0 was released, as a result of work on the PCO as well as on anatomy ontologies and the ontologies and tools that use them.  In parallel with this work, we aimed to improve related axiomatisation of PATO.

Over two and a half days of leftover turkey, home-fermented vegetables, and farm-fresh eggs, we took care of operational issues such as repository maintenance, as well as more hard-core ontologizing. A highlight of the meeting was an informal gathering on Monday night when we were joined by Laurel Cooper and John Campbell from Oregon State University and Joe Fontaine from Murdoch University to discuss the intersections of ontologies, ecology, plant traits, and biodiversity.

Key outputs of the meeting were:

  • A fresh GitHub repo (https://github.com/obophenotype/caro/) for CARO, with cleaned up imports.
  • New CARO terms, including terms for multicellular anatomical structure and expression pattern, and a general term for organs.
  • Revised text and logical definitions for most CARO terms, including anatomical structure, cellular organism, and organ (figure 1, Vue file that shows the key classes and which files they live in).
  • Draft ontology design patterns (ODPs) for expression patterns and for anatomical structures with internal spaces (lumens).
  • Further development of PCO, including updating import files, testing ODPs for defining collections of organisms and species/organism interactions.
  • A pending beta release of CARO 2.0 and plans for how to announce it.
  • Better formalization of PATO through general class inclusion axioms (GCIs) necessary for CARO and PCO.
  • A Jenkins job that reports on and verifies ontologies that use CARO (FBBT, PO, XAO, and ZFA).
  • A draft paper on CARO 2.0.

One of the key use cases for anatomy ontologies is annotation of gene expression, and we wanted a way to help curators avoid the pitfall of annotating expression to the (immaterial) space that is part of a structure rather than the (material) structure that surrounds it. We propose a design pattern in which any structure that has an interior space (such as a stomach) would be modeled using four classes: one for the entire structure (which includes both the surrounding structure and the space that is part of it), one for the space, one for the wall (which is just the surrounding structure without the space) and one for “wall region”. A wall region is any portion of the wall that spans the full thickness of the wall for its entire lateral extent, whereas the wall is the mereotopological sum of all wall regions. Following this pattern, an ontology that wished to include a stomach would have classes for “stomach”, “stomach lumen”, “stomach wall”, and “region of stomach wall”. We opted against including very general classes such as “wall” or “wall region” in CARO, and instead plan to document the pattern and provide a template for its use in anatomy ontologies.
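To make the four-class pattern concrete, here is a minimal Python sketch that expands a structure label into the classes described above. It is only an illustration, not the planned CARO template: the names `PatternClass` and `lumen_pattern`, the `part_of` field, and the choice to flag the whole structure as material are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PatternClass:
    """One class generated by the (hypothetical) lumen pattern template."""
    label: str               # label of the class to create
    material: bool           # False only for the immaterial space
    part_of: Optional[str]   # label of the class this one is part of, if any


def lumen_pattern(structure: str) -> List[PatternClass]:
    """Expand a structure label into the four classes of the pattern."""
    wall = f"{structure} wall"
    return [
        # Entire structure: surrounding wall plus the space (assumed material).
        PatternClass(structure, True, None),
        # The immaterial interior space.
        PatternClass(f"{structure} lumen", False, structure),
        # The surrounding structure without the space.
        PatternClass(wall, True, structure),
        # Any full-thickness portion of the wall.
        PatternClass(f"region of {wall}", True, wall),
    ]


for cls in lumen_pattern("stomach"):
    kind = "material" if cls.material else "immaterial"
    print(f"{cls.label} ({kind})")
```

A curation tool could use the `material` flag to warn when an expression annotation targets the lumen rather than the wall or a wall region.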

One way of specifying structures that have a geometric component, such as a stomach, is through the use of GCIs in PATO. PATO includes a number of classes for qualities describing shape. Of these, lumenized, tubular, and saccular are the most relevant to CARO. We began adding GCIs to PATO of the form:

  • bearer_of some lumenized subClassOf ‘has part’ some lumen
  • bearer_of some unlumenized subClassOf not (‘has part’ some lumen)
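The effect of these two GCIs can be illustrated with a toy check in Python. This is a rough sketch over plain dictionaries, not an OWL reasoner: it works in a closed world, the function name `check_lumen_gcis` is hypothetical, and the example entities are invented for illustration.

```python
from typing import Dict, List, Set


def check_lumen_gcis(
    qualities: Dict[str, Set[str]],
    parts: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Apply the two GCIs to toy data.

    qualities maps an entity to the PATO quality labels it bears;
    parts maps an entity to the kinds of parts asserted for it.
    """
    report: Dict[str, List[str]] = {"inferred_has_lumen": [], "inconsistent": []}
    for entity in sorted(qualities):
        borne = qualities[entity]
        has_lumen = "lumen" in parts.get(entity, set())
        # GCI 1: a bearer of 'lumenized' has some lumen as a part.
        if "lumenized" in borne and not has_lumen:
            report["inferred_has_lumen"].append(entity)
        # GCI 2: a bearer of 'unlumenized' must not have a lumen,
        # whether asserted directly or implied by GCI 1.
        if "unlumenized" in borne and (has_lumen or "lumenized" in borne):
            report["inconsistent"].append(entity)
    return report


# Invented example data, for illustration only.
qualities = {
    "stomach": {"lumenized", "saccular"},
    "tendon": {"unlumenized"},
    "odd duct": {"unlumenized"},
}
parts = {"odd duct": {"lumen"}}
print(check_lumen_gcis(qualities, parts))
```

In a real setting, an OWL reasoner would draw these conclusions from the axioms themselves, which is the kind of automated error checking the GCIs are meant to enable.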

An open question remains on how to document these patterns (in CARO or as separate patterns). One possibility is for CARO to include abstract geometrical classes such as “anatomical tube” or “anatomical tube wall” and “tube lumen”.

Stay tuned for another post soon, with the upcoming release of CARO!