Jun 08, 2015

CARO/PCO Oregon Summit 2014

Figure 1. SubClass hierarchy of upper-level classes in the Common Anatomy Reference Ontology (CARO, orange boxes), plus relations to critical external ontology terms from Basic Formal Ontology (BFO, grey boxes), Population and Community Ontology (PCO, yellow box), and Gene Ontology (GO, pink boxes). Terms in light purple boxes are found in multiple ontologies (CL=Cell Ontology). The term ‘Organism’ in the green box is not an ontology term but a class from the Darwin Core Vocabulary that is a subclass of the new CARO term ‘biological entity’. ‘Biological entity’ is a catch-all term for any material entity that is, is part of, or is derived from an organism, virus, or viroid, or a collection of them.


Before the post-Thanksgiving haze had lifted, a small group of ontologists (Melissa Haendel, Chris Mungall, David Osumi-Sutherland, and Ramona Walls) converged on the lovely small town of Brownsville, Oregon to work on the Common Anatomy Reference Ontology (CARO), the Population and Community Ontology (PCO), and PATO, an ontology of biological qualities. This work was done within the context of the larger group of ontologies that make use of or are used by CARO (UBERON, GO, CL).

CARO is a relatively small upper ontology with ~165 classes and a few core relations that is used to link taxon-specific anatomy ontologies ranging from fruit flies to vertebrates to plants. The 1.0 release of CARO has been widely used, but usage has been quite inconsistent and sometimes incorrect. This is partly due to lack of clarity in some definitions, but also because it was written at a time when we lacked the tools to provide automated reports of incorrect usage.

PCO is a recently developed ontology focussing on populations, communities and the relationships between organisms. The definitions of organism types in CARO are critically important for this ontology, as are the biological qualities in PATO that apply to groups of organisms.

PATO, an ontology of biological qualities, has been very widely used by the community brought together by the Phenotype RCN, as well as in defining classes in a wide range of other ontologies used by this community (covering phenotypes, anatomy, cell types and populations). So far, PATO has had limited axiomatisation, but there are many obvious cases where axiomatisation could improve its integration with ontologies that use it, including the PCO and anatomy ontologies.

A major aim of our work on CARO at this meeting was to redraft textual definitions so that they could be understood by any competent biologist and to redraft logical definitions so that they could be used for automated classification and error checking. For both logical and textual definitions, we aimed to focus on distinctions that are important to biologists – either directly, or indirectly by making biologically useful queries possible.  We also aimed to take into account new use cases that have arisen since CARO 1.0 was released, as a result of work on the PCO as well as on anatomy ontologies and the ontologies and tools that use them.  In parallel with this work, we aimed to improve related axiomatisation of PATO.

Over two and a half days of leftover turkey, home-fermented vegetables, and farm-fresh eggs, we took care of operational issues such as repository maintenance, as well as more hard-core ontologizing. A highlight of the meeting was an informal gathering on Monday night when we were joined by Laurel Cooper and John Campbell from Oregon State University and Joe Fontaine from Murdoch University to discuss the intersections of ontologies, ecology, plant traits, and biodiversity.

Key outputs of the meeting were:

  • A fresh github repo (https://github.com/obophenotype/caro/) for CARO, with cleaned up imports.
  • New CARO terms, including terms for multicellular anatomical structure and expression pattern, and a general term for organs.
  • Revised text and logical definitions for most CARO terms, including anatomical structure, cellular organism, and organ (figure 1, Vue file that shows the key classes and which files they live in).
  • Draft ontology design patterns (ODPs) for expression patterns and for anatomical structures with internal spaces (lumens).
  • Further development of PCO, including updating import files, testing ODPs for defining collections of organisms and species/organism interactions.
  • A pending beta release of CARO2.0 and plans for how to announce it.
  • Better formalization of PATO through general class axioms (GCIs) necessary for CARO and PCO.
  • A Jenkins job that reports on and verifies ontologies that use CARO (FBBT, PO, XAO, and ZFA).
  • A draft paper on CARO2.0.

One of the key use cases for anatomy ontologies is annotation of gene expression, and we wanted a way to help curators avoid the pitfall of annotating expression to the (immaterial) space that is part of a structure rather than the (material) structure that surrounds it. We propose a design pattern in which any structure that has an interior space (such as stomach) would be modeled using four classes: one for the entire structure (which includes both the surrounding structure and the space that is part of it), one for the space, one for the wall (which is just the surrounding structure without the space) and one for “wall region”. A wall region is any portion of the wall that spans the full thickness of the wall for its entire lateral extent, whereas the wall is the mereotopological sum of all wall regions. Following this pattern, an ontology that wished to include a stomach would have classes for “stomach”, “stomach lumen”, “stomach wall”, and “region of stomach wall”. We opted against including very general classes such as “wall” or “wall region” in CARO, and instead plan to document the pattern and provide a template for its use in anatomy ontologies.
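The four-class pattern can be sketched in a few lines of Python. This is only an illustration: the function name `lumen_pattern`, the dictionary keys, and the label templates are assumptions made here (the labels follow the stomach example above), not part of CARO or of any released template.

```python
def lumen_pattern(structure: str) -> dict[str, str]:
    """Return labels for the four classes the pattern calls for.

    Hypothetical helper: keys and label templates are illustrative only.
    """
    return {
        "structure": structure,                        # whole: wall plus the space
        "lumen": f"{structure} lumen",                 # the immaterial space
        "wall": f"{structure} wall",                   # surrounding material structure
        "wall_region": f"region of {structure} wall",  # full-thickness portion of the wall
    }


if __name__ == "__main__":
    for role, label in lumen_pattern("stomach").items():
        print(f"{role}: {label}")
```

With labels like these in place, a curator annotating gene expression would choose the wall or a wall region (material) rather than the lumen (immaterial).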

One way of specifying structures that have a geometric component, such as a stomach, is through the use of GCIs in PATO. PATO includes a number of classes for qualities describing shape. Of these, lumenized, tubular, and saccular are the most relevant to CARO. We began adding GCIs to PATO of the form:

  • bearer_of some lumenized subClassOf ‘has part’ some lumen
  • bearer_of some unlumenized subClassOf not (‘has part’ some lumen)
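To show what these two axioms buy in practice, here is a toy Python sketch. It is not how an OWL reasoner works: the lookup table `GCIS` and the function `infer_has_lumen` are hypothetical names introduced for illustration, and a real workflow would load PATO into a reasoner instead. The sketch treats "unknown" as a valid answer, so that a structure with no relevant quality is not assumed to lack a lumen.

```python
from typing import Optional

# Assumption: a flat lookup table stands in for the two GCIs above.
GCIS = {
    "lumenized": True,     # bearer_of some lumenized SubClassOf 'has part' some lumen
    "unlumenized": False,  # bearer_of some unlumenized SubClassOf not ('has part' some lumen)
}


def infer_has_lumen(qualities: set[str], asserted: Optional[bool] = None) -> Optional[bool]:
    """Infer whether a structure has a lumen part from the qualities it bears.

    Returns True or False when a GCI (or the assertion) settles the question,
    and None when nothing is known. Raises ValueError on a contradiction.
    """
    implied = {GCIS[q] for q in qualities if q in GCIS}
    if asserted is not None:
        implied.add(asserted)
    if len(implied) > 1:
        raise ValueError("inconsistent: structure both has and lacks a lumen")
    return implied.pop() if implied else None
```

The error branch mirrors the kind of automated error checking the axioms are meant to enable: a structure described as unlumenized but also asserted to have a lumen would be flagged.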

An open question remains on how to document these patterns (in CARO or as separate patterns). One possibility is for CARO to include abstract geometrical classes such as “anatomical tube” or “anatomical tube wall” and “tube lumen”.

Stay tuned for another post soon, with the upcoming release of CARO!

May 27, 2015

How do phenotypic data factor into the issues relating to integrating complex data? Three frequent phenotypers (Ramona Walls, Chris Mungall, and Maryann Martone) were supported by this RCN to participate with sixteen others in an ‘Integrating Complex Data’ workshop organized by the American Institute for Biological Sciences (AIBS) with NSF funding (EF-1450894), on March 30-31 at the Hyatt Regency Crystal City in Arlington, Virginia. The workshop was co-chaired by Paula Mabee, Corinna Gries, and Robert Gropp, facilitated by Kathy Joyce, and observed by various program officers and staff from NSF.

Complex data integration, defined as ‘bringing together data from two or more fields’, is required to address many fundamental scientific questions as well as to understand how to mitigate the challenges facing the planet. Participants (whose research interests ranged from genetics, genomics, metagenomics, systematics, taxonomy, and ecology, to bio/eco-informatics and cyberinfrastructure development) initially discussed specific use cases in which complex data integration was required. They then focused on the barriers that impede integration, recognizing domain silos as a major problem at this scale. They illustrated with examples that data discovery and integration are currently hampered by a lack of common standards, including those for IDs, representation, ontologies, data formats, data collection, and communication protocols. Phenotype RCN participants described the usefulness of ontologies in connecting phenotypic data to other data types across domains.

Suggestions and next steps required to achieve better data integration were the focus of the second day of the workshop. Community coalescence around shared standards, rather than more standards, was considered key.  Participants advocated for interagency discussions about how to provide linkages across their data systems, thus making data from all sites more readily discoverable and distributing the financial burden.  Participants further recognized that the technical expertise required for complex data integration is high; they promoted cross-training in informatics for graduate students and a higher level of specialist ‘data scientist’ training.  They also felt that funding mechanisms to enable scientists to employ technical specialists for specific data integration tasks would enable complex data integration.  Particularly at this juncture, where cross-domain data analysis is required to address societal problems, participants stressed that it is important to try to solve the immediate problems while working toward long-range solutions.  A full report from this workshop is in preparation and a link will be posted when it is available.

 Posted on May 27, 2015 at 11:55 pm
Apr 21, 2015

Photo used under Creative Commons from Kirt Edblom. https://www.flickr.com/photos/27190564@N02/15496544935

The Alfred P. Sloan Foundation’s Digital Information Technology program has awarded $499K to Phoenix Bioinformatics to catalyze development of creative new user-based funding strategies for research databases.   Phoenix was founded in 2013 by the staff of the Arabidopsis Information Resource (TAIR) to provide new support mechanisms as TAIR transitioned away from grant-based funding. Following its success with TAIR, Phoenix will be assisting other databases with their funding challenges and helping them find new ways to sustain their projects for the long term with community help. For more information about TAIR’s transition to sustainable funding or the newly funded Phoenix project please contact Phoenix Bioinformatics.

Phoenix Bioinformatics is a nonprofit 501(c)(3) organization dedicated to finding innovative ways to sustain critical scientific resources.

 Posted on April 21, 2015 at 7:35 pm
Jan 09, 2015

Figure assembled by Anya Broverman-Wray (CC BY 2.0) doi: 10.1371/journal.pbio.1002033.g001

In case you missed it, our latest Phenotype RCN publication came out this week in PLoS Biology. In this perspective we argue for more investment in the infrastructure needed to make phenotypes more accessible. Check it out!

Abstract.—Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today’s data barriers and facilitate analytical reproducibility.

Deans AR, Lewis SE, Huala E, Anzaldo SS, Ashburner M, Balhoff JP, Blackburn DC, Blake JA, Burleigh JG, Chanet B, Cooper LD, Courtot M, Csösz S, Cui H, Dahdul W, Das S, Dececchi TA, Dettai A, Diogo R, Druzinsky RE, Dumontier M, Franz NM, Friedrich F, Gkoutos GV, Haendel M, Harmon LJ, Hayamizu TF, He Y, Hines HM, Ibrahim N, Jackson LM, Jaiswal P, James-Zorn C, Köhler S, Lecointre G, Lapp H, Lawrence CJ, Le Novère N, Lundberg JG, Macklin J, Mast AR, Midford PE, Mikó I, Mungall CJ, Oellrich A, Osumi-Sutherland D, Parkinson H, Ramírez MJ, Richter S, Robinson PN, Ruttenberg A, Schulz KS, Segerdell E, Seltmann KC, Sharkey MJ, Smith AD, Smith B, Specht CD, Squires RB, Thacker RW, Thessen A, Fernandez-Triana J, Vihinen M, Vize PD, Vogt L, Wall CE, Walls RL, Westerfeld M, Wharton RA, Wirkner CS, Woolley JB, Yoder MJ, Zorn AM, Mabee PM. (2015) Finding our way through phenotypes. PLoS Biology 13(1): e1002033. DOI: 10.1371/journal.pbio.1002033

Dec 07, 2014

In late July, the Phenotype RCN and Phenoscape co-sponsored several speakers in the symposium “What should Bioinformatics do for EvoDevo?” co-organized by Günter Plickert, Mark Blaxter, Paula Mabee and Ann Burke. The symposium was part of the European Society for Evolutionary Developmental Biology (EED) meeting, held in Vienna. The organizers brought together speakers whose research and perspectives provided examples of how EvoDevo data integration is necessary for discoveries. Several speakers presented new insights into EvoDevo that were directly derived from sequencing genomes or transcriptomes. Others showed how species phenotypes, once represented with semantic methods, could be linked to genetic and developmental data, and described the research questions they addressed this way. This well-attended symposium met its goals, which were to:

  • promote awareness of new and developing resources and methods as well as EvoDevo uses of existing ones.
  • promote discussions in the EvoDevo community that value input of bioinformatics to EvoDevo questions.
  • invite the audience to share their ideas of how to move the integration forward.

The excellent organization of this conference and the wonderful venue helped spark several new collaborations and grant proposals.  Talks and speakers in this symposium included (full program found here):

  1. Bioinformatics for EvoDevo: Connecting evolutionary morphology and model organism genetics, presented by Paula Mabee (University of South Dakota, Vermillion, SD, USA)
  2. Insights into the evolution and development of planarian regeneration from the genome of the flatworm, Girardia tigrina, presented by Sujai Kumar (University of Oxford, GBR)
  3. From the wet lab to the computer and back: A stage specific RNAseq analysis elucidates the molecular underpinnings and evolution of Hydrozoan development, presented by Philipp Schiffer (University of Cologne, GER)
  4. Insights into the evolution of early development of parthenogenetic nematodes by second generation sequencing, presented by Christopher Kraus (University of Cologne, GER)
  5. Petaloidy, polarity and pollination: The evolution of organ morphology networks, presented by Chelsea Specht (University of California Berkeley, CA, USA)
  6. Aligning phenomes and genomes to understand the evolution of multicellular organisms, presented by Philip Donoghue (University of Bristol, GBR)
  7. Online databases provide critical insights into the evolution of appendage modularity during the fin to limb transition, presented by Karen Sears (University of Illinois, Urbana, IL, USA)
  8. Evolutionally conserved mechanisms of regeneration in chordates: Uncovering pathways active during WBR in Botrylloides leachi, presented by Lisa Zondag (University of Otago, Dunedin, NZL)
  9. Phylogenomics of MADS-box genes in flowering plants to identify EvoDevo genes, presented by Guenter Theissen (Friedrich Schiller University Jena, GER)
  10. Illuminating the evolutionary origin of the turtle shell by a comparative tissue-specific transcriptome analysis, presented by Juan Pascual-Anaya (RIKEN Center for Developmental Biology, Kobe, JPN)
  11. Blastodermal segmentation in the milkweed bug, Oncopeltus fasciatus, presented by Ariel Chipman (The Hebrew University of Jerusalem, ISR)
  12. The origins of arthropod innovations: Insights from the noninsect arthropods, the cherry shrimp and rusty millipede, presented by Nathan Kenny (The Chinese University of Hong Kong, HKG)
 Posted on December 7, 2014 at 4:52 pm
Nov 14, 2014

Do sponges have true tissues? This fundamental question is just one of the controversial topics that Phenotype RCN team members encountered as they constructed a new ontology to describe the unique features of sponge anatomy. As you can see from the diagram below, the team opted to describe “functional layers” of sponge cells, re-using the CARO class ‘portion of tissue’ to contain these layers.

The recently published Porifera ontology (PORO) is an outcome of Phenotype RCN meetings that matched experts in creating ontologies with taxonomists seeking to improve phenotype descriptions and databases. Sponge biologists Bob Thacker, Cristina Díaz, Adeline Kerner, and Régine Vignes-Lebbe teamed up with information scientists Chris Mungall, Melissa Haendel, and Erik Segerdell to generate the ontology from an existing thesaurus of anatomical terms. The ontology is currently being used to allow natural language processing software to efficiently extract morphological characters from taxonomic monographs.

Citation: Thacker RW, Díaz MC, Kerner A, Vignes-Lebbe R, Segerdell E, Haendel MA, Mungall CJ. 2014. The Porifera Ontology (PORO): enhancing sponge systematics with an anatomy ontology. Journal of Biomedical Semantics 5:39. doi: 10.1186/2041-1480-5-39.


 Posted on November 14, 2014 at 2:40 am
Oct 15, 2014

Several members of the Plant Working Group got together at Phoenix Bioinformatics in lovely Redwood City, California, at the end of September to write up results of the long-running Plant Phenotype Pilot Project (or PPPP as the cognoscenti call it). The first draft is affectionately known as the Plant Phenotype Pilot Project Preliminary Paper (or PPPPPP).

Left: Wild type Arabidopsis and several adherent leaf mutants, from Voisin et al. 2009, PLoS Genetics 5(10):e1000703 Fig. 1.  Right: Zea mays adherent dwarf. From MaizeGDB (http://images.maizegdb.org/db_images/Variation/mgn/5207_1613_1042_29.jpg).


Despite their obsession with the letter P, working group members Carolyn Lawrence, David Meinke, Ramona Walls, Lisa Harper and Eva Huala made good progress over the course of two and a half days, with the help of Anika Oellrich, Laurel Cooper, Pankaj Jaiswal and George Gkoutos, who were able to join via Skype. The group wrote up sections describing the assembly and analysis of its phenotype dataset, which includes 6361 Entity-Quality statements describing mutant phenotypes associated with 2744 genes across six well-studied plant species (Arabidopsis, rice, maize, soybean, Medicago, and tomato).

Other recent activities from the plant working group include submission of a grant proposal to NSF-ABI over the summer to fund the continuation of this work. The submission was made to “Advances in Biological Informatics” with funding sought from both the NSF and BBSRC under the “UK BBSRC-US NSF/BIO Lead Agency Pilot Opportunity” program.


The plant working group would like to thank the RCN for covering travel expenses for this and previous working group meetings; this funding has enabled the group to work together effectively despite being scattered over a wide range of institutions in the USA and UK.



 Posted on October 15, 2014 at 10:08 pm

Half-duck, half-crocodile, and bigger than T. Rex: a giant semiaquatic predatory dinosaur

Sep 26, 2014

A team led by University of Chicago Phenoscapers Nizar Ibrahim and Paul Sereno has published new findings about the remarkable semiaquatic predatory dinosaur Spinosaurus aegyptiacus in the latest issue of Science. The work has been receiving some nice coverage at NPR and other news outlets.

Workers at the National Geographic Museum in Washington grind the rough edges off a life-size replica of a Spinosaurus skeleton. Credit: Mike Hettwer/National Geographic.

From the abstract:

We describe adaptations for a semiaquatic lifestyle in the dinosaur Spinosaurus aegyptiacus. These adaptations include retraction of the fleshy nostrils to a position near the mid-region of the skull and an elongate neck and trunk that shift the center of body mass anterior to the knee joint. Unlike terrestrial theropods, the pelvic girdle is downsized, the hindlimbs are short, and all of the limb bones are solid without an open medullary cavity, for buoyancy control in water. The short, robust femur with hypertrophied flexor attachment and the low, flat-bottomed pedal claws are consistent with aquatic foot-propelled locomotion. Surface striations and bone microstructure suggest that the dorsal “sail” may have been enveloped in skin that functioned primarily for display on land and in water.

Citation: Ibrahim N, Sereno PC, Dal Sasso C, Maganuco S, Fabbri M, Martill DM, Zouhri S, Myhrvold N, Iurino DA (2014) Semiaquatic adaptations in a giant predatory dinosaur. Science. http://doi.org/10.1126/science.1258750.

Filed under: Uncategorized
Sep 25, 2014

[written by Matt Yoder. Posted by Andy Deans]

Regular RCN attendees Matt Yoder, István Mikó and Andy Deans attended ICIM3 in Berlin, Germany on August 3rd–7th. The congress brought together world leaders in invertebrate morphology for a week of presentations and discussion on the campus of the Humboldt University in Berlin. Logistics were flawless, with ample food and drink to wet interactions (e.g., endless beer and pretzels for the poster session!). The conference was truly a showcase of phenotypes and was fascinating from the standpoint of just seeing examples of life evolving. For those of us interested in semantically describing morphological diversity, the myriad approaches to representing morphology as data were extremely informative and indicative of the challenges we face.

people stand around, talking in a room full of posters

Morphologists talk phenotype, over endless pretzels and beer. Photo by Andy Deans (CC BY 2.0).

In addition to generally absorbing the goings on, Yoder and Deans participated in an eMorphology symposium led by Lars Vogt, one of the PIs of MorphDBase. Deans presented on the state of semantic phenotype representation, with particular attention to its role in taxonomy (Deans ICIM3 slideshow), a follow-up to a presentation and panel discussion from the last ICIM (Deans ICIM2 slideshow). Yoder delivered a talk (http://dx.doi.org/10.6084/m9.figshare.1127970) on behalf of Jim Balhoff et al., on presence/absence inference utilizing Phenoscape KB. Balhoff has written tools that use inference to expand the knowledge provided by curators into much larger datasets asserting the presence or absence of anatomical features across taxa. These tools also find logical inconsistencies in curator-made statements, and are a great example of a practical approach to computing on phenotypes.

A meeting highlight was the opportunity to see the latest and greatest imaging technologies within a special symposium on advances in microscopy. Speakers highlighted advances in 3D and 4D imaging, with systems capable of generating massive datasets, easily rivaling the big-data world of genomics. Handling these data has become a science in itself. It was great to see open-source software and hardware(!) initiatives leading the field in this regard. Stephan Saalfeld’s talk on image alignment was amazing; a presentation similar to that given at ICIM3 is available on YouTube. Pavel Tomancak’s description of light-sheet microscopy using OpenSPIM was also inspirational.

non-hexapod pancrustaceans in vials of ethanol

Arthropod phenotypes on display in the halls. Ready access to specimens and hand-blown glass models (see below) catalyzed several discussions about the evolution of form and function in this phylum. Photo by Andy Deans (CC BY 2.0).

Finally, the meeting was flush with opportunities for developing longer term collaborations. The curators of MorphDBase and the recent initiative TaxonWorks spent significant time discussing the possibility of sharing a code-base and thus greatly extending their resources. We hope that this collaboration comes to fruition and that it becomes an important component of “phenotype-handling” in the future.

A special thanks to the Phenotype RCN PIs for supporting, in part, our attendance.

museum case full of boxes that contain glass models of organisms

Glass models of invertebrates, on display at the Humboldt University, in Berlin. Photo by Andy Deans (CC BY 2.0).

 Posted on September 25, 2014 at 1:04 am

Phenoscape poster at Evolution 2014

Aug 27, 2014

I attended the Evolution 2014 meeting a few months ago in Raleigh, NC, and presented a poster on Phenoscape’s curation effort: “Moving the mountain: How to transform comparative anatomy into computable anatomy?”, with coauthors A. Dececchi, N. Ibrahim, H. Lapp, and P. Mabee. In this work, we assessed the efficiency of our workflow for the curation of evolutionary phenotypes from the matrix-based phylogenetic literature. We identified the bottlenecks and areas of improvement in data preparation, phenotype annotation, and ontology development. Gains in efficiency, such as through improved community data practices and development of text-mining tools, are critical if we are to translate evolutionary phenotypes from an ever-growing literature. The poster was well received and several researchers at the meeting were interested in learning more about open source tools for phenotype annotation.

Filed under: Conferences, Curation Tools, Data Curation