May 27, 2015

How do phenotypic data factor into the issues relating to integrating complex data? Three frequent phenotypers (Ramona Walls, Chris Mungall, and Maryann Martone) were supported by this RCN to participate with sixteen others in an ‘Integrating Complex Data’ workshop organized by the American Institute of Biological Sciences (AIBS) with NSF funding (EF-1450894), on March 30-31 at the Hyatt Regency Crystal City in Arlington, Virginia. The workshop was co-chaired by Paula Mabee, Corinna Gries, and Robert Gropp, facilitated by Kathy Joyce, and observed by various program officers and staff from NSF.

Complex data integration, defined as ‘bringing together data from two or more fields’, is required to address many fundamental scientific questions and to understand how to mitigate the challenges facing the planet. Participants (whose research interests ranged from genetics, genomics, metagenomics, systematics, taxonomy, and ecology, to bio/eco-informatics and cyberinfrastructure development) initially discussed specific use cases in which complex data integration was required. They then focused on the barriers that impede integration, recognizing domain silos as a major problem at this scale. They illustrated with examples that data discovery and integration are currently hampered by a lack of common standards, including those for IDs, representation, ontologies, data formats, data collection, and communication protocols. Phenotype RCN participants described the usefulness of ontologies in connecting phenotypic data to other data types across domains.

Suggestions and next steps required to achieve better data integration were the focus of the second day of the workshop. Community coalescence around shared standards, rather than more standards, was considered key. Participants advocated for interagency discussions about how to provide linkages across their data systems, thus making data from all sites more readily discoverable and distributing the financial burden. Participants further recognized that the technical expertise required for complex data integration is high; they promoted cross-training in informatics for graduate students and a higher level of specialist ‘data scientist’ training. They also felt that funding mechanisms allowing scientists to employ technical specialists for specific data integration tasks would make complex data integration more feasible. Particularly at this juncture, where cross-domain data analysis is required to address societal problems, participants stressed that it is important to try to solve the immediate problems while working toward long-range solutions. A full report from this workshop is in preparation and a link will be posted when it is available.

Dec 7, 2014

In late July, the Phenotype RCN and Phenoscape co-sponsored several speakers in the symposium “What should Bioinformatics do for EvoDevo?” co-organized by Günter Plickert, Mark Blaxter, Paula Mabee and Ann Burke. The symposium was part of the European Society for Evolutionary Developmental Biology (EED) meeting, held in Vienna. The organizers brought together speakers whose research and perspectives provided examples of how EvoDevo data integration is necessary for discoveries. Several speakers presented new insights into EvoDevo that were directly derived from sequencing genomes or transcriptomes. Others showed how, by representing species phenotypes with semantic methods, these phenotypes could be linked to genetic and developmental data, and presented the research questions that they addressed. This well-attended symposium met its goals, which were to:

  • promote awareness of new and developing resources and methods as well as EvoDevo uses of existing ones.
  • promote discussions in the EvoDevo community that value input of bioinformatics to EvoDevo questions.
  • invite the audience to share their ideas of how to move the integration forward.

The excellent organization of this conference and the wonderful venue helped spark several new collaborations and grant proposals.  Talks and speakers in this symposium included (full program found here):

  1. Bioinformatics for EvoDevo: Connecting evolutionary morphology and model organism genetics, presented by Paula Mabee (University of South Dakota, Vermillion, SD, USA)
  2. Insights into the evolution and development of planarian regeneration from the genome of the flatworm, Girardia tigrina, presented by Sujai Kumar (University of Oxford, GBR)
  3. From the wet lab to the computer and back: A stage specific RNAseq analysis elucidates the molecular underpinnings and evolution of Hydrozoan development, presented by Philipp Schiffer (University of Cologne, GER)
  4. Insights into the evolution of early development of parthenogenetic nematodes by second generation sequencing, presented by Christopher Kraus (University of Cologne, GER)
  5. Petaloidy, polarity and pollination: The evolution of organ morphology networks, presented by Chelsea Specht (University of California Berkeley, CA, USA)
  6. Aligning phenomes and genomes to understand the evolution of multicellular organisms, presented by Philip Donoghue (University of Bristol, GBR)
  7. Online databases provide critical insights into the evolution of appendage modularity during the fin to limb transition, presented by Karen Sears (University of Illinois, Urbana, IL, USA)
  8. Evolutionally conserved mechanisms of regeneration in chordates: Uncovering pathways active during WBR in Botrylloides leachi, presented by Lisa Zondag (University of Otago, Dunedin, NZL)
  9. Phylogenomics of MADS-box genes in flowering plants to identify EvoDevo genes, presented by Guenter Theissen (Friedrich Schiller University Jena, GER)
  10. Illuminating the evolutionary origin of the turtle shell by a comparative tissue-specific transcriptome analysis, presented by Juan Pascual-Anaya (RIKEN Center for Developmental Biology, Kobe, JPN)
  11. Blastodermal segmentation in the milkweed bug, Oncopeltus fasciatus, presented by Ariel Chipman (The Hebrew University of Jerusalem, ISR)
  12. The origins of arthropod innovations: Insights from the noninsect arthropods, the cherry shrimp and rusty millipede, presented by Nathan Kenny (The Chinese University of Hong Kong, HKG)

Nov 14, 2014

Do sponges have true tissues? This fundamental question is just one of the controversial topics that Phenotype RCN team members encountered as they constructed a new ontology to describe the unique features of sponge anatomy. As you can see from the diagram below, the team opted to describe “functional layers” of sponge cells, re-using the CARO class ‘portion of tissue’ to contain these layers.

The recently published Porifera ontology (PORO) is an outcome of Phenotype RCN meetings that matched experts in creating ontologies with taxonomists seeking to improve phenotype descriptions and databases. Sponge biologists Bob Thacker, Cristina Díaz, Adeline Kerner, and Régine Vignes-Lebbe teamed up with information scientists Chris Mungall, Melissa Haendel, and Erik Segerdell to generate the ontology from an existing thesaurus of anatomical terms. The ontology is currently being used to allow natural language processing software to efficiently extract morphological characters from taxonomic monographs.

Citation: Thacker RW, Díaz MC, Kerner A, Vignes-Lebbe R, Segerdell E, Haendel MA, Mungall CJ. 2014. The Porifera Ontology (PORO): enhancing sponge systematics with an anatomy ontology. Journal of Biomedical Semantics 5:39. doi: 10.1186/2041-1480-5-39.

Jun 18, 2014

This spring saw three related meetings (two workshops and a hackathon) aimed at advancing development of the Biological Collections Ontology (BCO) and the Population and Community Ontology (PCO) and developing tools to annotate data using those and other ontologies. The first two meetings were held from February 18-20, 2014 in the iPlant offices in Tucson, AZ, right before the Phenotype RCN annual meeting, and were supported in part by the Phenotype RCN. The third meeting was held concurrently with the 16th Genomic Standards Consortium (GSC) Meeting at Pembroke College in Oxford, England from March 31 – April 2. Additional support for all three meetings was provided by EAGER: An Interoperable Information Infrastructure for Biodiversity Research, RCN4GSC: A Research Coordination Network for the Genomic Standards Consortium, and BiSciCol Tracker: Towards a tagging and tracking infrastructure for biodiversity science collections, with logistic support from iPlant and the GSC.

At the first meeting, ten in-person and three remote participants gathered use cases to help grow the PCO, a relatively new ontology that describes collections of organisms such as populations and communities as well as qualities and processes related to those collections. The PCO can be used to describe any collection of organisms (or viruses or viroids), from microbes to humans, whether the collection consists of one or multiple taxa. During one and a half days, we came up with a preliminary list of factors by which organisms are grouped into populations or communities, developed an ontology design pattern for how to describe membership in a group of organisms, defined several new PCO terms for specific use cases, made decisions about modeling challenging concepts such as ecological niche (spanning both PCO and ENVO), and decided to provide pre-composed terms for those characteristics of populations that are not taxon specific and cannot be defined as derived from individual measurements. In addition, there were many lively discussions about the nature of an organism or population and how our expanding knowledge of the microbial world might turn everything we know on its head.

The second workshop focused on mapping datasets to ontology terms and converting them to Resource Description Framework (RDF), using the BCO, an ontology that describes field-based biological sampling processes and observations, as well as material entities and roles associated with those processes. During another intense one and a half days, 18 in-person participants and one remote participant coordinated development among BCO, OBI, and ENVO, created a concept map for DNA marker gene studies that led to new terms for OBI and a manuscript submitted to the International Conference on Biomedical Ontologies, and did a first pass mapping of Darwin Core terms to ontology terms. In addition, we mapped three data sets to the BCO, converted them to RDF triple stores, and ran preliminary queries. At the end of the third day, about half of the participants climbed into a van to take part in another three jam-packed days of meetings at Biosphere 2, hosted by the Phenotype RCN. Our northern European colleagues were particularly happy to see the sunshine for the first time in months.
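For readers who have not done this kind of mapping, here is a minimal sketch of the general idea: each column of a tabular record is matched to a term, and each cell becomes one RDF triple. This is not the workflow used at the workshop. The Darwin Core namespace is real, but the column names, the example.org base IRI, the column-to-term mapping, and the sample record are all made up for illustration.

```python
# Darwin Core term namespace (published by TDWG).
DWC = "http://rs.tdwg.org/dwc/terms/"
# Hypothetical base IRI for the records in this sketch.
BASE = "http://example.org/occurrence/"

# Hypothetical mapping from spreadsheet column names to Darwin Core terms.
COLUMN_TO_TERM = {
    "species": DWC + "scientificName",
    "date": DWC + "eventDate",
    "lat": DWC + "decimalLatitude",
}


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def row_to_ntriples(row_id, row):
    """Turn one tabular record into a list of N-Triples lines."""
    subject = "<" + BASE + row_id + ">"
    lines = []
    for column, value in row.items():
        predicate = COLUMN_TO_TERM.get(column)
        if predicate is None or value == "":
            continue  # skip unmapped columns and empty cells
        lines.append('%s <%s> "%s" .' % (subject, predicate, escape_literal(value)))
    return lines


# Invented sample record: one mapped cell is empty, one column is unmapped.
example_row = {
    "species": "Danio rerio",
    "date": "2014-02-18",
    "lat": "",
    "collector_notes": "n/a",
}
triples = row_to_ntriples("rec1", example_row)
for line in triples:
    print(line)
```

In practice a library such as rdflib and terms from the BCO itself would replace the hand-built strings; the sketch only shows the shape of the mapping step.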

To help counteract the pleasant weather in Arizona and follow up on some of the ideas generated during the workshop in Tucson, we decided to hold a BCO hackathon in Oxford six weeks later. In our honor, temperatures in the UK jumped 20 degrees (Fahrenheit) the week we were there, sparing me total weather shock. The hackathon was smaller (7 full time participants plus a few part time), and focused on generating concrete products. Over the course of four days, we coded an additional dataset to RDF, developed a Material Sample Core for the Global Biodiversity Information Facility (GBIF), created a Web Ontology Language (OWL) file for importing Darwin Core classes and properties into BCO, developed a workflow for converting biodiversity data among formats, prepared an updated version of the BCO for release, and completed a proof-of-concept conversion tool that converts existing RDF outputs to Darwin Core Archive format using an ontology specification. We also took part in several of the main meeting sessions of the GSC and reported on our work to the larger group.
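To give a feel for the reverse direction, here is a rough sketch of flattening RDF triples back into a Darwin Core-style table. It is not the proof-of-concept tool from the hackathon: it only handles simple N-Triples lines with plain literals, does not unescape quoted characters, and produces just the CSV text (a real Darwin Core Archive also bundles a metadata descriptor). The sample triples and the example.org IRIs are invented.

```python
import csv
import io
import re

# Darwin Core term namespace (published by TDWG).
DWC = "http://rs.tdwg.org/dwc/terms/"
# Matches simple N-Triples lines of the form: <subject> <predicate> "literal" .
TRIPLE = re.compile(r'^<([^>]+)> <([^>]+)> "(.*)" \.$')


def triples_to_rows(lines):
    """Group Darwin Core triples by subject into {subject: {term: value}}."""
    rows = {}
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match is None:
            continue  # not a simple literal triple
        subject, predicate, value = match.groups()
        if not predicate.startswith(DWC):
            continue  # keep only Darwin Core properties
        rows.setdefault(subject, {})[predicate[len(DWC):]] = value
    return rows


def rows_to_csv(rows, columns):
    """Write the grouped records as CSV text with one column per term."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id"] + columns)
    for subject in sorted(rows):
        record = rows[subject]
        writer.writerow([subject] + [record.get(c, "") for c in columns])
    return buffer.getvalue()


# Invented sample input; the third triple uses a non-Darwin Core predicate.
sample = [
    '<http://example.org/occurrence/rec1> <http://rs.tdwg.org/dwc/terms/scientificName> "Danio rerio" .',
    '<http://example.org/occurrence/rec1> <http://rs.tdwg.org/dwc/terms/eventDate> "2014-02-18" .',
    '<http://example.org/occurrence/rec1> <http://example.org/other> "ignored" .',
]
rows = triples_to_rows(sample)
csv_text = rows_to_csv(rows, ["scientificName", "eventDate"])
print(csv_text)
```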

A more detailed report describing these three meetings has been submitted to Standards in Genomic Sciences.

Submitted by Ramona Walls, on behalf of co-organizers John Deck and Rob Guralnick and all of the participants.

May 30, 2014

Dear Phenotype RCN community.  Please take a moment to help NSF identify priorities for investment in Genome-Phenome research.  These will be translated into funding solicitations relevant to you!

John Wingfield, Assistant Director of the National Science Foundation Directorate for Biological Sciences (BIO), is pleased to announce the posting of a Wiki to seek community input on the grand challenge of understanding the complex relationship between genomes and phenomes.  The Wiki is intended to facilitate discussion among researchers in diverse disciplines that intersect with biology, such as computation, mathematics, engineering, physics, and chemistry. The Wiki format encourages open communication, captures new viewpoints, and promotes free exchange of ideas about the bottlenecks that impede progress on the genomes-phenomes grand challenge and approaches or strategies to overcome these challenges. Information provided through the Wiki will help inform BIO’s future research investments and activities relevant to understanding genomes-phenomes relationships.

To provide comments, ask questions and view input from and interact with other community members, first-time users should sign up for an account via this link: Sign-up. Once registered, users will be directed to the main page of the NSF Wiki to accept the terms and conditions before proceeding. Additional guidance and subsequent visits can be accessed via this link: Genomes-Phenomes Wiki. Community members should feel free to forward notice of this to anyone they think might be interested in contributing to the discussion. Questions regarding the Wiki should be sent to

Mar 7, 2014

With its research emphasis on understanding the impact of climate change on the environment, Biosphere 2 turned out to be the perfect venue for our fourth annual Phenotype RCN meeting! More than 60 students, postdocs, and professionals from 7 countries participated in this inspirational event, and our expertise was evenly split between biology and informatics. We were particularly pleased to have the support and participation of the EDEN RCN (6 people), with their focus on understanding the impact of ecological factors on organismal development and evolution.

The goals for this summit meeting were to (1) understand the bioinformatics landscape of environmental ontologies and vocabularies (What resources exist? What acquisitions and mergers should happen?); (2) find out how (and whether) environment is represented with respect to phenotype in projects and annotation data sets; and (3) determine research that would benefit from the integration of environment ontologies. We frontloaded this work by initiating a group Google doc prior to the meeting, with the goal of refining and publishing it following the meeting. Combined with presentations from meeting participants, this activity was surprisingly effective (!), and the manuscript is progressing quickly. In short, we discovered that the ENVO ontology is likely to be the most widely used and supported, and though it needs to be provisioned with many concepts from the user community, participants felt that it would be sufficient for their needs. It doesn’t seem that environment has been formally represented with respect to phenotypes outside of the microbial realm (where it is very important), but many interesting research questions could be addressed if it were. Please let us know if you’d like to contribute to this doc.

Another huge accomplishment: Over a dozen new research collaborations were spawned by this meeting!  We’re still sorting these out, but the RCN hopes to support many of these activities through our Collaborative Exchange Opportunities mechanism.

On the social side, this meeting was very fun! To the relief and immense enjoyment of those of us from the North, who haven’t seen warm weather in what seems like an eternity, most meals were held outside on the patio of B2, and one dinner was even inside the Biosphere itself. Some of our participants enjoyed antics in B2, including one who managed to get locked in (briefly)… The clean and cozy casitas made for great breakout spaces, the fantastic catering kept our minds sharp, the fun and beautiful setting inspired interaction, and the care and attention to every organizational detail (thanks to Kim Land at B2 and Judy Logue for the RCN) made this meeting possibly our best. Thanks everyone!




Feb 13, 2014

Dear Phenotype Community,

We are heading into our annual meeting next week, where we will be prioritizing our next year’s activities with the help of the Phenotype RCN Advisory Board. If you have an idea for a workshop, working group or collaborative exchange, please send me an email and/or fill out a short application with your idea. See our blog for previous posts by folks who have been funded, and email me if you would like to discuss an idea before you propose. Please get these to us by February 19th.

Thanks! Paula (

Apr 6, 2013


 Postdoctoral fellow: Bioinformatics, Phenotypes

We are recruiting a postdoc with training in bioinformatics who is interested in studying phenotypic evolution by combining model organism genetic data with comparative anatomical data from throughout the vertebrates.  One of the biggest challenges in systems biology is the inclusion of whole organism phenotypes.  In the Phenoscape group, we have developed ontology-based methods for representing phenotypes of diverse species in order to integrate them with model organism developmental and genetic data. We have collected these data in a sophisticated Knowledgebase, which has an initial focus on the diversity of phenotypes in ostariophysan fish, including zebrafish ( We are currently scaling up our approach to the vertebrates as a whole, with a goal of allowing similarities to be identified between phenotypes from sources as diverse as dinosaur fossils and mouse knockout mutants.

We invite postdoctoral applicants to propose an independent project that uses the Phenoscape Knowledgebase as a research platform.  In particular, we are interested in projects that will leverage functional genomic data to study the evolution of whole-organism phenotype in nonmodel organisms.  Projects may range from primarily computational to primarily biological.

The postdoc will work under the direction of Paula Mabee (University of South Dakota) and Todd Vision (University of North Carolina), as part of a distributed, multidisciplinary team that includes evolutionary biologists, computer scientists, model organism experts, and bioinformaticists.   It will be based in South Dakota, with opportunities to travel to other sites, including the National Evolutionary Synthesis Center (NESCent), the University of Chicago, and the California Academy of Sciences.

Starting date: This two year postdoctoral position is available to be filled immediately.

Required qualifications:

  • Ph.D. degree with a strong background in bioinformatics
  • Preferred previous experience in one of the following: ontologies, functional genomics, developmental biology
  • Demonstrated ability to work in a team setting
  • Demonstrated communication and writing skills, in English

How to apply: Please contact Dr. Mabee ( for inquiries. Applications should be directed to Dr. Mabee and include a cover letter, CV, a brief statement detailing your research interests and career goals, and three letters of reference.  For more information, please see and

Mar 26, 2013

The Phenotype RCN held its third annual summit meeting February 24-27, 2013 at the National Evolutionary Synthesis Center (NESCent) in Durham, North Carolina.  We filled the house once again, with 60 participants from the US, Canada, Germany, UK, Spain and Australia.

The NSF funds Research Coordination Networks to ‘encourage and foster interactions among scientists to create new research directions or advance a field’. Now at the midway point in our five-year funding period, the Phenotype RCN is focused on inspiring research and proposals that use ontology-annotated data to address scientific questions – at the same time as we continue to support coordinated ontology and standards development.  The two primary themes for this meeting reflected these goals: the use of text mining for extracting computable phenotypes from text and the representation of behavior in ontologies, arguably one of the most difficult phenotypes to consider.

The focus on text mining methods to extract phenotypes from the voluminous legacy literature, as well as from current data, was instructive and led to research ideas involving large-scale phenotypes.  To foster communication between biologists (who know the literature in which the phenotypes are embedded) and methods folks (the Natural Language Processing (NLP) experts), we held a ‘Phenomixer’ on the first morning, where groups of 4-5 people rotated among experts who in 2-3 minutes presented their story, answered questions and discussed possible proposals and ideas.  This successful (but exhausting) exercise enabled participants to make personal connections and helped them find collaborators.  Excellent talks that introduced phenotype extraction methods and corresponding driving scientific problems, along with panel discussions, supported this goal.  Most of the participants in the behavior ontology workshop who met the preceding day (see blog post below) stayed for the annual summit meeting where they presented their work and continued their discussions and plans for collaboration.


We were especially pleased with the synergies from this meeting and are eager to support the early stages of collaborations as they form.  Please consider applying for collaboration funds from the RCN.  If you have any questions, please contact us (



Call for Participants: Workshop on New Tools for Studying Phenotype Evolution in the Vertebrates

Jul 23, 2012

What new research opportunities are opened up by the power to compute over phenotype information from thousands of species of vertebrates, particularly when that information is combined with phenotype and expression information for thousands of genes in multiple model organisms? The Phenoscape project invites you to be among the pioneers in opening up this research.

The first release of the Phenoscape Knowledgebase includes over 500K species phenotypes linked to 4,000+ genes from zebrafish, and is currently being extended to capture phenotype data from other vertebrates and linked to phenotype and expression data for other model organisms (including mouse and Xenopus).

We are looking for participants for a small, 3-day workshop, September 21-24, 2012 (to be held in Keystone, SD) who are interested in engaging in creative problem-solving directed at this outstanding problem and initiating collaborations. The ideal outcome would be several collaborative projects whose goals would drive the development of the Phenoscape tool set/interface and would present new and creative ways to deepen understanding of phenotypic evolution. Phenoscape aims to support the initial steps in these activities. We are particularly interested in a broad approach to this problem and welcome interest from scientists with backgrounds in computational and systems biology, mathematics, development, genomics, and evolution.

If you are interested, please contact Paula Mabee or Todd Vision.

