National Evolutionary Synthesis Center, Durham, NC
Friday, Feb. 25, 2011
Twitter feed for meeting notes: #phenorcn
8:45 – 9:30: Introductions and welcomes from NESCent and RCN PIs
NESCent welcome: Allen Rodrigo, Director; Todd Vision, Associate Director of Informatics
Paula Mabee: Introduction to the RCN; vertebrate working group
1. Develop a community network:
- annual summit meetings
- working group meetings
- collaborative exchanges
2. Make phenotype data interoperable:
-High level anatomy ontologies for all 3 key taxonomic groups:
Vertebrates- Paula Mabee
Arthropods- Andy Deans
Plants- Eva Huala
Informatics- Suzi Lewis
Project coordinator: Erik Segerdell
3. Outreach to ancillary phenotype groups
4. Meeting objectives: (see slides…)
-Align and synchronize anatomical ontologies
-Define standards, publish manuals
5. Meeting outcomes:
- list of participants for future workshops
- list of resources
- working doc for AO integration
Vertebrate phenotype ontology data and resources (See slides)
Eva Huala: Overview of plant ontologies and projects
Plant Working Group PI
PI from TAIR (The Arabidopsis Information Resource)
Needs and Approaches: Ecology, Systematics, genetics, Genomics
Resources: Plant ontologies: Plant Ontology, Crop Ontology, Trait Ontology, OBOE, Plant Functional Trait Ontology (Traitnet)
High throughput phenotyping coming up soon – many groups doing this in Europe, Australia, and elsewhere
-Gigavision (http://www.gigavision.net/) Justin Borevitz?
-Plant Phenomics – High throughput Phenotyping (http://www.plantphenomics.org.au/)
- also should include: SoyBase
Questions: one plant ontology? interoperability? common data formats? tools/methods?
Andy Deans: Overview of arthropod ontologies and projects
Arthropod anatomy ontologies – include SPD, HAO, Drosophila AO (FBbt)
how to align arthropod ontologies?
develop a common arthropod anatomy ontology
purpose: extend into other lineages
distribute best practices
Suzanna Lewis: Overview of ontology and software development
Infrastructure and tools
- Develop shared semantic standards
- Tools for Data capture and Curation
- Building an integrative infrastructure, efficient querying,
9:30 – 10:30: 5-minute lightning talks
- PI of Generation Challenge Program
- Generation Challenge Programme (GCP) of the Consultative Group on International Agricultural Research (CGIAR) now focuses on seven trait–crop combinations.
- crop trait ontology developed to assist with data mining
-Co-PI on TraitNet
sharing of ecological trait data
trait ontology for plants – collaborations with NJIT, NCEAS, CNRS
Tools: Protege 4, OWL, OBOE plug-in
Needs: how to deploy ontologies (triple stores)
OBOE: Extensible Observation Ontology
AgBase – ontology users
Model functional genomics datasets
Chicken, other archosaurs, crocodilians, turkey, zebra finch, cow, pig
For plants: Maize, cotton, miscanthus, pine, poplar, rice and soybean
Utilizing the Gene Ontology terms and some PO terms for maize and cotton
biocuration interface in house
roadblocks: annotate increasing number of sequences from new sequencing technologies?
Vision: employ multiple ontologies
- plant research and tool developer
Tolkin: web application for biodiversity data
RegNum: repository of phylogenetic definitions
HERBIS: automated data capture
WebApp for phenotype ontology – ontology builder and harvester
Four types of phenotypes: characters, ……?
Used Phenex to develop app
Bottleneck: people understanding value of ontology, dissemination, use
Providing tools won’t work: need cultural shift
Laurel Cooper – Oregon State University – Corvallis
- Plant Ontology project coordinator and curator
- Describes plant structures as well as plant growth and developmental stages; working towards encompassing terms and annotations from all plants.
- Plant Ontology expanding to cover all plants
- Facilitate consistency in annotation
- PO as a reference ontology for all plants; interrelated with Trait Ontology
- Provide mappings to other plant ontologies
- Challenges: expand cross-references to PO terms from species-specific plant ontologies
- Expand annotations in PO, plant-specific phenotypic descriptors in PATO
Tools: OBO-Edit, AmiGO, OWL, Protege, Phenote being developed
Avian phylogenetics, biogeography; interested in ontologies as entities, e.g., species, homology
tools: not formally used them
roadblocks: for bird ontologies, initiating them; need homologies
Vision: get bird morphologists together to assemble bird characters
Quentin Cronk, University of British Columbia – Canada
Evolution of plant development
Ontologies to dissect complex traits in detail: e.g., evolution of woodiness, bundles of traits
Challenges: moving from MODs to complexity of comparative biology
For using PO: challenge is to move towards the complexity of comparative biology
Vision: connect gene networks to phenotype networks (space, time, development, evolution)
Wasila Dahdul, NESCent – Durham
-Linking evolution to morphology
-Representation of complex phenotypes using ontologies
-Lead curator and anatomy ontology editor for Phenoscape
-Phenex: Annotation tool, translates matrix-based data to EQ
-Roadblocks: annotation vs. curation; collaborative ontology editing and development, need to streamline
Chris Desjardines, Broad Institute – Cambridge, Massachusetts
-ontology consumer; phenotype to genotype
-staff scientist at Broad Institute; keeping up with genotyping linking to phenotyping
-how high-throughput phenotyping can be linked to high-throughput genotyping via ontologies.
-ability to share phenotype data across organisms will be critical
Nico Franz, Puerto Rico
weevil taxonomy => adopt HAO terms and tools for beetle revision
implement ontology-compatible practices to increase semantic precision of taxonomic revisions
meanings of taxonomic names: synonymous names may or may not have the same meaning; the same name could have different meanings;
why taxonomic community behind Biological tax and ontology development?:
Franz and Thau 2010: https://journals.ku.edu/index.php/jbi/article/view/3927
10:45 – 12:30: 5-minute lightning talks
Peter Midford, NESCent
Ontologies for behavior (inferring using ont terms and generating extensions to ontology),
Teleost taxonomy ontology,
Matching phenotypes (Washington et al. 2010 paper)
Taxon vs taxon, gene vs gene, taxon vs gene
Wishlist: Ontologies for behavior up to speed
-Solanaceae Genomics Network (http://solgenomics.net/) which is a database for the Solanaceae and related Asterids, many of which are being sequenced, including tomato, potato and pepper, and for which many phenotyping projects have been initiated.
SGN: solgenomics; for plants Solanaceae and asterids
Genome to Phenome problem with Solanaceae as a model (G2P)
-Community curation model, users login and curate
Tools: OE, storing with Chado cv module
github.com/solgenomics, gmod_load_cvterms.pl (updating ontologies), loading scripts
In house: Solanaceae Phenotype Ontology (SP) – used by breeders, mappings to PO
Roadblocks: bringing together wide, diverse community, need ontologies for breeder expts and plant growth conditions
Cyril Pommier, INRA, France
Ephesis, GNpIS and Ontologies needs
Ephesis: Environment Phenotype Information System
expression, polymorphism, genomes, phenotypes…integration
Ontology Pros: want to use EQV but which app?
Ontology Cons: huge ontologies
Cyndy Parr, Mus. Nat. History, Wash. DC
SPIRE- ecological; semantic prototypes, Environment Phenotype
- Encyclopedia of Life (http://www.eol.org/)
- Lepidopteran morphology and life history ontologies with Leptree (http://www.leptree.net)
Natural history ontologies for Animal Diversity Web (http://animaldiversity.org)
SPIRE (Semantic Prototypes in Research Ecoinformatics) http://ice.ucdavis.edu/project/spire
Why ontologies? For spire, need reasoning: e.g., what are body weights of mammalian predators of invasive fish?
Tools: Protege, Swoop, Triple Shop, hand-coding, RDF 1-2-3
Roadblocks: SPIRE: availability of other data, LepTree: tool dev, sociology, EOL: priorities
Martin Ramirez, Argentina
Spider TOL curation of image collection
Use ontologies to parse freetext; ontology annotates images; organizes workflow, documentation
Tools: OE, CVS-OBO foundry repository on SF, Visual Basic, …many
Roadblock: files, file formats…ontology modification doesn’t propagate across apps and files, maintenance
Vision: community maintenance of ontology; connect evol. with genomics (link the worlds); reasoning on morphology: correlations, associations, enrichments; make systematics more efficient
Erik Segerdell, Oregon Health & Science University – Portland
- Former curator for Xenopus Anatomy Ontology and was involved in the development of the Zebrafish Anatomy Ontology (ZFA)
- Eagle-i: prototype of biomed research resource discovery network: laboratory resources, services, instruments, people, reagents, strains, lines
Lineages of tissues, develops from, and start and end stages
CMAP- create diagrams
Rosemary Shrestha, CGIAR
- Crop Ontology; developers and consumers of ontologies
- Developing multi crop-specific trait ontologies
- Collaborating with the PO and TO
- Roadblocks: too many databases remotely distributed; defining protocols among breeders; dispersed community; no annotation/curation tool; no semantic web approach; not very attractive products for donor
- Vision: data flow together
Chris Smith, San Francisco State University – California
-Ants; polyphenism; single ant embryo can dev into 3 radically different organisms – epigenetic
-3 to 7 castes, epigenetic influence on genome, Juvenile hormone
-knowledge on ‘convergent’ behaviors in ants and bees; genomes, literature on behavior, assays…
-behaviors can be linked to SNPs eg, Africanized bees vs European honeybees
-Social Insect Behavior Ontology (SIBO) – anatomy, CHEBI- chemical language
New NIH grant: ant genes and human mutation for gregariousness; social interaction candidate genes
Roadblock: explaining ontologies to nonexperts
Goal: Natural language capture, dissect genetics and epigenetics
Cynthia Smith, Jackson Lab
-Mammalian Phenotype Ontology – 8K precomposed terms, used to describe the Mouse Genome Informatics (MGI) which catalogs mouse genetic mutations and the descriptions of the phenotypes that result from these lesions.
-supports phenotype annotations at varying degrees of granularity
-tools: use OE; RGD, OMIA, Europhenome
-online tools: PhenoGO, PhenomicDB, PhenoHM, MamPhEA
-curation driven approach: terms added as required by curators;
systematic review and comparisons to other ontologies
-annotate phenotype data generated in high-throughput screens, various centers throughout the world doing the phenotyping
Goal: bring all mouse resources together under common language
David Osumi-Sutherland, Cambridge
-Ontologist: FlyBase; Virtual Fly Brain; Vertebrate Bridging Ontologist
Why use ontologies: standard for annotation; query-able store of information
How used: reasoners + equivalent class defs to maintain multiple axes of classification;
- maintain logical consistency; use reasoners to drive queries on VFB site (~30% of the relations are inferred, easier to maintain), disjoints easier for reasoner
Tools: lots; see slides; too many tools: wish list to pare down;
Roadblocks: need better OBO/OWL conversions and reasoner tools
OBO1.4 + obolib already does most of what is needed
Reasoner scaling: limits of OE reasoners?
OWL reasoner scaling issues when classifying large numbers of neurons
Vision: expert review of ontology
code: collab dev OWL-API based tools; RDB-based ontology dev system
ontology: multispecies integration, pan insecta/pan arthropoda
Pantelis Topalis, Crete, Greece
- Developing ontologies for vector borne diseases
- Anatomy of mosquitoes and ticks; TGMA, TADS
- Insecticide resistance ontology (IRBase); malaria ontology (MO), MIRO?, extension of the Infectious Disease Ontology
- Annotation of gene expression expts (VectorBase)
- Follow OBO foundry principles; use CARO and BFO
- Ontologies used in Decision Support Systems regarding infectious diseases
Roadblocks: bridging scientific jargon and ontological logic; CHADO needs natural diversity
- Future uses of various ontologies: cross-products for description of phenotypes; expand to agricultural domain; template for new VBD ontologies
Todd Vision, NESCent
Dryad Digital Data Repository
- Interested in how to best guide individual authors in depositing phenotype & trait data associated with individual publications, when it gets submitted to the Dryad data repository, to maximize re-usability for larger data integration efforts
Marvalee Wake, UC Berkeley
- User/consumer of Amphibian anatomy ontology
- why? Integrated development of robust anatomical, tax ontology as a means of organising and categorizing info…
tools: OE, excel, etc…still adopting!
roadblock: time, funding, gathering community interest, support and input
single ontology or 3 for major lineages?
fitting taxa into tax ontology
Problem: not enough time and money
Vision: Ontology vision that encompasses all issues, new tools based on familiar tools (eg. social networking, Facebook), easy to update
- Can societies be used to promote ontologies? not financially supportive but encourage AO developers to develop software
Marcelo Weksler, Brazil
Mammalian AToL; 4550 morphological character matrix; scoring for 100 species, incl. extinct
-Using ontology rules to organize and ID problems in matrix
Producers of ruled data to use in ontologies; images, old literature; done in Morphobank
Goal: assess the global phylogeny of mammals
Roadblock – manual input; need understanding of tools and how to relate this work to ontology community
Ontology rules create ontology maps; to find more morphological features
Interested in homology problems; building ontology relationships (part_of, EQ) between morphological structures and their phenotypic variation components (character states).
Mark Westneat, Field Museum, Chicago
Interested in visualization (big phylogenies, ontologies- similar problems)
Connect existing AO and behavior with physiological and biomechanical knowledge
Natural language still important and should be embraced
Phylogeny of Pokémon; developed in Mesquite
Megatree of fishes: iPlant visualization tool – for large phylogenetic tree of fishes
Food web network
Shawn Winterton, California Department of Food & Agriculture
Taxonomist; frustrated with current taxonomy paradigm
Neuroptera Anatomical Ontology (NAO) – lacewings
Needs: interoperability; character matrices, describe new species
Lacewing digital library, want to develop ontology, Common Arthropod Ontology
Vision: storing data once and then reused often
Paradigm shift in tax towards ontology based description
Ontology – character matrices- NLP
Interactive atlas- mouse over
Matt Yoder, North Carolina State University – Raleigh
- Biodiversity informatics and hymenopteran systematics, Hymenoptera Anatomy Ontology (HAO)
Lead developer of ‘mx’, which is used to manage the HAO, among other things.
mx – one stop for working with data -> publishing; collaborative; connects to numerous software and file formats; Anatomy is hard
Phenotypes tied to specimens (not names), come up with concept of species based on that; concept becomes quantitative;
Want to help open source coders and AO development
Need help with linking tools; ontology request broker
-Discovering AO by automated reasoning on phenotype data
-Inferring anatomy ontologies : derive the AO of a taxon
given AO of model taxon, set of observation data
Morphster Ontology – metamodel for sys bio, Each change should reflect change in starting ontology
Character Type Taxonomy created: Neomorphic, transformational, meristic, etc…relational
Signatures of character types
Evaluation: evaluate ZFA by auto-generating it, starting from TAO and CTOL data
Evaluation criteria: what is incorrect/missing in ZFA? and TAO-ZFA synonym list
To learn: what more can be inferred using character data? and how can inference needs help identify good/bad practices in anatomy ontology development
Melissa Haendel steps in for Gary. – “About Ontologies”
- Has intro ontologies talk in her back pocket
- Ontology is a human representation of a domain of knowledge
- Working on Eagle-I with Erik, in the library
- Applies to people in the room who don’t have a lot of experience with ontologies
- Goal: get people to a conversational level
- premise: it’s hard to get things out of free text/natural language, uses the “large bones” e.g.
- common vocabularies- allows us to search in meaningful ways (in data world, use common URIs to link across semantic web)
- presents the controlled vocabulary spectrum Glossaries->taxonomies->ontologies
- end point is we can infer a lot of information, using design principles
- So why do it?
- part_of example, gathering additional data
- “ontology is a classification vocabulary” – “the study of being” – a human representation of a domain of knowledge, hierarchical arrangement, and relationships among them are also defined
- well known ontologies – SNOMED, Foundational Model of Anatomy, Gene Ontology, Linnaean Taxonomy of species
- Beginner’s guide: “an x = y that has one or more differentiating characteristics”, basic subsumption reasoning
- the concept of “disjoint” – something can be blue or red but not both – VERY helpful to annotators
- subsumption reasoning- the simplest type, from cat -> entity
- Directed Acyclic Graph- must be acyclic
- idea of different types of hierarchies (subsumption, partonomy, directed acyclic graph etc.)
- Ontology classes, instances and relations are stored as sentences: subject–property–object
- Relationships formalize elements of a definition, this is necessary for computer use
- introduction of the idea that there are different formats (OWL/OBO)
- the “True path rule” ** <- important -> “the pathway from a child term all the way up to its top level must be universally true”
- idea of transitive properties -> how do they propagate up/down
e.g. cat is_a mammal is_a vertebrate
- Terms should have the same meaning on every occasion of use <- not sure I agree with this (class/label distinction), “univocity”
- order of assertions matters, relations always mean the same thing
- Joel asks: “are the world ontologies in English” <- you can specify the language that your labels are in, the definition of the term is what matters- it’s both logical and text for the user
- IMPORTANT -> right off the first confusion is the idea of separation of concepts and labels -> something that OBO confuses (slightly), and that is fundamental to building ontologies
- SPORK classification:
- the role of multiple inheritance -> you can classify things multiple ways, by sensory modality, by the physical relationships, by which neurons are fired etc.
- avoid asserting, should be inferred
- when we make many assertions it becomes difficult to interpret results?
- what if the cross classifying ontologies are very different?
- use different ontologies to classify a single “thing” – Brent -> hinting at questions about making phenotype statements (tying multiple ontologies together) -> referencing as “differentia” -> leads to multiple ways of classifying
- presents the landscape of anatomy ontologies (see slide), OBO Foundry Orthogonality of Ontologies
- presents the idea of Phenotypes = Entity + Quality -> example of character + state
- what’s the difference between “annotations” and “phenotypes”? They are not the same thing. Important-> an annotation has 1) evidence and 2) attribution
- Question -> “what’s the difference between a phenotype and a “piece of anatomy”?
We can have many different types of phenotypes -> genotype + environment -> organism.
Chris says – see Robert ?dorfs paper from a couple months ago. ??
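A minimal sketch of the ideas from this intro talk (subject–property–object sentences, transitive is_a, and disjointness), assuming a toy set of triples. The terms and the function names `ancestors` and `violates_disjointness` are hypothetical and do not come from any real ontology or tool mentioned at the meeting.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, property, object); illustrative only.
TRIPLES = [
    ("cat", "is_a", "mammal"),
    ("mammal", "is_a", "vertebrate"),
    ("vertebrate", "is_a", "entity"),
    ("blue", "is_a", "color"),
    ("red", "is_a", "color"),
    ("blue", "disjoint_from", "red"),
]


def parents(triples, prop="is_a"):
    """Index direct parents of each subject for one property."""
    index = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            index[s].add(o)
    return index


def ancestors(term, triples):
    """All classes reachable from `term` via is_a (transitive closure)."""
    index = parents(triples)
    seen = set()
    stack = [term]
    while stack:
        for parent in index[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def violates_disjointness(asserted_classes, triples):
    """True if one thing is placed under two classes declared disjoint."""
    expanded = set(asserted_classes)
    for c in asserted_classes:
        expanded |= ancestors(c, triples)
    for s, p, o in triples:
        if p == "disjoint_from" and s in expanded and o in expanded:
            return True
    return False


print(sorted(ancestors("cat", TRIPLES)))                 # ['entity', 'mammal', 'vertebrate']
print(violates_disjointness({"blue", "red"}, TRIPLES))   # True
```

Only the direct is_a links are asserted; "cat is_a vertebrate" is inferred, which is the "avoid asserting, should be inferred" point in miniature.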
2:30 – 3:30: Anatomy ontologies and discovery: reality and possibilities
Monte Westerfield: Using phenotypes to find disease genes
Discussing the work done in the Washington paper, and an update on PATO.
(Washington, N. L., Haendel, M. A., Mungall, C. J., Ashburner, M., Westerfield, M., and Lewis, S. E. (2009). Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol, 7, e1000247)
- A collaboration among two model organism databases ZFIN and Xenopus
1) what was the biological problem (methods to describe phenotypes)
2) how do we compare phenotypes
3) how do we search/do proof of concepts
- Premise -> mutations in homologous genes happen in diverse organisms, can we use inference on phenotypes across organisms to find underlying genes
- previously, you made comparisons across genes (blast)
- we need a “Phenoblast” method linking phenotype<->phenotype
- OMIM- free text disease description resource in NL <- computationally difficult
- adopted the EQ syntax, the source being different ontologies + PATO
- used EQ syntax to make annotations across organisms
- strategy -> use shared genes as proof of principle (OMIM, ZFIN, FlyBase)
- set up a single blind curator -> curated 11 genes for different organisms + human
- put the data in OBD
- results in a lot of data that requires a quantitative approach to exploring
- PROBLEM: not all curators are created equal
- but ontologies can come to the rescue- ontologies keep contrasting concepts in an ontology, thus they can be “normalized” based on that ontology
- idea of assigning distances of annotations from “true” concept
- introduces the idea of information content -> how high is it? how often is it used? – can generate similarity scores
- so were annotators really coding things differently? Not really when the statistics are run -> can use as a metric to focus future efforts in training/clarification etc.
- premise mutations in one gene (intragenic) should result in phenotypes that are more similar to each other than intergenic mutations (and they are)
- extend this idea to pathways (similar phenotypes retrieved genes that turned out to be in the same pathway).
- Take home- phenotypes retrieved homologous genes across species.
- goal is to expand the process across multiple organisms
- EQ<->MP translation -> OBD allows the addition of many additional datasets
- Question -> how difficult was it to map between ontologies
- Can machines relate human eyes to drosophila eyes -> “this is a difficult problem” some anatomy we can make a statement of homology, some we can’t
- IMPORTANT- do not put homology into the ontologies at a base level -> you code that into the data after the fact in conjunction with other data
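The information-content idea from this talk can be sketched as follows. This is a Resnik-style measure swapped in for illustration, not necessarily the scoring used in the Washington et al. paper; the hierarchy, the annotation counts, and all function names are hypothetical.

```python
import math

# Hypothetical toy is_a hierarchy: term -> list of direct parents.
IS_A = {
    "fin": ["appendage"],
    "limb": ["appendage"],
    "appendage": ["anatomical structure"],
    "eye": ["anatomical structure"],
    "anatomical structure": [],
}

# Hypothetical number of annotations made directly to each term.
DIRECT_COUNTS = {
    "fin": 4,
    "limb": 3,
    "appendage": 1,
    "eye": 8,
    "anatomical structure": 0,
}


def ancestors_and_self(term):
    """The term plus everything above it in the is_a hierarchy."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in IS_A[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log2 p(t), where p(t) counts annotations to t or any descendant."""
    total = sum(DIRECT_COUNTS.values())
    cumulative = {t: 0 for t in IS_A}
    for term, n in DIRECT_COUNTS.items():
        for anc in ancestors_and_self(term):
            cumulative[anc] += n
    return {t: -math.log2(c / total) for t, c in cumulative.items() if c > 0}


def similarity(a, b, ic):
    """IC of the most informative common ancestor of a and b."""
    common = ancestors_and_self(a) & ancestors_and_self(b)
    return max(ic.get(t, 0.0) for t in common)


ic = information_content()
print(similarity("fin", "limb", ic))  # shared ancestor "appendage"
print(similarity("fin", "eye", ic))   # only the root is shared
```

Rarely used, specific terms carry more information than the root, so two annotations that meet low in the hierarchy score as more similar than two that only meet at the top.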
Chris Mungall: How to build interoperable anatomy ontologies
- works in particular on gene and cell ontologies, also the PATO project, mouse, plant, Phenoscape
- challenge – (fun and interesting things to do, but there are some annoying problems)
- there are multiple ontologies out there, unfortunately many have been built in an uncoordinated fashion, therefore it’s difficult to provide a unified whole
- solutions exist!
- uses a case study with GO/CL to illustrate the issues -> the goal being similar work to what Monte discussed
- reviews ontologies -> relationships with properties that you can compute on, entities are grouped to classes <- key; relationships are relationships of ALL MEMBERS of the classes
- ontologies themselves aren’t smart, they are all based on deductive logic (a narrow way of thinking)
- even simple statements like “chromosomes exist in the nucleus” are not true in ontologies because of this “narrowness”
- to get around this you make distinct subclasses, or make statements in the opposite direction (every nucleus has chromosomes)
- What’s out there? – Human, Model Orgs, Domain specific (neuroscience), Cross-science
- FMA – no development relationships or embryonic structures; a peculiar language; it’s very strict (single inheritance), no computable definitions; heavily precomposed (pre-coordinated) – “something about a very specific part of a toe”
- strengths – extensive spatial relationships
- contrasting model organisms -> no single use envisioned, designed to be very general purpose
- contains a very deep hierarchy
- contrast this to Model Organism Anatomy Ontologies (which share some different core principles)
- * designed with a purpose, then they branch out, take a more practical approach rather than embroiled in a serious formalism ? missed something about definitions
- there are also multiple species ontologies, upper ontologies, developmental ontologies, domain specific ontologies
- Problem – not developed into a shared framework (besides general buy-in to CARO) -> constructed under “massively different principles” -> leads to a waste of effort (heart in mouse vs. heart in human)
- so how to get them to work together -> look at GO, then CL
- GO covers all life, across different scales -> some things aren’t very clear, e.g. “blood” across taxonomic levels means what?
- how does it deal with taxonomic variation -> doesn’t need to at some levels (e.g. cells/nuclei) because it excludes some statements (safer to make statements going up the tree vs. down the tree…)
- doesn’t use taxa in differentiating statements (there is no implication of homology here)
- some exceptions for readability inject organism names (“fungal-type cell wall”) … but labels are just labels, the underlying concepts/definitions are key
- “limb development” and “wing development” <-> labels that are hard to interpret
- cites Kusnierczyk, W, JBI something
- cross ontology connections b/w NCBI/GO – union classes, “only in taxa”, “never in taxa” (in taxon_not Aves, i.e., the complement of that taxon).
- this can help clarify some of the vagueness in GO terms, and use to detect mistakes in the ontology (like lactating chickens <- not so much)
- with lots of diversity (scope) comes with lots of work … so parts just aren’t worked out as a reflection of this, and development is ongoing (in spurts?)
- proposal (because GO is just too big….)- subclassing GO to assign work at different levels (exploit taxonomic links to aid in this- filter out say, just insect terms (e.g. eye development) so that the mammalian groups just have mammalian eye development terms to use
- another example – Cells – “massive overlap” in different ontologies, there is no single unified view of cells – proposal to combine everything into one, another to subdivide work into taxonomic approach
- so it all comes down to lumping/splitting (just like taxonomy)
- split and maintain ids, then merge and review
- when you recombine in above you have multiple names, that make things difficult
- this is essentially a label/concept problem <- people fail to realize labels don’t matter (still)
- so in the aggregate view [how are we getting this aggregate view??] you dynamically re-write names (strictly for pragmatic purposes, the underlying data are linked via ?ID?)
- Lessons for gross anatomy – keeping terms in certain ontologies but dividing work -> really need to clarify terms/concepts divide- labels merging up or down is an application level(?) problem -> parallel nomenclature built in to ontologies of ontologies
- Greg -> points out the problem with “labels” vs. concepts
- Mark -> points out that the merge/visualization slide, suggests a mapping that visualizes the overlap among ontologies
- renaming in aggregate view -> what’s the status? “very new here” where do we find this aggregate view? Could be used in a phenotype generating application, any view where you’re looking at multiple widely distributed taxa -> concepts of disambiguation
- Brent -> this is what a monophyly ontology might look like
- Chris has looked at the logical underpinnings of homology ontologies- underlying question-> do we want to have homology ontologies, or a set of overlapping relationships among the classes (yes) that assert homology
- attempts to be neutral about homology, make this on top of assertions
- challenge -> dealing with multiple different assertions of homology (David)
- there are different levels of homology assertions, some are trivial, some have big gaps
- perhaps a fluid conversion is possible -> programmatically showing different views
- eye-eye homologous at different levels
- homology from Joel too -> you want to talk about “sameness”, homology is similarity at a certain level !! the “at a certain level” is an additional bit of information
- David brings up the “homologous_to” relationship
- a network of homology statements among ontologies (David)
- the answer is probably going to be a mix
- a pluralist start
- Brent -> importance is that we want to be able to separate statements- which ones are homologous?
- David -> !!! being really clear about what your reasons for classification are !!!
- brings up problems like “fused bone” <- statements of homology
- Martin -> when we talk about homology we mean “problematic homology”,
- David -> define your class structurally -> this may (or may not) have overlap
- we need to be clear what we mean when we define the term
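The “only in taxa” / “never in taxa” constraints from Chris Mungall's talk, and the way they can flag mistakes such as lactating chickens, can be sketched roughly as below. The taxonomy, the constraint tables, and the function names are hypothetical stand-ins, not the actual GO or NCBI data structures.

```python
# Hypothetical toy taxonomy: taxon -> parent taxon (None at the top).
TAXON_PARENT = {
    "Gallus gallus": "Aves",
    "Aves": "Vertebrata",
    "Mus musculus": "Mammalia",
    "Mammalia": "Vertebrata",
    "Vertebrata": None,
}

# Hypothetical constraints attached to ontology terms.
ONLY_IN_TAXON = {"lactation": "Mammalia"}
NEVER_IN_TAXON = {"mammary gland development": "Aves"}


def lineage(taxon):
    """The taxon and all of its ancestors, from specific to general."""
    out = []
    while taxon is not None:
        out.append(taxon)
        taxon = TAXON_PARENT[taxon]
    return out


def check_annotation(term, taxon):
    """Return a list of constraint violations for annotating `term` in `taxon`."""
    line = lineage(taxon)
    problems = []
    only = ONLY_IN_TAXON.get(term)
    if only is not None and only not in line:
        problems.append(f"{term} is only_in_taxon {only}, but {taxon} is outside it")
    never = NEVER_IN_TAXON.get(term)
    if never is not None and never in line:
        problems.append(f"{term} is never_in_taxon {never}, but {taxon} is inside it")
    return problems


print(check_annotation("lactation", "Gallus gallus"))
print(check_annotation("lactation", "Mus musculus"))
```

The constraints sit alongside the term definitions rather than inside them, so the terms themselves stay taxon-neutral, as described in the talk.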
4:00: Visions: What would you like ontologies for?
Brent Mishler: Ontologies as research tools linking phylogenies, systematics, phenotypes, and genomics
- talking about homology a little
- does not accept that you can build a theory neutral ontology and reason later (at least presents this as an argument) BUT WHO EVER SAID IT WAS?…
- two organizing forces in biology (current function/history -> aka homology)
- interplay of these two forces is interesting, neither one by itself
- comments on the “annotation” process -> experience from the moss
- had huge arguments about *naming*, and also “similarity” -> how to resolve this? (it’s names, who cares),
- interruption -> naming genes is classification (GO) <- GO is not about naming genes, it’s about organising function
- GO covers three domains <- that aren’t agreed upon by GO practitioners
- wants to add a historical axis
- “if you solved genes phenotypes would be easy”
- homology -> can reside at any level, requires historical passage of information from ancestor to descendant
- two subcategories -> paralogy, orthology
- “sequence homology” is similarity
- homology is a yes/no thing, not a sorta
- provides examples of paralogs and orthologs
- slides on “names of genes” -> what does it mean to say I have the “same” gene, in a phylogenetic sense, in two different organisms?
- approach -> analyzes genes in absence of nomenclature
- brings in the phylocode -> designed to name clades in big taxonomic groups, a name to rigorously point at a clade in a tree
-> phylocode -> names of genes -> name for branches in a gene phylogeny -> could use these names to do interesting comparative analyses (how to leap from name of a gene to underlying data)
-> names outside and inside triangulate to a node
- a phylogenetic classification of genes then comes from identifiers that are attached to nodes
- goal is to talk about phenotypes -> “morphology is not as problematic as genomic data”
- could have names for phenotypic characters associated with particular names on nodes (again names don’t matter, so is it then ontological classes at nodes?)
- classification could be based on – development location, function, history, structures -> in plants it gets very hard to classify things without the addition of new criteria
- so he is abstracting labels (like “leaves” to characters which are then tested?)
- for a true homology based ontology there will have to be a link to a tree (this we can agree on)
- RegNum -> a phylocode database for clade names introduced
- Questions -
- Is there an example of a gene name that is phylogenetically derived? – makes distinction b/w computer vs. NL names
- so it’s gene +/- at each node- what about individual nucleotides…
- Have you diagrammed an ontology based on this concept?
- So it seems this is a nomenclatural issue, not an ontology concept development issue
- Question is raised – what about absence of genes? What about genes that get lost multiple times (plant people, always with the problems) – defaults to absence of characters
- David brings up “labels first” or “statements first” , neutral vs. hard coded identifiers (neutral identifiers are nice)
- Brent – “You need to know/keep track of what happens when you add more stuff”.
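A minimal sketch of the "identifiers attached to nodes" idea from this talk: a name is resolved to a node in a gene phylogeny by triangulating from two specifiers to their most recent common ancestor. The toy tree, the gene labels, and the function names are hypothetical and only illustrate the mechanism; they are not taken from the slides.

```python
# Toy gene tree stored as child -> parent links (hypothetical labels).
PARENT = {
    "geneA_sp1": "n1",
    "geneA_sp2": "n1",
    "geneB_sp1": "n2",
    "n1": "n2",
    "n2": None,  # root of the gene tree
}


def ancestors(node):
    """Return the path from a node up to the root, the node itself included."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def named_node(specifier_a, specifier_b):
    """Resolve a name to the most recent common ancestor of two specifiers."""
    seen = set(ancestors(specifier_a))
    for node in ancestors(specifier_b):
        if node in seen:
            return node
    raise ValueError("specifiers share no ancestor")


print(named_node("geneA_sp1", "geneA_sp2"))  # n1
print(named_node("geneA_sp1", "geneB_sp1"))  # n2
```

Because the name points at a node rather than at a label, adding more sequences to the tree changes what the node contains, which is the bookkeeping concern raised in the questions.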
Andy Deans: Describing biodiversity and determining species
- PI of HAO
- reasons for why we want to develop a robust ontology, and how we can apply it
- coming from a Hymenoptera perspective, but with parallels to other groups
- outlines the taxonomic process, example of Linnaeus, 1758
- shows the long history of taxonomy (hundreds of years)
- four authors redescribed the same taxon
- now we go back to the original specimens, and redescribe things, in this case much finer detail
- up to 42 anatomical parts referenced
- hundreds referenced in fly descriptions
- queried taxacom – 42 pages long
- today- we have analog descriptions across thousands of journals
- points out that some practicing taxonomists don’t use descriptions
- takes us to the advent of the WWW-> how to revolutionize taxonomy
- lots of new digital resources/databases/approaches -> Zoobank for names etc., lots of new resources
- points out that people say the same thing different ways
- proposes a new paradigm -> derive descriptions from phenotype ontologies using common syntax
- playing with the E-Q model-> is it applicable to taxon descriptions
- use what labels you want, reference a class
- why ontologies? ->
1) combat concept drift
2) exploit logic for queries (e.g. keying out among hundreds of thousands of species) – query in gene
3) exploit logic for knowledge discovery -> the unknown unknowns -> the ones we don’t know we don’t know
Question -> about color, seems problematic (it is-> requires some knowledge of preparation etc., essentially a history of the specimen) … but it works, often
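A small sketch of "use what labels you want, reference a class" under the E-Q model: two describers word the same observation differently, but both statements point at the same classes. The identifiers with the `EX:` prefix are placeholders, not real ontology IDs, and the class names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    class_id: str  # stable identifier of the ontology class (placeholder values)
    label: str     # whatever wording the describer prefers


@dataclass(frozen=True)
class EQ:
    entity: Term
    quality: Term

    def same_phenotype(self, other):
        """Compare by class identifiers only, ignoring the free-text labels."""
        return (self.entity.class_id, self.quality.class_id) == (
            other.entity.class_id,
            other.quality.class_id,
        )


a = EQ(Term("EX:0001", "fore wing"), Term("EX:0100", "dark brown"))
b = EQ(Term("EX:0001", "anterior wing"), Term("EX:0100", "brunneous"))
print(a.same_phenotype(b))  # True
```

This is the sense in which a shared class reference could counter concept drift: the wording can vary while the referent stays fixed.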
Cyndy Parr: Challenge of Semantics for the Encyclopedia of Life
EOL -> brings up the issue of scale, all species known to science/on earth
- gathering summary descriptions across biological domains
- species level taxa and higher
- anything you might know about an organism
- requires creative commons licenses
- available in a single place
- emphasis on quality
- rapidly growing
- EOL is a “content curation community”
- takes data in from say IUCN, GBIF, BHL, DBs, Journals, Public => curation => EOL
- color-based indication of data quality
- EOL now allows multiple classifications to be displayed, based on user’s preference.
- data consist of objects sorted by topic
- links back to original source of the data
- curation functionality is object by object -> provide a rating (how good it is) … IS THIS DATA ALSO OPEN? can we see curator bias?
- mentions Wikipedia: inconsistent/not very computable
- Wikipedia articles are brought in to EOL <- they get curated data back; “not in competition with Wikipedia”; providing 200-300k pages – EOL is not getting the kind of scaling they expected.
- into scaling -> almost 3 million pages (one or more per taxon), as best they can per taxon
- they have 2 million data objects -> so not all pages have data objects
- 500k pages with objects on them (“still better than wikipedia”)
- 100+ partner databases
- they are starting to measure richness -> what makes a page rich? does it need more work?
- 700 people signed up to curate/ 1000s of contributors
- challenge -> we have to be able to describe things that range from diatoms (small) to humpback whale (big), to mushrooms (tasty) to moths (scaly), to bacteria (really small)
- proposing: to capture this level of diversity in an ontological standpoint is not really feasible for a single ontology [ made network slide http://nodexl.codeplex.com/]
- extended networks illustrated – TOL, AKN, OBIS, COL, WoRMS, BHL, BHL-Europe, GBIF
- to share data there has to be some level of data integration, therefore EOL uses a coarse level
- 33 subjects (TDWG Species Profile Model), but this is sufficient to put most data in, working with TDWG, perhaps, to extend Darwin Core
- no numeric data, minimal controlled vocab, API is present
- can start generating statistics now, quantities of text, reviewed objects etc.
- public and machine interfaces
… but wait, that’s all EOL v. 1
- EOL v. 2 has “vastly improved community interaction”
- v. 3 – ??? -> some push to providing useful data for science; wondering how semantic they can get and what is worth doing.
- Looking at “Rich page calculations” – stats include “info items”, “images”, media, word count etc.
- These species page calculations are stored as key value pairs, so perhaps it could be extended to EQ or something similar and stored -> UNIT Label URI type level data
- mentions there is no provenance
- if EOL had this information what would they want to do with it?
– perhaps index their data by taxon, improve discoverability, sort into SPM subjects
– idea is to hide semantics from general user
- Down the road – serve data by API/query interface – “Give me info on rodent elbows [for rodents that like bananas in Cuba]”
– make the whole page semantically browsable (LOD: linked open data) -> Taxon Text Blobs, Character data, Metadata
- EOL could be a consistency check system
- Brent brings up the question of “provenance” – serious science requires this
– providers can wreck this by not providing good enough information
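A rough sketch of what an extended key-value record could look like if unit, URI, and provenance were stored alongside each value, as discussed above. The field names, taxa, and values are all hypothetical; this is not EOL's actual data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    taxon: str
    key: str                       # e.g. a "rich page" statistic or a character
    value: object
    unit: Optional[str] = None
    uri: Optional[str] = None      # identifier of the term behind the key, if any
    source: Optional[str] = None   # provenance: who supplied the value


def with_provenance(facts):
    """Keep only the records that state where they came from."""
    return [f for f in facts if f.source]


facts = [
    Fact("Taxon A", "word count", 350, unit="words", source="provider X"),
    Fact("Taxon A", "images", 4),  # no source given
]
print([f.key for f in with_provenance(facts)])  # ['word count']
```

A filter of this kind is one way the provenance concern could be made operational: records without a source remain visible to the general user but can be excluded from scientific queries.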
Eva Huala: Anatomy Ontologies from a Model Organism Database (MOD) perspective
- director of TAIR > MOD focusing on genes and their function
- 9500 updated users (21k total)
- plant biologists of all types
- how does TAIR use phenotype ontologies now?
– by annotating expression pattern
– includes a locus code, gene name, PO term, and reference
- More could be done -
- They really want standardized mutant phenotype descriptions
- wanted to search, categorize, infer, and ?
- kinds of questions that could be asked-
- Which genes control seed size in Arabidopsis?
- could subselect the types of data that we are interested in (choose E, Q, Species, Data Type), this doesn’t require comparison across species
- “Is seed size controlled across taxa” -> ties in multiple ontologies
- “Which tomato gene can I mutate to give me smaller seeds?”
– ways to work around this using other species (look for mutants in Arabidopsis-> find genes, and alleles responsible for seed size, -> blast search -> may control size -> test)
- If you can search many different databases at the same time you can find many candidate genes -> truly interoperable
- Barriers? -> curation efficiency -> it’s a lot of work, and very expensive to have done by professionally trained curators, and there is little $ for it (historically) -> still legacy of lack of interoperability
- Current status in TAIR – 2238 sequenced genes with free text phenotypes; Additional 1755 unsequenced mutant loci with phenotypes; 4690 images
- How to get more data? -> collaborate with plant journals, try getting data directly from authors, now span 11 journals -> convinced the authors to send their data to TAIR during the editorial process -> around 50-60% of authors are doing so, which is being done currently with an on-line form
- some smart functionality built into the forms to submit new data
- What’s that cool Article Navigator ?! -> there is need for a literature curation tool, with picklists and easily composed annotations
- Vision -> have phenotype data that is contributed from a variety of fields of biology -> Many questions can be asked from relatively “simple” starting points (e.g. leaf shape)
Question -> Journal acceptance?
- Foot in the door and dance approach, you have to ask to get things initiated
- Do curators check author submitted data? YES -> So is the author data useful? Not really-> but the tools/application *is* making things better. (Applications save the day)
Help from publishers? -> Yes, put author instructions in, allows for followups on why data are not submitted. Cool- people who go through the process may submit data even though they don’t publish in the journal. Evidence of a culture shift.
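The kinds of questions listed in this talk can be sketched as a faceted filter over standardized annotations (E, Q, species, data type). The records and gene names below are invented placeholders, not TAIR data, and the field names are assumptions.

```python
# Hypothetical standardized phenotype annotations (not real data).
ANNOTATIONS = [
    {"gene": "GENE1", "species": "Arabidopsis thaliana", "entity": "seed",
     "quality": "decreased size", "data_type": "mutant phenotype"},
    {"gene": "GENE2", "species": "Arabidopsis thaliana", "entity": "leaf",
     "quality": "serrated", "data_type": "mutant phenotype"},
    {"gene": "GENE3", "species": "Oryza sativa", "entity": "seed",
     "quality": "decreased size", "data_type": "mutant phenotype"},
]


def search(annotations, **criteria):
    """Return annotations whose fields equal every given criterion."""
    return [a for a in annotations
            if all(a.get(field) == value for field, value in criteria.items())]


# "Which genes control seed size in Arabidopsis?" (single species)
within = search(ANNOTATIONS, species="Arabidopsis thaliana", entity="seed")
print([a["gene"] for a in within])  # ['GENE1']

# Dropping the species facet turns it into a cross-species question.
across = search(ANNOTATIONS, entity="seed", quality="decreased size")
print([a["gene"] for a in across])  # ['GENE1', 'GENE3']
```

The cross-species query only works if every database uses the same terms for E and Q, which is the interoperability point made above.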
Cynthia Smith: High-throughput phenotyping of mutant mice to identify models of human disease
- we can find similar phenotypes in mice and humans, but how are these linked up?
- creating new models for study of human disease
– attempting to generate mutations in every gene in the mouse, then categorize the phenotypes in those mutants
- European Mouse Programmes – EUCOMM -> creating a lot of knockout/target mutations, EUMORPHIA -> standardization of procedures or phenotypes can’t be compared
EUMODIC -> European Mouse Disease clinic
EMMA -> Archiving the mice
- 20k gene trap and target null/conditional ES lines – library archived
- 538 mouse lines generated and re-archived
- complementary lines in other countries
- coordinates, ensures there is no duplication of effort, and tracks where things are in the pipeline, and info on their goals
- develop a comprehensive phenotyping platform able to deliver data
- developed EMPReSS -> on standardizing screens
- a website with *many* details on how to do the screens -> how to do blood chemistry etc.
- developed pipelines, a battery of tests that you can run one after another on a mouse (mice ain’t cheap)
- possible to do *many* different tests for a wide variety of phenotypes (behaviour, morphology etc.)
EMDC -> where it all happens, 4 centers doing the first screen then passed onwards if necessary
- sent to centers “where mice are made”
- there is a whole ontology of centers, it’s complicated
Data needed to be unified, done in Europhenome Data Resource
- can search by phenotype, mutant term, list of mutant strains etc.
- results -> where/when/statistically marked up for significance -> simple plots bring up the raw data
- developing systems for describing phenotypes
- phenotypes are central to curatorial annotation and mining of comprehensive animal phenotype databases
- linking phenotypes with other supporting data
Question -> how much curatorial effort do they want to put in to this?
EQ is used/possible
- phenotypes for every gene
- develop ontologies and map among resources
- diseases as collections of phenotypes
- expanding databases (storage) etc.
Question: How many different data points for a mouse? Raw data in every mouse -> outliers vs. controllers -> data to website. Efficient/fast/many data points.
6:00: Shuttle back to hotel
7:00: Walk to restaurant (Pop’s) from hotel
Saturday, Feb. 26, 2011
8:45: Anatomy ontology resources (Andy Deans, moderator)
Paula Mabee: Phenoscape: Connecting evolutionary phenotypes to anatomy
Connecting evolutionary morphology to genes
Look at broad scale patterns of evolution, make inferences about candidate genes
Phenoscape – think about how you can use it
Difficult to relate changes in phenotype in morphologies to genetics
- Most development understood from model organisms
- Among fish, one big one: zebrafish
- Mutagenesis, gene knockdown – mutant data stored in ZFIN database – seed for this project
- Conservation of gene sequences & function and phenotype
- Leveraged into translational medicine
- e.g. Defect in a particular gene affects eye across taxa
- How far does that conservation go? – translational biodiversity?
kb.phenoscape.org – Phenoscape
- Prototype a curated ontology based evol. phenotype db that maps to genetic databases
- Use cases drove KB development
- Search for candidate genes underlying evol. morphology
- Requirements: ontologies, curation, database
– Initiated Teleost Anatomy Ontology, seeded by Zf Anat Ontology
–Teleost Taxonomy Ontology
—- TAO paper (Dahdul et al., Syst Biol (2010) 59 (4): 369-383. doi: 10.1093/sysbio/syq013 ): http://sysbio.oxfordjournals.org/content/59/4/369.full
–Taxonomic Rank Ontology
–Evidence Code Ontology
- Students manual entry of free text characters / Phenex
- Entity – Quality model for taxon phenotypes
- Taxon phenotype annotations
- Import from ZFIN genetic data
- 500000+ taxon phenotypes, 21000+ gene phenotypes for 4000 genes – put it all together
- Interactive user testing of interface
- KB inferred candidate genes – browse and make discoveries
- Siluriform scales absent – basihyal absent / Zebrafish mutants missing scales and basihyal – prediction of candidate genes
- Wet lab test in Monte’s lab (Richard Edmunds) – lack of eda expression in epidermis supports phenoscape kb hypothesis / lack of brpf1 expression in basihyal supports kb hypothesis
- Distribution of all characters across anatomical systems
Semantic framework and reasoning tools
Phenoscape as a resource: ontologies + EQ data for any taxon, reasoning across EQ data types
Mark W.: Where is kb richest?
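A toy version of the candidate-gene inference described in this talk: taxon phenotypes and gene phenotypes are both expressed as EQ pairs and matched. The Siluriformes, eda, and brpf1 entries follow the example in the notes in simplified form; "geneX" and its phenotype are made up, and the real knowledgebase reasons over ontologies rather than matching exact strings as done here.

```python
# EQ pairs as (entity, quality); a simplification of the example in the notes.
taxon_phenotypes = {
    "Siluriformes": {("scale", "absent"), ("basihyal", "absent")},
}
gene_phenotypes = {
    "eda": {("scale", "absent")},
    "brpf1": {("basihyal", "absent")},
    "geneX": {("eye", "decreased size")},  # hypothetical non-matching gene
}


def candidate_genes(taxon):
    """Genes whose mutant phenotypes share at least one EQ pair with the taxon."""
    wanted = taxon_phenotypes[taxon]
    return sorted(gene for gene, phenotypes in gene_phenotypes.items()
                  if phenotypes & wanted)


print(candidate_genes("Siluriformes"))  # ['brpf1', 'eda']
```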
Jim Balhoff: Curation interfaces: Phenex, Phenote, and MX
Tools for creating EQ phenotype descriptions
Phenote: made initial data annotations with this
- very general piece of software
- stand alone/desktop
- any conceptual model
- tab delimited output
- Interface: type ahead menus, table-oriented, term info panel
- built on top of OBO-edit
Phenex: adaptation of Phenote to focus on evol. character matrices
- stand alone/desktop
- Interface: panels with taxa, characters, term info, phenotype panel with EQs
- composition editor for putting an expression in EQ statement
- built on top of OBO-edit
MX (Matt Yoder): larger piece of software than just EQ annotations
- Export to OWL
- works off of NCBO Bioportal
- Searched by taxon, chose phenotype (“morphology”) to constrain results
- Queried for genes
- Links to source publications
- Click on ontology term to get a popup with details
More ontology resources
Nico Cellinese: Clarification of homology; RegNum
- Apomorphy-based definition: the clade originating with the first ancestor of A to evolve M
- Develop a query to look for specific clades
- Synapomorphies not equivalent to descriptive traits that hold for all subordinate members of a clade; instead, they are meant to capture the condition of the ancestral taxon (this was Mishler’s comment in response to a question)
- Stressing need for globally unique identifiers, presumably to identify clades and homologies (synapomorphies)
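The apomorphy-based definition above ("the clade originating with the first ancestor of A to evolve M") can be sketched as a query on a tree. The sketch assumes that the presence of M is already known for internal nodes (for example from an ancestral state reconstruction); the tree, the node names, and the state assignments are hypothetical.

```python
# Toy tree as child -> parent links; internal nodes are n1..n3 (hypothetical).
PARENT = {"A": "n1", "B": "n1", "C": "n2", "n1": "n2",
          "D": "n3", "n2": "n3", "n3": None}
HAS_M = {"A", "B", "C", "n1", "n2"}  # nodes assumed to show apomorphy M


def path_to_root(node):
    """Return the node and all of its ancestors, nearest first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def clade_origin(specifier):
    """Deepest ancestor of the specifier reached through an unbroken run of M."""
    if specifier not in HAS_M:
        raise ValueError("specifier does not show the apomorphy")
    node = specifier
    while PARENT[node] is not None and PARENT[node] in HAS_M:
        node = PARENT[node]
    return node


def clade_tips(origin):
    """All tips that descend from the origin node."""
    internal = {p for p in PARENT.values() if p is not None}
    tips = [n for n in PARENT if n not in internal]
    return sorted(t for t in tips if origin in path_to_root(t))


origin = clade_origin("A")
print(origin)              # n2
print(clade_tips(origin))  # ['A', 'B', 'C']
```

In line with Mishler's comment, the definition depends on the condition at the ancestral node; a descendant that later lost M would still fall inside the clade.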
Erik Segerdell – eagle-i curation interface
Interface driven off of the ontology (OWL) making it highly configurable without any additional coding
High level resource types (instruments, organisms, etc) flagged in ontology
Fields associated with each resource type change dynamically depending on choice of type
Easy to import external ontologies via OWL
Hamid Tirmizi – Morphster
Morphbank image – has reference to anatomy term – link to BioPortal for details
Tool developed to allow users to build ontologies with image at the center
Lookup images from Morphbank
Use case: “whole organism” as a starting point / drag and drop an image / annotate image
- add “skull” / drag and drop another image
- open up multiple ontologies (PATO, TAO,..) to add characters
Ontobrowser – can view images attached to a term
- new version: search on taxon, anatomy, phenotype quality; can browse taxa, anatomy to refine results
Desktop installation – need help from Hamid to install on your machine
10:00: Coffee break
10:30: Suzanna Lewis: ‘Lessons from GO: How different is anatomy?’
1998-99 – none of the MODs used standard terminology to describe biological function
- Drosophila sequence was imminent
- Microarray technology was emerging, results needed to be described
- First bio-ontologies workshop
Gene Ontology project
- GO applied almost immediately
- Importance of stress-testing
- Annotations: 3 primary components: ontology term, entity instance, evidence
- Classification rule: disambiguation
– Same term can mean different things
– Same things can be described by different terms
- Annotation for a healthy ontology
– Develop annotation guidelines and training material
– Following basic construction rules
Do no harm
- Adding new relationships later filled in annotation gaps
Collaborate on concrete projects
- Focus the mind
- Definitions for everything
- Content-free unique IDs
- Don’t confuse representational technology with conceptual modeling
Many implicit ontologies within the GO, e.g. brain anatomy
11:00: Melissa Haendel: ‘How many anatomy ontologies?’
Extracting anatomy out of GO, e.g. brain and substructures
- but e.g. pons is never_in_taxon zebrafish
- need species specificity
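A minimal sketch of how a taxon constraint such as "pons never_in_taxon zebrafish" could be checked when an anatomy term is used for a given species. The lineage table is a hypothetical fragment and the function names are assumptions; real taxon constraints are handled by ontology reasoners, not a lookup like this.

```python
# Hypothetical taxon lineage (child -> parent) and constraint table.
TAXON_PARENT = {
    "Danio rerio": "Teleostei",
    "Teleostei": "Vertebrata",
    "Mus musculus": "Mammalia",
    "Mammalia": "Vertebrata",
    "Vertebrata": None,
}
NEVER_IN_TAXON = {"pons": {"Danio rerio"}}  # from the notes: zebrafish


def lineage(taxon):
    """Yield the taxon and each of its ancestors."""
    while taxon is not None:
        yield taxon
        taxon = TAXON_PARENT[taxon]


def allowed(structure, taxon):
    """True unless the taxon or one of its ancestors is excluded for the structure."""
    excluded = NEVER_IN_TAXON.get(structure, set())
    return not any(t in excluded for t in lineage(taxon))


print(allowed("pons", "Danio rerio"))   # False
print(allowed("pons", "Mus musculus"))  # True
```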
What about levels of granularity?
- post-composing terms
Why one ontology (GO) is too few
- large community requires consensus
- non-trivial to represent taxonomic variation
Phenotype ontologies also have inherent anatomy
- usually designed for single species
Maybe everyone should have their own AO?
- Look at them all, see who has done what best
- Small community makes it easier to develop what you need quickly
- Difficult to compare anatomical data across species
- CARO facilitates common design principles
- Has fostered some interoperability
- Species specific subtypes of CARO types: helps organize AOs using standard relations and classes
- Facilitates integration of anatomical scale
- Does not provide species interoperability
- Not enough axes of classification
We need to be able to query anatomical structures across species
Let’s not get paralyzed by homology, we can save for later
Current modes of alignment
- import/MIREOT (=minimum information to reference an external ontology term)
Ontology mappings often not useful
Synchronization: TAO and ZFIN
- Difficult to keep in synch
Import across ontologies
- MIREOT – one can import a whole ontology or just portions of another ontology
- OntoFox – a web server for MIREOTing
UBERON – one mechanism to allow alignment among species
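A sketch of the MIREOT idea: instead of importing a whole ontology, record only what is needed to reference one external term. As commonly described, that minimum is the source ontology, the source term, and the parent class it should sit under in the target ontology. The URI, the term IDs, and the `import_terms` helper are placeholders invented for illustration; this is not the OntoFox interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalTermRef:
    source_ontology: str    # where the term is defined
    source_term: str        # identifier of the term being referenced
    target_superclass: str  # where it is placed in the importing ontology


def import_terms(target, refs):
    """Return a copy of target (term -> parent) with the referenced terms added."""
    merged = dict(target)
    for ref in refs:
        if ref.target_superclass not in merged:
            raise KeyError(f"unknown parent class: {ref.target_superclass}")
        merged[ref.source_term] = ref.target_superclass
    return merged


target = {"EX:anatomical_structure": None, "EX:bone": "EX:anatomical_structure"}
refs = [ExternalTermRef("http://example.org/other-ontology",
                        "OTHER:basihyal", "EX:bone")]
merged = import_terms(target, refs)
print(merged["OTHER:basihyal"])  # EX:bone
```

Keeping the source ontology in the record is what allows the imported term to be refreshed later, which relates to the synchronization difficulty noted above.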
(Combined the two scheduled panel discussions)
11:30: Panel discussion I: technical and social concerns, pros and cons (Suzanna Lewis, moderator): Hilmar Lapp, Matt Yoder, Pantelis Topalis, Pankaj Jaiswal
Panel discussion II: technical and social concerns, pros and cons (Suzanna Lewis, moderator): Lukas Mueller, Elisabeth Arnaud, David Osumi-Sutherland, Peter Robinson
1. Phenotypes may be captured at the molecular level up to the level of the whole organism. Restricting this question to a single organism, what are the best strategies for capturing the physiological and developmental connections that link a molecular phenotype to its organism-level manifestation? (e.g. a problem in a membrane protein that is associated with tremors)
PJ: Distinguish between traits, phenotypes.
LM: This question can be reduced to a genotype/phenotype problem.
PT: Would be happy if we can connect molecular phenotype to organismal-level phenotype.
Brent M: Need a definition of phenotype: Everything above a DNA level?
Mark W: Really important to have function assigned to a structure at some level. “Structure without function is a corpse, function without structure is a ghost.”
2. How broad a phylogenetic spectrum do you think is feasible for any given cross-species anatomy ontology to cover?
DO-S: Need for cross-species ontologies, but you go up to e.g. arthropods and you get lost.
PJ: Above phylum level is very generic.
HL: Feasible as long as homologous structures among whole group. If only a small subset of ontology is used in annotation, it is too large.
Joel C: This is a fundamental question. From practical standpoint, we need ssAOs. But really need some terms comparable among groups.
Marvalee W: The question lies in the degree to which meshing tools are interoperable.
Martin R: We need to know what questions we want to answer with this interoperability of several ontologies.
What is the purpose? What are the use cases?
MH: We sometimes don’t know what the homology is.
Joel C: Sometimes you need to search for correlations.
SL: Other connections and information, not just the “pure”.
3. What are the most appropriate relations to be used for linking cross-species anatomy ontologies? Purely evolutionary? Structural? Functional?
4. What do you think the impact of High-Throughput Sequencing technology will be on phenotypic and evolutionary studies?
DO-S: Mostly indirect.
Brent M: A given phenotype can be homologous between species but be caused by different genes.
5. Given the fact that a lot of research is comprised of international collaborations, is it possible or desirable to agree on semantic standards for phenotype descriptions?
6. What technologies and strategies might be used to connect geographically isolated data warehouses?
7. For yourself and your purposes are pre or post composed phenotype ontologies preferable and why?
PJ: Problem with post-composition: many different ontologies that curators have to use to build a phenotype.
MY: Problem with pre-composition: too easy to accept precomposition without matching the diversity of what they see (i.e. a term looks “good enough”)
HL: Post-composition critical to opening up a whole universe of reasoning, cross-connecting phenotypes.
SL: It’s not an either/or.
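One way to read "it's not an either/or": if each pre-composed term carries a definition in terms of entity and quality, pre- and post-composed annotations can be normalized to the same form and compared. The term ID and the lookup table below are hypothetical.

```python
# Hypothetical pre-composed term with its entity/quality definition.
PRECOMPOSED = {"EX:small_seed": ("seed", "decreased size")}


def normalize(annotation):
    """Expand a pre-composed term ID to (entity, quality); pass EQ pairs through."""
    if isinstance(annotation, str):
        return PRECOMPOSED[annotation]
    return tuple(annotation)


pre = normalize("EX:small_seed")
post = normalize(("seed", "decreased size"))
print(pre == post)  # True
```

The trade-offs raised by the panel remain: post-composition asks curators to work with several ontologies at once, while a pre-composed list invites picking a term that is only "good enough".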
12:30: LUNCH; Advisory Board meeting
2:30 pm: Working group breakouts (plants, vertebrates, arthropods)
3:30: Coffee break
3:45 – 5:00: Working group breakouts (plants, vertebrates, arthropods, informatics)
Charge to working groups – develop use cases:
What scientific problem(s) do you want to solve (that you currently can’t)?
What data, tools, annotations, etc. might be needed?
Vertebrate Working Group
Paula Mabee, Marvalee Wake, Peter Robinson, Jim Balhoff, Melissa Haendel, Joel Cracraft, Hamid Tirmizi, Shane Burgess, Wasila Dahdul, Monte Westerfield, Mark Westneat, Erik Segerdell, Marcelo Weksler, Suzi Lewis, Peter Midford, Cynthia Smith
Joel: Need framework of variation in order to ask interesting questions; have multiple integrated databases to ask various questions –> discover associations
But what specific questions?
Wake: all species with digit numbers that vary from 5? loss/gain type questions
Looking for phylogenetic signal, evol. patterns
Mark: What’s the bigger question that frames this?
Big question: Why are jaw mechanisms so diverse in some groups of fishes but same in others? Specific question: what are all the jaw shapes in clade A vs. B?
–Why are some groups more diverse than others? Holy grail question – multiscale question but could be answered with an ontology that could explore diversity from one group to another and to genes
Joel: For birds, the wish to link up to genomic data comes up. Have a large compendium of characters, but it is not known how it is distributed across birds.
Marcelo: yes interested in linking to genetics
Jim: systems must prepare to accommodate more genomic data
Someone said ‘everything will be a model organism’ once all information is inter-operable
Paula: does MOD community know what they’re going to do with all this info?
Monte: revolution in genomics is getting genomic maps of wide variety of species, expression data, for thousands of species…hard part will be to link this to morphology etc.
Melissa: task is to enable this linking in a generic way
Peter: ENCODE looked at 1% of genome (and is now funded to look at the entire human genome), as are worm and fly. (see http://www.genome.gov/modencode/) … can a smaller dataset be useful? Do this for 10-20 organisms
Joel: large integrated datasets/databases characters and genomics in birds could get at questions re: constraints for example
Classes of questions:
- across studies and species within anatomy within a group
- looking at this against environment
Mark: good example of ontologies going from gene to product to tissue to whole organism to behavior- vertebrate striated muscle;
HAO and fly have very developed muscle nodes
Paula: For Amniotes – mammal ontology is it
Shane: genome evolution, bird genomes have selective sweeps, etc. cranio-facial development, nice features; interested in physiology, growth, parasitism, host coevolution;
anchor birds to chicken which is well developed
Suzi: baseline of comparisons across phylogenetic spectrum:
Monte: behavior needs ontological representation
Paula: temporal representation: ontogeny
Melissa: Ontogeny (developmental) or homology (historical) representation is time in an ontology; difficult using DL; no agreement on how to do it;
Paula: annotate time to a taxonomic node
Shane: reference developmental stages
How to compare dev stages across vertebrates? very difficult
Paula: ontogeny varies in evolution
Marvalee: example: temporal patterns of expression of thyroxin for ex.
Paula: what do we need to address these use cases?
Monte: seeding for amniotes should be adult mouse ontology
Melissa: seek out experts for specific parts of the ontology; do one system at a time rather than one taxon
Joel: How to do this: ontology first, then data or simultaneously?
Read in large dataset on ?bird songs- will populate ontology
Monte: Data driving the ontology building; iterative process
Marcelo: need for images
Paula: someone needs to do images (not phenoscape’s domain)
Phenoscape could be used to hold any group’s annotations
How to start an anatomy ontology? Where to match the most with mouse?
What exists? GO needs work in sections but could be split (e.g., behavior or processes split out and further developed)
Mark: start with model system and make sure basics are there (skeletal functions, soft anatomy, etc)
Marcelo: Mammal morphobank data not available in ontological form; need script to export it
Shane: what are components of the phenotype? Groups are going to annotate carcasses (?) something about chicken backfat. Need for standard terms in industry to talk about anatomy; what is involved in describing a phenotype – what is available to do this?
Suzi: as first pass, just assigning entity is helpful
Shane: people using genomics in industry; individual sequencing;
Areas of focus: applying genomes to animal health; phenotypes (big deal for industry)
Melissa: Integrating ontologies
- maintain series of ontologies at different taxonomic levels (euk, plant, metazoan, vertebrate, etc.)
- have common upper level ontology linking these; imported into subontology
Sunday, Feb. 27, 2011
Breakout group reports
Arthropod Working Group
26-27 February 2011
domain experts: C. Desjardins, AR Deans, N Franz, MJ Yoder, D Osumi-Sutherland, M Ramírez, P Topalis, C Smith, S Winterton
observers: J Beach, P Midford, C Parr
Characteristics of the Arthropod Community:
The arthropod working group has four levels of users, with different needs that could be satisfied by the Phenotype RCN:
1. Established ontologies that need improvement
* address (legacy) structural issues, improve semantics and biological content.
* establish an auditing process or service?
2. Young ontologies that need standardization
* developed in isolation, not compliant
* need upper-level alignment
* expert evaluations (auditing process?)
3. Aspiring ontologies
* affirmation (this is worth doing)
* how to get started (e.g., template system)
* where to find support
* best practices and recommendations are needed
4. No ontologies yet
* not generally aware of their utility (except maybe for GO)
* research likely to accelerate if ontologies are adopted
* need proofs of concept and idea of how much effort will be involved
What’s different about the arthropod community:
1. megadiversity – need and potential (opportunity) are greater (e.g., navigation)
* opportunity to examine scalability issues (with respect to reasoners, e.g.)
2. (unfair?) perception that taxonomists/evolutionary biologists who focus on insects are slow to adopt new methods and technologies
Arthropod-related Ontology Use Cases:
● Descriptive taxonomy – annotate specimens (or taxon concepts) with standard, computable syntax
● Re-use of descriptive data for purposes other than taxonomy (e.g., to address questions related to ecology, behavior, natural phenotypes)
● Explore transcriptional changes underlying behavioural organization
● Predict behaviours based on phenotypes
● Explore semiochemicals involved in defense behaviour across (some taxon)?
● current challenge: chemicals have syntax/grammar, different species use the same chemicals differently
● Develop pathways that connect arthropod models to humans
● Model organism researchers desire to query across their databases (need coordinated ontologies but also simpler data structure); RCN could coordinate the ontology aspect of this challenge
● Biogeographical distribution of phenotypes (visualize color patterns on latitude/longitude gradients)
What result would make us feel like something useful had been accomplished? [brain dump]
* Bridge the gap between ontology development and implementation in a taxonomic description framework; make descriptions semantics-compatible
* Coordinate a community effort to develop behavioral ontologies for insects
* Develop arthropod anatomy reference ontology (e.g., tagma, segment, exoskeleton)
* Develop insect anatomy reference ontology (e.g., Johnston’s organ, etc.) <= easier, perhaps, than arthropods
* FlyBase needs to improve its structure (expert advice on musculature)
* Hymenoptera link into FlyBase, which requires similar structure
* Best practices examples or problems to be hashed out:
o ontology of insect development (a first approach anyway)
* identify aspects of insect anatomy (and other aspects of phenotype?) that are relevant to ecology and other domains; brings the power of ontologies to the masses (formic acid and ants, weather change)
* multi-species grids, scored for character states (presence absence analyses, related to biogeography); ontologies establish the standardization needed for characters
* focus on small things that work … but that maybe are scalable
* identify some annotation issue that would end up as a proof of concept (brains, olfactory receptors; muscles; central complex of mushroom bodies and antennal lobes)
* common ways to visualize? <= informatics product
* establish longer-term buy-in
o who’s missing?
+ behavior community
+ Orthoptera, Hemiptera, brain biologists <= identify communities with active gene expression, taxonomy, comparative morphology
o educational component
Which projects, not represented at this meeting, do we need to engage, and when?
1. Honey bee behavior, neural anatomy
2. Nasonia wasp genomics
3. Hymenoptera Genome Database
4. Cotesia (and other wasp) behavioral ecologists
5. ant behavior researchers
6. Telenomus genome project
7. descriptive /evolutionary systematists
1. Heliconius butterfly wing patterns, genomes
2. Limenitis / Papilio butterfly wing patterns
3. Bicyclus butterfly wing patterns
4. Bombyx genome
5. descriptive /evolutionary systematists
1. Mosquito genomics
3. descriptive /evolutionary systematists
1. Tenebrio beetle MOD
2. beetle horn development
5. Pea aphid genome
6. Applied entomologists more generally (pesticide resistance, susceptibility)
8. Spider, lepidopteran, and other silk communities
Note: We need proof(s) of concept, template ontologies, and documentation on how to get started (where are the resources) before we approach these communities. See Year 1 goals below.
How do we effectively communicate with potential partners?
- Option A – white paper
○ include use cases, both real and imagined
○ outline steps a user needs to take to get there (address the perception that this is a huge amount of effort with very little return)
○ includes explicit, published examples (bibliography)
- Option B – poster / presentation
○ see above
○ Kansas State University – Arthropod genomics meeting
- Option C – project-specific examples
○ target certain communities with more relevant examples, proofs of concept
Note: We need proof(s) of concept that involve arthropods and span anatomy, behavior, chemicals, and genetics.
What’s missing from current set of tools:
- Visualization – both of the ontology itself for iterative development purposes, and of images/annotations relevant to classes.
- user-friendly audit tools (necessary for non-expert developers)
- Phenotype statements ==> matrices tool
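A rough sketch of the missing "phenotype statements ==> matrices" tool: EQ-style statements are pivoted into a taxon-by-character matrix, with entities as characters, qualities as states, and "?" for unscored cells. The taxa and statements are invented, and a real tool would also need to decide how to group qualities into characters.

```python
# Hypothetical phenotype statements as (taxon, entity, quality).
statements = [
    ("Taxon A", "fore wing", "dark brown"),
    ("Taxon B", "fore wing", "yellow"),
    ("Taxon A", "antenna", "elongated"),
]


def to_matrix(statements):
    """Pivot statements into (characters, {taxon: [state per character]})."""
    taxa = sorted({taxon for taxon, _, _ in statements})
    characters = sorted({entity for _, entity, _ in statements})
    cell = {(taxon, entity): quality for taxon, entity, quality in statements}
    rows = {taxon: [cell.get((taxon, ch), "?") for ch in characters]
            for taxon in taxa}
    return characters, rows


characters, rows = to_matrix(statements)
print(characters)       # ['antenna', 'fore wing']
print(rows["Taxon A"])  # ['elongated', 'dark brown']
print(rows["Taxon B"])  # ['?', 'yellow']
```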
Explicit Goals of the Arthropod Working Group
Goal: Establish both a common arthropod and insect anatomy ontology (Facilitators: Osumi-Sutherland, Yoder, Deans)
Motivation: This resource will act as a starter anatomy ontology for researchers interested in developing lower-level taxonomic ontologies within Arthropoda. This ontology would also facilitate the alignment of existing and future arthropod anatomy ontologies.
Needs from RCN: Working group funds.
Other informatics needs: Better tools for working with nested ontologies (templates).
Deadline: June, 2011 for first draft.
Notes: Develop in Protégé for logic reasons, use Git (or SVN?) as our shared repository. We need an OBO <=> OWL translator.
Goal: Formally develop an SOP for the template approach, in terms of workflow and standards (Facilitators: Osumi-Sutherland, Yoder, Deans)
Motivation: There are numerous groups developing or at least interested in developing arthropod anatomy ontologies, and this approach will facilitate development.
Needs from RCN: Working group funds.
Notes: Interaction with Vertebrate group regarding template approach (they already did this kind of work)
Goal: Write and distribute a dynamic, user-friendly best practices guide. This guide will be refined regularly based on outcomes of iterative ontology development. (Yoder, Deans)
Motivation: Communities can’t wait for a how-to guide.
Needs from RCN: Context-specific wiki?
Deadline: Decide medium by June, 2011.
Goal: Propose at least three concrete, tractable research questions or hypotheses that could be addressed using phenotype ontologies (Deans)
Motivation: We need a proof of concept for broad community buy-in and understanding.
Needs from RCN: Conduit for communication.
Deadline: Decide by June, 2011.
Notes: Sensilla, silk, muscles, surface structures, olfactory system ….
- eye mutants in Drosophila and Nasonia (Desjardins)—(peach, micky mouse, st)—but no candidate genes (centromere problem, no one yet interested) [except cinnabar example]
- wing size mutants in Nasonia (Desjardins) – candidates known for most, so could be strawman candidate; many are introgressions from one into another (i.e., not mutants)
- olfactory system – semiochemical work (CSmith), spiders (MRamírez), evolution of chemosensilla, link through function by sensory modality, link through behavior, anatomy as well (antenna, antennal lobe, higher brain processing); gives us homology link
Goal: Develop an upper-level behavior ontology, and flesh out the branch for foraging behavior ontology. (CD Smith, P Midford)
Motivation: Foraging is widely conserved across all taxa, is a high enough level of behavior that we could answer comparative species questions, is bounded in the number of genes involved, and lets us exploit the knowledge of experts.
Needs from RCN: Working group funds. Networking tools.
Other informatics needs: …
Deadline: June, 2012? for first draft.
Goal: Develop an insect development ontology
Motivation: We need to address the time/development aspects of arthropod phenotype, which has proven to be an intractable problem. This ontology would intersect with anatomy and others as necessary.
Needs from RCN: Working group funds? Networking tools?
Other informatics needs: …
PM: How will group stay in touch?
AD: Email, repository for parking ontology (e.g. svn), Webex
PM: Specific activities for June meeting?
AD: Expect to have draft ontology by then. Chance to face-to-face evaluate. Write up SOP/get started guide.
MY: Might already be a lot of best practices documentation on other wikis.
Vertebrate Group Collaboration Proposals
Participants: Paula Mabee, Joel Cracraft, Shane Burgess, Suzanna Lewis, Melissa Haendel, Marvalee Wake, Peter Robinson, Mark Westneat, Hamid Tirmizi, Monte Westerfield, Peter Midford, Cynthia Smith, Marcelo Weksler
- Ontology building and ontology alignment (yr 1): Bringing amphibian (young ontology), bird, and mammal anatomy ontologies up to the level of the fish and mouse (and perhaps then incorporating them as cross-products in the GO).
- Mammals and birds will derive their own taxon-specific anatomies using mouse as a template, and make them interoperable with the mouse anatomy ontology (after VAO added).
i. Erik, Melissa, Chris M., Shane, Paula: give chicken a starting place for an interoperable anatomy ontology by deriving it from UBERON + VAO + Jonathan Bard’s + GEISHA chicken ontology
ii. Joel C, Shane – need to create an avian anatomy ontology, building off known vertebrate ontologies, perhaps starting with cranial skeleton, then expanding from there. Could be comparative and could start by importing character data from matrices that exist. Would like to know where the densest phenotype annotations are within the mouse skeletal system.
- potentially: find existing mouse anatomy terms in relevant characters for these groups. Needed new terms will be added and relationships modified as per recommendations of morphologists in collaboration with mouse database. Will communicate with Maureen O’Leary and Seth.
- Amphibians (Wake) need to import Vertebrate Anatomy Ontology (VAO). Need to align with model organism (Xenopus database). Reconcile Taxonomy ontology.
- Vertebrate working groups need to be formed to align body parts across ontologies (e.g. figure out how cranial skeleton in different vertebrates should be represented)
i. Begin with limbs, cranial skeleton, or axial skeleton. [ need to figure out where to start based on where annotations are and where expert anatomists want to start]
- Possible collaborations:
- Small group to work on one particular piece of the anatomy that is common across 3-4 species. See below, perhaps use the anatomy of hands (feet/paws/etc.) for syndactyly.
- Mark W, Shane, Peter R., David O-S: Build a proof-of-principle vertical system using ontologies to make inferences going from gene to protein to tissue to whole organism to behavior, using vertebrate striated muscle.
- Peter R: Get 30 (or so) organisms annotated (for genes, phenotypes, etc.) to saturation. Place them in a phylogenetic tree and for each node, look for the distinguishing characters/phenotypes. For these look at the sequence differences that are associated with these differences in phenotypic (and associated environmental etc.?) characters.
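A minimal sketch of the per-node comparison in Peter R's proposal, under stated assumptions: the tree is a hypothetical nested tuple of placeholder taxon names, annotations are plain sets of phenotype labels, and a "distinguishing" phenotype is defined here as one shared by every tip in a clade and absent from every tip outside it. None of the names or data come from the meeting; the sequence-comparison step is not covered.

```python
# Hypothetical tree: nested tuples, leaves are placeholder taxon names.
tree = (("sp1", "sp2"), ("sp3", "sp4"))

# Hypothetical phenotype annotations per taxon.
annotations = {
    "sp1": {"fin present", "scales present"},
    "sp2": {"fin present"},
    "sp3": {"limb present", "hair present"},
    "sp4": {"limb present"},
}

def tips(node):
    """Return the list of leaf names below a node."""
    if isinstance(node, str):
        return [node]
    return [t for child in node for t in tips(child)]

def distinguishing_phenotypes(tree, annotations):
    """Map each internal node (as a sorted tuple of its tips) to the
    phenotypes shared by all its tips and found in no tip outside it."""
    all_tips = set(tips(tree))
    result = {}

    def visit(node):
        if isinstance(node, str):
            return  # leaves have no clade to characterize
        clade = set(tips(node))
        shared = set.intersection(*(annotations[t] for t in clade))
        elsewhere = set().union(*(annotations[t] for t in all_tips - clade))
        result[tuple(sorted(clade))] = shared - elsewhere
        for child in node:
            visit(child)

    visit(tree)
    return result

for clade, phenotypes in distinguishing_phenotypes(tree, annotations).items():
    print(clade, sorted(phenotypes))
```

With real data the annotations would be ontology term identifiers, and a reasoner would be needed so that a phenotype annotated at a specific term also counts toward its more general parent terms.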
- Revisit behavioral ontology in relation to GO (yr. 1). This group will report in year 2 annual summit
- Suzi, Mark W., Peter M., Monte: We need to revisit how to build a better behavior (and maybe physiology) ontology. Do we add terms to GO processes, start a new ontology based on existing behavior ontologies, strip behavior terms out of GO,…?
- Peter R., Peter M., Chris S., and Chris M. to develop a purpose-built behavior ontology (or add to GO)
AD: Issue of behavior came up in arthropod group as well
MH: ICBO meeting in Buffalo this July is a good outreach opportunity for this group
AD: re behavior, perhaps focus on foraging behavior as an early proof of concept
Plants Working Group
Already pretty good coordination among plant related ontologies.
Proof of concept/test case: leaves
Want to be able to annotate to quantitative data (e.g. temperature, leaf area).
Want a resource page, which tools are available.
Plan, in next several months:
-Nico, Brent develop PhyloCode, tools for making tags
-The rest collect, expand annotations relevant to leaf development
-At first workshop, go through relevant part of PO, explore how to make links to homology
-Look at annotations, what kind of info can be extracted from that; analysis of quantitative data
-Go more into quantitative annotations, other species; influence of environments on phenotype
If test case is successful, then expand it, bring in other areas of PO
SL: Homology needs to be done in the same way as other groups are doing it.
General discussion on RCN goals
Lightning ++ talks – what will you be doing?
Eva H – looking for Arabidopsis annotations, quant. annotations
Lukas – look at tomato leaf mutants
Hamid – integrating informatics tools, currently Morphster, Morphbank, Specify. Should look at opportunity to connect more tools together
Andy D – maintain communication started here, how to communicate with other arthropod groups
Chris S – get ontology into open space and fixed, work with CM, submit grant for behavior ontology
David O-S – signed up for a lot of stuff! Draft of upper level ontology stuff, feedback
Pantelis – upper ontologies, insect ontology contribution
Suzi L – talk to all who are developing annotation tools; make sure Shane’s people can start chicken annotations for phenotype and GO; candidate gene analysis based on pheno searches (w/ Peter and Monte)
Shane – communicating to prepare for chicken ontology/annotation
Martin R – integrate spider ontology with MX
Monte – PATO project, longer range needs for behavior ontology (communication with GO)
Paula: seems to be a need to bring in behavior group next year
Peter M – involved in behavior activities, spider curation
Nico Franz – is now a user of MX; will assemble literature and create and compare classes for Coleoptera (-specific) structures with those of the HAO glossary, as a starting point for creating a HAO-compatible Coleoptera phenotype ontology; eventually (after three months) I intend to bring in other beetle specialists. Also will be involved in reviewing the higher Insect/Arthropod ontology structure.
Jim B – development of phenotype curator interfaces, help anyone ready to start producing data
Quentin C – support of folding in of ontology wrt leaf development, building phylogenies
Marvalee – facilitate amphib ontology efforts, collaboration with vertebrate WG; introduce ontological thinking to herpetological community
Matt – work with David and Andy – templating for insect level ontologies
Marcelo - linking data with mouse ontology, help with mammalian taxonomy ontology
Joel C – communications with RCN, ornithologists
Laurel C – PO perspective, follow up on contacts made here, framework for annotations
Dan B – Nudge developer community to make it easy to work with continuous phenotype data
Melissa H – UBERON, vertebrate anatomy, whatever else vertebrate group needs
Wasila – contribute to Vertebrate working group, first complete teleost ontology links to VAO
Cynthia Smith – mouse behavior
Peter – collaboration with Cynthia S, human phenotype ontology
Rosemary – collecting, measuring plant traits
SL: People working on the anatomy ontologies are urged to attend ICBO meeting in Buffalo.