How do phenotypic data factor into the issues relating to integrating complex data? Three frequent phenotypers (Ramona Walls, Chris Mungall, and Maryann Martone) were supported by this RCN to participate with sixteen others in an ‘Integrating Complex Data’ workshop organized by the American Institute of Biological Sciences (AIBS) with NSF funding (EF-1450894), on March 30-31 at the Hyatt Regency Crystal City in Arlington, Virginia. The workshop was co-chaired by Paula Mabee, Corinna Gries, and Robert Gropp, facilitated by Kathy Joyce, and observed by various program officers and staff from NSF.
Complex data integration, defined as ‘bringing together data from two or more fields’, is required to address many fundamental scientific questions and to understand how to mitigate the challenges facing the planet. Participants (whose research interests ranged from genetics, genomics, metagenomics, systematics, taxonomy, and ecology, to bio/eco-informatics and cyberinfrastructure development) initially discussed specific use cases in which complex data integration was required. They then focused on the barriers that impede integration, recognizing domain silos as a major problem at this scale. They illustrated with examples that data discovery and integration are currently hampered by a lack of common standards, including those for IDs, representation, ontologies, data formats, data collection, and communication protocols. Phenotype RCN participants described the usefulness of ontologies in connecting phenotypic data to other data types across domains.
Suggestions and next steps required to achieve better data integration were the focus of the second day of the workshop. Community coalescence around shared standards, rather than more standards, was considered key. Participants advocated for interagency discussions about how to provide linkages across their data systems, thus making data from all sites more readily discoverable and distributing the financial burden. Participants further recognized that the technical expertise required for complex data integration is high; they promoted cross-training in informatics for graduate students and a higher level of specialist ‘data scientist’ training. They also felt that funding mechanisms allowing scientists to employ technical specialists for specific data integration tasks would facilitate complex data integration. Particularly at this juncture, where cross-domain data analysis is required to address societal problems, participants stressed that it is important to try to solve the immediate problems while working toward long-range solutions. A full report from this workshop is in preparation and a link will be posted when it is available.