Data Integration

Investigators studying genotype-to-phenotype relationships need access to multiple sources of current information about all genes, proteins, metabolites, etc. with known or suspected roles in diverse response networks. Equally important are sets of data on environmental variables that can strongly influence (i) gene product dynamics under non-constant conditions and/or (ii) reaction rates or other features central to laboratory measurement techniques. An enormous amount of all these data types is currently available but is difficult or impossible to access in any useful way. The many impediments to data use range from incompatible indexing, storage, or display formats to the near impossibility of maintaining awareness of new data sets as they are formed or, in time, superseded by better technology. As a result, current data are vastly under-utilized. Moreover, as great as the existing corpus of data is, it will be rapidly dwarfed by the exploding rate of acquisition driven by new technologies such as next-generation sequencing.

Data Integration Working Group's roadmap (photo: T. Lee).

The Data Integration Working Group will investigate and apply methods for describing and unifying data sets into virtual systems that support other project activities. This will not entail the physical merging of data from different sources; such concatenation is neither practical in the short term nor sustainable over time. Instead, the approach will build upon existing middleware systems that use metadata to achieve situational awareness of available data and of the logical relationships between different data sets, together with tools that enable users to find relevant information even when they are not sure what data may exist. The system will support both data intended to be publicly distributed and secure, private, and/or user-local repositories, and will enable information to be pipelined into statistical inference, visualization, and/or modeling tools and applications.
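
The metadata-driven approach described above can be illustrated with a minimal Python sketch. It is not the working group's middleware: the class names, fields, keywords, and example.org locations are all hypothetical and chosen only for illustration. The catalog stores descriptions of data sets rather than the data, so nothing is physically merged, and a user can discover data sets by topic without knowing in advance what exists.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata for one data set; the data itself stays at `location`."""
    name: str
    location: str   # hypothetical URI, never dereferenced in this sketch
    access: str     # "public" or "private"
    keywords: frozenset = field(default_factory=frozenset)


class MetadataCatalog:
    """Hypothetical catalog that indexes descriptions, not data."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        self._records[record.name] = record

    def search(self, term, include_private=False):
        """Return records tagged with `term`, sorted by name."""
        term = term.lower()
        hits = [
            r for r in self._records.values()
            if term in r.keywords
            and (include_private or r.access == "public")
        ]
        return sorted(hits, key=lambda r: r.name)

    def related(self, name):
        """Return other records sharing at least one keyword with `name`."""
        source = self._records[name]
        return sorted(
            (r for r in self._records.values()
             if r.name != name and r.keywords & source.keywords),
            key=lambda r: r.name,
        )


catalog = MetadataCatalog()
catalog.register(DatasetRecord(
    "drought_expression", "https://example.org/expr", "public",
    frozenset({"expression", "drought"})))
catalog.register(DatasetRecord(
    "field_weather", "https://example.org/weather", "public",
    frozenset({"temperature", "drought"})))
catalog.register(DatasetRecord(
    "lab_genotypes", "file:///local/genotypes", "private",
    frozenset({"genotype", "expression"})))

public_hits = [r.name for r in catalog.search("expression")]
all_hits = [r.name for r in catalog.search("expression", include_private=True)]
linked = [r.name for r in catalog.related("field_weather")]
```

In this sketch a search for "expression" returns only the public record unless private repositories are explicitly included, and related() links the weather and expression data sets through their shared "drought" keyword. A real system would presumably rely on controlled vocabularies or ontologies rather than free-text keywords.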

Working Group Members

Name                 Role          Institution
Doreen Ware          Team Lead     CSHL; USDA-ARS
Chris Jordan         Team Lead     University of Texas, Austin
George M. Coupland   Collaborator  Max Planck Institute for Plant Breeding Research
Justin O. Borevitz   Collaborator  University of Chicago
Eva Huala            Collaborator  TAIR, Carnegie Institution
Lukas A. Mueller     Collaborator  Boyce Thompson Institute
Carolyn J. Lawrence  Collaborator  Iowa State University; USDA-ARS
Ruth Grene           Collaborator  Virginia Tech
Pankaj Jaiswal       Collaborator  Oregon State University
Doina Caragea        Collaborator  Kansas State University
Weijia Xu            Collaborator  University of Texas, Austin
Zhenyuan Lu          Collaborator  Cold Spring Harbor Laboratory
Qi Sun               Collaborator  Cornell University
Matt Pickard         Collaborator  University of Arizona (graduate student)
Damian Gessler       Collaborator  University of Arizona
Jim Jones            Collaborator  University of Florida
Christos Noutsos     Postdoc       Cold Spring Harbor Laboratory