Data Integration
Investigators studying genotype-to-phenotype relationships need access to multiple sources of current information about all genes, proteins, metabolites, etc. with known or suspected roles in diverse response networks. Equally important are sets of data on environmental variables that can strongly influence (i) gene product dynamics under non-constant conditions and/or (ii) reaction rates or other features central to laboratory measurement techniques. An enormous amount of all these data types is currently available but is either difficult or impossible to access in any useful way. The many impediments to data use range from incompatible indexing, storage, or display formats to the near impossibility of maintaining awareness of new data sets as they are formed or, in time, superseded by better technology. As a result, current data are vastly under-utilized. Moreover, as great as the existing corpus of data is, it will be rapidly dwarfed by the exploding rate of acquisition due to new technologies such as next-generation sequencing.
Data Integration Working Group's roadmap (photo: T. Lee).
The Data Integration Working Group will investigate and apply methods for describing and unifying data sets into virtual systems that support other project activities. This will not entail the physical merging of data from different sources; such concatenation is neither practical in the short term nor sustainable over time. Instead, the approach will build upon existing middleware systems that use metadata to achieve situational awareness of available data and of the logical relationships between different data sets, together with tools that enable users to find relevant information even when they are not sure what data may exist. The system will support data intended for public distribution as well as secure, private, and/or user-local repositories, and will enable information to be pipelined into statistical inference, visualization, and/or modeling tools and applications.
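As a rough illustration of what metadata-driven "virtual" integration can mean, the sketch below keeps only descriptive records in a catalog while the data sets themselves stay at their original locations. It is a minimal sketch under stated assumptions, not the working group's design: the class names `DatasetRecord` and `MetadataCatalog`, the keyword-based lookup, the public/private flag, and the example locations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record; the data itself stays at `location`."""
    name: str
    location: str          # pointer to the source, never a copy of the data
    keywords: frozenset    # descriptive terms used for discovery
    public: bool = True    # False marks a private or user-local repository


@dataclass
class MetadataCatalog:
    """Hypothetical catalog that indexes records without merging any data."""
    records: list = field(default_factory=list)

    def register(self, record):
        self.records.append(record)

    def find(self, term, include_private=False):
        # Match on metadata only; private records are hidden unless requested.
        term = term.lower()
        return [
            r for r in self.records
            if term in r.keywords and (r.public or include_private)
        ]


catalog = MetadataCatalog()
catalog.register(DatasetRecord(
    "expression_atlas",
    "https://example.org/expression",      # placeholder location
    frozenset({"expression", "drought"}),
))
catalog.register(DatasetRecord(
    "lab_phenotypes",
    "file:///data/lab/phenotypes.csv",     # placeholder user-local file
    frozenset({"phenotype", "drought"}),
    public=False,
))

public_hits = [r.name for r in catalog.find("drought")]
all_hits = [r.name for r in catalog.find("drought", include_private=True)]
print(public_hits)
print(all_hits)
```

A user searching for "drought" finds both sources through the catalog without knowing in advance that either exists, and the returned locations can then be handed to downstream analysis or visualization tools.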
Working Group Members
| Name | Role | Institution | CV |
|---|---|---|---|
| Doreen Ware | Team Lead | CSHL; USDA-ARS | CV |
| Chris Jordan | Team Lead | University of Texas, Austin | CV |
| George M. Coupland | Collaborator | Max Planck Institute for Plant Breeding Research | CV |
| Justin O. Borevitz | Collaborator | University of Chicago | CV |
| Eva Huala | Collaborator | TAIR, Carnegie Institution | CV |
| Lukas A. Mueller | Collaborator | Boyce Thompson Institute | CV |
| Carolyn J. Lawrence | Collaborator | Iowa State University; USDA-ARS | CV |
| Ruth Grene | Collaborator | Virginia Tech | CV |
| Pankaj Jaiswal | Collaborator | Oregon State University | CV |
| Doina Caragea | Collaborator | Kansas State University | CV |
| Weijia Xu | Collaborator | University of Texas, Austin | CV |
| Zhenyuan Lu | Collaborator | Cold Spring Harbor Laboratory | CV |
| Qi Sun | Collaborator | Cornell University | |
| Matt Pickard | Collaborator | University of Arizona - graduate student | |
| Damian Gessler | Collaborator | University of Arizona | |
| Jim Jones | Collaborator | University of Florida | |
| Christos Noutsos | Postdoc | Cold Spring Harbor Laboratory | |
