Partners Land Grant to Build Data Infrastructure
The Renaissance Computing Institute (RENCI) and partners at UNC Chapel Hill, Duke University, Indiana University and the city of Durham, NC, will work together on a project that aims to let scientists share and analyze data across institutional boundaries while keeping that data safe and compliant with regulations that govern data location, availability, movement and access.
The three-year, $3 million project, funded by the National Science Foundation, will build a technology infrastructure that supports best practices in moving and managing data, ensuring security and preserving privacy, so that researchers can focus more fully on science. The system will be tested with a variety of use cases involving scientists at UNC's Odum Institute for Research in Social Science and Duke's Social Science Research Institute (SSRI). Neighborhood Data Works, a nonprofit launched by the city of Durham, will participate in the project by providing access to data relevant to the social science test cases, including data on crime, voting records, building permits, property records, and business licenses.
“Scientific progress today depends on being able to share and analyze data across disciplines and institutional boundaries, but policies, regulations and privacy concerns often make that difficult and time-consuming,” said Ilya Baldin, director of networking research at RENCI and principal investigator on the project, called Infrastructure for Privacy-assured CompuTations (ImPACT). “We plan to lower the barriers that impede researchers from using privacy-restricted data in collaborative settings. The goal is to let technology deal with these concerns and allow scientists to concentrate on science.”
ImPACT will support analysis of multi-institutional data while satisfying each organization's data policies. It will develop a model for managing trust, along with best practices for multi-institutional networking, data management, security and privacy. Because organizations differ in their data policies and privacy concerns, ImPACT will offer a range of privacy and security approaches, including trusted third parties (TTPs) and emerging technologies such as secure multi-party computation (SMC), homomorphic encryption (HE) and differential privacy.
The system will build on existing data cyberinfrastructure technologies developed to manage, share, and archive data, including:
- Dataverse, a repository at the Odum Institute for archiving, publishing and discovering social science research data sets.
- CyVerse, a web-based platform launched by the NSF iPlant Collaborative for handling large-scale data sets and conducting complex analyses.
- The Open Resource Control Architecture (ORCA), a control framework developed at Duke. ORCA is used as part of the NSF Global Environment for Network Innovations (GENI) initiative to control the ExoGENI distributed cloud environment managed by a team at RENCI and Duke.
To ensure that the framework addresses security concerns, ImPACT will leverage the work of Silver, an NSF project focused on cloud security issues, and Indiana University's Center for Applied Cybersecurity Research (CACR), which will offer guidance on cybersecurity and privacy challenges.
“The power of data to yield truly transformative insights expands exponentially as data from different sources comes together. This project will build exciting new pathways for this to happen while fully ensuring that privacy and confidentiality are maintained. If successful, it will allow us to answer questions we have never been able to address before,” said Tom Nechyba, director of Duke's Social Science Research Institute.
“Highly productive, usable research infrastructure that provides for data security and privacy is a key goal for the scientific community. We are excited to be a part of this team, providing advice and evaluation in cybersecurity and privacy to maximize ImPACT's fulfillment of this critical need,” added Von Welch, director of CACR.
The social science researchers involved in the project will use data sets from Durham's Neighborhood Data Works project to test the platform on real-world research questions. One research project will investigate relationships between race, justice and political engagement using election data, policing and crime data, and community-level demographics. Other questions to be researched over the course of the grant include how housing, crime and juvenile detention affect educational outcomes, drawing on local justice and educational data; and how recurring stressful community incidents, such as violence, relate to social inequities and public health, drawing on crime data, property records and aggregated local health records.
“The ImPACT project is a chance to frame data relationships around trust and to bring the worlds of public data access and sensitive research closer together,” said John Killeen, director of Neighborhood Data Works and an employee of the City of Durham's Neighborhood Improvement Services Department. “That's especially important since so much research is not readily accessible, even to the people in the data.”
By integrating data privacy technologies into data-sharing practices, the ImPACT team hopes to facilitate new multi-institutional research and discoveries, and to serve as a model for other scientific disciplines that struggle with data privacy concerns.
“Social scientists increasingly require access to data spread across multiple sites, some of which require privacy protection,” said Tom Carsey, director of the Odum Institute for Research in Social Science and co-PI on the project. “The ImPACT project will be critical to making this possible.”