Research Datasets

SSRI Managed Datasets

SSRI actively manages access to a number of Duke-licensed, protected datasets. If you have questions about a dataset or would like access to it, please email the contact person listed for that dataset.

Catalist Voting Data

The Catalist dataset draws on voter registration files from all fifty states, supplemented by information from commercial sources, to provide unprecedented information on approximately 265 million individuals within the United States. The dataset contains roughly 700 different variables. The level of information varies by state, but typically includes an individual's political party registration, turnout history, age, gender, and race, as well as information from government and commercial sources. Duke now has a subscription granting researchers access to a 1% national sample of the dataset. These data promise to be valuable for researchers in disciplines such as political science, business, and sociology. For more information, please email Sunshine Hillygus.

Dataquick Housing Data

Through an agreement with Dataquick Inc., researchers at Duke University have access to a vast quantity of data describing housing transactions from throughout the United States. The data include information about transaction dates, prices, loan amounts, buyers', sellers', and lenders' names, and housing characteristics such as (but not limited to) exact street address, square footage, year built, lot size, number of bedrooms, number of bathrooms, and number of total rooms. Data typically cover the period from the mid-to-late 1990s to 2012, but go back further for many counties. Contact us to request access to these data.

E-Risk Data

The E-Risk Longitudinal Study aims to build knowledge about children's behavioral development and mental health, including depression, anxiety, and disruptive behaviors such as oppositional, conduct, hyperactive, and inattentive behaviors. It addresses:

a) which specific environmental risk factors contribute to the early emergence of mental health problems;

b) whether environmental risk factors interact with genetic risk to influence mental health problems;

c) whether and how child-specific parenting experiences explain differences in behavioral outcomes between children in the same family; and

d) how the effects of risk are mediated through children's neuropsychological executive functions, social-information processing, and verbal skills.

Data were collected at ages 5, 7, 10, and 12 to investigate specific environmental risk factors that contribute to the early emergence of mental health problems. Data collected included interview and observational data from parents (typically mothers) and children, survey data from teachers, cognitive assessments, grades, and other key data.

Researchers at Duke University may gain access to the E-Risk study data after submitting a concept paper. Please contact ehdidata@duke.edu for more information about the data and the paper submission process.


Other Datasets

Below is a list of data sets that may be of interest to social science faculty and graduate students.

National Health and Nutrition Examination Survey (NHANES)

NHANES is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations. The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel.

Collaborative Psychiatric Epidemiology Surveys

The Collaborative Psychiatric Epidemiology Surveys (CPES) were initiated in recognition of the need for contemporary, comprehensive epidemiological data regarding the distributions, correlates, and risk factors of mental disorders among the general population, with special emphasis on minority groups. The primary objective of the CPES was to collect data about the prevalence of mental disorders, impairments associated with these disorders, and their treatment patterns from representative samples of majority and minority adult populations in the U.S. Secondary goals were to obtain information about language use and ethnic disparities, support systems, discrimination, and assimilation, in order to examine whether and how closely various mental health disorders are linked to social and cultural issues.

NC Integrated Data for Researchers

North Carolina Integrated Data for Researchers (NCIDR) is a research data warehouse that integrates mental health data from four sources in the state of North Carolina. The first is Medicaid claims and enrollment data for individuals with mental health, developmental disability, or substance abuse diagnoses. The second, the Integrated Payment and Reporting System, covers mental health services for people who are not eligible for Medicaid. The third, the Healthcare Enterprise Accounts Receivable Tracking System (HEARTS), tracks services provided by state-run inpatient mental health facilities. The fourth, Piedmont Behavioral Health, contributes data from a managed care organization serving five counties within North Carolina.

NC Vital Statistics

The North Carolina State Center for Health Statistics (SCHS) compiles yearly data on vital records such as deaths, births, fetal deaths, matched birth/infant deaths, marriages, and divorces. Yearly reports on state population, births, deaths, marriages, and divorces are available from 1914; reports on causes of mortality are available for 1971.

Youth Risk Behavior Surveillance System (YRBSS)

The Youth Risk Behavior Surveillance System (YRBSS) monitors six types of health risk behaviors that contribute to the leading causes of death and disability among youths and adults: behaviors that contribute to injuries and violence, sexual risk behaviors that lead to unwanted pregnancies and sexually transmitted diseases, alcohol and other drug use, tobacco use, unhealthy dietary behaviors, and inadequate exercise. It also tracks the prevalence of obesity and asthma.

RAND Center for Population Health and Health Disparities Data Core

The RAND Center for Population Health and Health Disparities (CPHHD), along with the other centers, shares an overall goal: to support cutting-edge research to understand and reduce differences in health outcomes, access, and care. The Data Core currently offers seven studies covering a variety of substantive areas: Cost-of-Living, Disability, Pollution, Segregation Indices, Street Connectivity, Index of Neighborhood Socioeconomic Status, and Census.

RWJF Health and Medical Care Archive

The Health and Medical Care Archive (HMCA) preserves and disseminates data collected by selected research projects funded by the Robert Wood Johnson Foundation (RWJF) and facilitates secondary analyses of the data. Its goal is to increase understanding of health and health care in the United States through secondary analysis of RWJF-supported data collections. Subject areas include health care providers, cost of and access to health care, substance abuse and health, chronic health conditions, and other health-related studies.

U.S. Health Data Warehouse

The U.S. Health Data Warehouse is dedicated to making high-value health data more accessible to entrepreneurs, researchers, and policy makers in the hope of better health outcomes for all. Data are available from the following governmental entities: Administration for Children and Families, Administration for Community Living, Agency for Healthcare Research and Quality, Assistant Secretary for Planning and Evaluation, Centers for Disease Control and Prevention, Centers for Medicare and Medicaid Services, Department of Health and Human Services, Health Resources and Services Administration, Indian Health Service, National Institutes of Health, Office of Inspector General, Office of the National Coordinator for Health Information Technology, Substance Abuse and Mental Health Services Administration, and U.S. Food and Drug Administration.

Environmental-Risk (E-Risk) Longitudinal Twin Study

The E-Risk study aims to build knowledge about children’s disruptive behaviors, such as oppositional, conduct, hyperactive, and inattentive behaviors. It addresses: 1) which specific environmental risk factors contribute to the early emergence of disruptive behavior; 2) whether environmental risk factors interact with genetic risk to influence disruptive behavior; 3) whether and how child-specific parenting experiences explain differences in behavioral outcomes between children in the same family; and 4) how the effects of risk are mediated through children’s neuropsychological executive functions, social-information processing, and verbal skills.

National Longitudinal Surveys

The NLSY97 consists of a nationally representative sample of approximately 9,000 youths who were 12 to 16 years old as of December 31, 1996. Round 1 of the survey took place in 1997. In that round, both the eligible youth and one of that youth's parents received hour-long personal interviews.

The NLSY79 is a nationally representative sample of 12,686 young men and women who were 14-22 years old when they were first surveyed in 1979. These individuals were interviewed annually through 1994 and are currently interviewed on a biennial basis.

National Longitudinal Study of Adolescent to Adult Health (Add Health)

Add Health is a longitudinal study of a nationally representative sample of adolescents in grades 7-12 in the United States during the 1994-95 school year. The Add Health cohort has been followed into young adulthood with four in-home interviews, the most recent in 2008, when the sample was aged 24-32. Add Health combines longitudinal survey data on respondents’ social, economic, psychological, and physical well-being with contextual data on the family, neighborhood, community, school, friendships, peer groups, and romantic relationships, providing unique opportunities to study how social environments and behaviors in adolescence are linked to health and achievement outcomes in young adulthood. The fourth wave of interviews expanded the collection of biological data in Add Health to understand the social, behavioral, and biological linkages in health trajectories as the Add Health cohort ages through adulthood.

NC Education Research Data Center (NCERDC)

The North Carolina Education Research Data Center is a unique portal to an immense store of data from the North Carolina Department of Public Instruction (DPI) and the National Center for Education Statistics (NCES). This ongoing project was established in 2000 through a partnership with DPI to store and manage data on the state’s public schools, students, and teachers. The data, which include information dating back to the mid-1990s, are available to university researchers, nonprofit research institutions, and government agencies.

Triangle Research Data Center (TRDC)

The TRDC is a partnership between the Center for Economic Studies at the U.S. Census Bureau and Duke University, in cooperation with the University of North Carolina at Chapel Hill (UNC), North Carolina State University (NCSU), and RTI International (RTI). Under this partnership, the Census Bureau allows researchers with approved proposals to perform statistical analysis on non-public microdata from the Census Bureau's economic and demographic censuses and surveys.

Database of Genotypes and Phenotypes (dbGaP)

dbGaP was developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype. Such studies include genome-wide association studies, medical sequencing, molecular diagnostic assays, as well as association between genotype and non-clinical traits. The advent of high-throughput, cost-effective methods for genotyping and sequencing has provided powerful tools that allow for the generation of the massive amount of genotypic data required to make these analyses possible.

dbGaP provides two levels of access, open and controlled, in order to allow broad release of non-sensitive data while providing oversight and investigator accountability for sensitive data sets involving personal health information. Summaries of studies and the contents of measured variables, as well as original study document text, are generally available to the public, while access to individual-level data, including phenotypic data tables and genotypes, requires varying levels of authorization. More complete descriptions of the dbGaP system are available in PubMed Central and the NCBI Bookshelf.

Inter-university Consortium for Political and Social Research (ICPSR)

ICPSR maintains a data archive of more than 500,000 files of research in the social sciences. It hosts 16 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields. 

Data Expeditions within iiD may have research data sets available.

The Energy Data Analytics Lab may have research data sets available.

Duke SSRI

Durham, NC 27708 | 919.681.6019

SSRI

Gross Hall, 2nd Floor
140 Science Drive
Durham, NC 27708

SSRI

Erwin Mill Building
2024 W. Main St.
Durham, NC 27705