Two sets of barriers currently limit the ability of scholars across the social sciences to study the socioeconomic impacts of climate change. The first concerns data: the absence of high-quality, harmonized and georeferenced survey data. Of the six major barometer surveys in political science, for example, only one, the Afrobarometer, includes coordinates that precisely locate respondents. The second set of barriers is primarily computational: researchers must generate valid estimates of climate exposure for tens or hundreds of thousands of locations over periods of up to 30 years. This requires both facility with extremely large datasets (upwards of 150 GB per climate product) and modern geospatial computing tools. These constraints are especially binding for scholars who lack access to cloud computing or the necessary background in climate econometrics.
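To make the computational task concrete, the sketch below shows the simplest possible version of the per-location step that must be repeated for every respondent: find the grid cell nearest to a respondent's coordinates and average a climate variable over a window of years ending in the survey year. It is illustrative only. The `Grid` class, the `exposure` function, and the toy values are hypothetical. Real climate products are far larger, use their own grids and time steps, and the exposure measures this project will provide may rely on different extraction or aggregation methods.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Grid:
    """Hypothetical in-memory climate grid: values[t][row][col] holds the
    variable (e.g. annual mean temperature) for year years[t]."""
    lat0: float   # latitude of the centre of row 0
    lon0: float   # longitude of the centre of column 0
    step: float   # cell size in degrees (same in both directions)
    years: list[int]
    values: list[list[list[float]]]

    def nearest_cell(self, lat: float, lon: float) -> tuple[int, int]:
        """Index of the cell whose centre is nearest, clamped to the grid.
        Ignores longitude wrap-around for simplicity."""
        nrows, ncols = len(self.values[0]), len(self.values[0][0])
        row = round((lat - self.lat0) / self.step)
        col = round((lon - self.lon0) / self.step)
        return min(max(row, 0), nrows - 1), min(max(col, 0), ncols - 1)


def exposure(grid: Grid, lat: float, lon: float, survey_year: int, window: int) -> float:
    """Mean of the variable in the respondent's cell over the `window` years
    ending with (and including) the survey year."""
    row, col = grid.nearest_cell(lat, lon)
    idx = [t for t, y in enumerate(grid.years) if survey_year - window < y <= survey_year]
    if not idx:
        raise ValueError("no climate data in the requested window")
    return mean(grid.values[t][row][col] for t in idx)


# Toy 2x2 grid with three years of data (invented values for illustration).
toy = Grid(lat0=0.0, lon0=0.0, step=1.0, years=[2000, 2001, 2002],
           values=[[[10.0, 20.0], [30.0, 40.0]],
                   [[12.0, 22.0], [32.0, 42.0]],
                   [[14.0, 24.0], [34.0, 44.0]]])

respondents = [(0.9, 0.2, 2002), (-5.0, 10.0, 2000)]  # (lat, lon, survey year)
scores = [exposure(toy, lat, lon, yr, window=2) for lat, lon, yr in respondents]
```

Even this naive loop must be run for every respondent, every variable, and every window of interest against files of up to 150 GB, which is the user-side burden the databases described here are meant to remove.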
Our goal with this project is to address both sets of challenges simultaneously by building two databases, one containing survey data and the other climate data, that work seamlessly together. Researchers will be able to quickly find pre-harmonized survey data for their area of study and generate corresponding exposure measures without performing any computation themselves. A prototype of the survey database, made possible by a previous King Seed grant, currently contains more than 1.1 million survey responses from 84 countries spanning nearly 30 years. To build out this database, we have developed a set of tools that leverage advances in natural language processing to provide unsupervised algorithmic solutions to the problems of geolocating, harmonizing and thematically grouping survey responses.
We intend to publish all tools used to develop the databases as companion R packages, along with documentation that gives beginners practical, accessible guidance on choosing among competing climate products and on applying new approaches to correcting measurement-error bias in remotely sensed data. The packages will include an R-based API lookup that allows users to query, download and save data directly from the R console. Our goal is to build not a single, static product but an ecosystem of connected products, complete with an online dashboard and a regularly updated list of survey locations, administrative units, and climate products.