This is a synthesis of a section from the World Bank’s Big Data for Development report (2014).
It pulls examples from 5 mediums: mobile, satellite, internet, social media, and financial.
There are three recurring example types: (1) awareness (how can we efficiently access relevant information), (2) understanding (how can we create a context to gain deeper insight), (3) forecasting (how can we predict future trends or events).
Use cases that are covered in more detail are split into four parts: motivation, data generation, data interpretation, and insights for action.
As a general practice, Global Pulse recommends that metadata describe: (1) the type of information contained in the data, (2) the observer or reporter, (3) the channel through which the data was acquired, (4) whether the data is quantitative or qualitative, and (5) the spatio-temporal granularity of the data, i.e. the level of geographic disaggregation (province, village, or household) and the interval at which data is collected. This is especially helpful when working with multiple datasets. Complete metadata also makes the assumptions behind an analysis more transparent.
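To make the five recommended fields concrete, here is a minimal sketch of a metadata record in Python. The class name, field names, and the `comparable` helper are my own illustrative assumptions, not something prescribed by Global Pulse or the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical record covering the five recommended metadata fields."""
    information_type: str     # (1) what kind of information the data contains
    observer: str             # (2) who observed or reported it
    channel: str              # (3) how the data was acquired
    is_quantitative: bool     # (4) True for quantitative, False for qualitative
    spatial_granularity: str  # (5) e.g. "province", "village", "household"
    collection_interval: str  # (5) e.g. "daily", "monthly"


def comparable(a: DatasetMetadata, b: DatasetMetadata) -> bool:
    """Assumed rule of thumb: two datasets line up without resampling
    only when both spatial and temporal granularity match."""
    return (a.spatial_granularity == b.spatial_granularity
            and a.collection_interval == b.collection_interval)
```

A check like `comparable` is one way complete metadata can surface an otherwise hidden assumption when combining datasets.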
Mobile | Call Detail Records (CDRs)
A study in Afghanistan showed that CDR data can be used to detect the impacts of “micro-violence” such as skirmishes and improvised explosive devices (IEDs). Micro-violence has clear effects on how people communicate and on patterns of mobility and migration, similar to what you might see after a natural disaster.
Two other uses show great promise: using Digicel’s data to track population displacement after the Haiti earthquake, and modeling the spread of infectious disease.
A study in the UK used mobile and census socioeconomic data to examine the connection between the diversity of social networks and socioeconomic opportunity and wellbeing. It validated an assumption in network science that had not been tested at the population level: that greater diversity of ties provides greater access to social and economic opportunities.
Another project by the lead researcher of the Afghanistan study captured seasonal and temporary migration, which traditional survey models usually overlook, permitting a more precise quantification of its prevalence. An ongoing project that builds on these results aims to measure precisely the extent to which wage disparities in Rwanda, Afghanistan, and Pakistan are arbitrated by migration.
Research has shown that when mobile operators see airtime top-off amounts shrinking in a certain area, it tends to indicate a loss of income in the resident population. Such a signal can reveal growing economic distress before it shows up in official indicators.
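The top-off signal can be sketched as a simple comparison of recent purchases against an earlier baseline. This is only an illustration under my own assumptions: the function name, the three-period window, and the 80% threshold are hypothetical and do not come from the research cited.

```python
from statistics import mean


def flag_declining_topups(topups_by_area, recent=3, threshold=0.8):
    """Return areas whose recent average top-off amount fell below
    `threshold` times their earlier (baseline) average.

    topups_by_area maps an area name to a chronological list of
    average top-off amounts per period.
    """
    flagged = []
    for area, amounts in topups_by_area.items():
        if len(amounts) <= recent:
            continue  # not enough history to form a baseline
        baseline = mean(amounts[:-recent])  # everything before the recent window
        latest = mean(amounts[-recent:])    # the most recent periods
        if baseline > 0 and latest < threshold * baseline:
            flagged.append(area)
    return sorted(flagged)
```

For example, an area whose amounts go from around 10 to around 6 would be flagged, while one that hovers around 10 would not.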
Satellite | Remote Sensing Images
Following the 2013 typhoon in the Philippines, Tomnod (now DigitalGlobe) divided its high-resolution satellite images into pieces and shared them publicly, crowdsourcing the identification of features of interest to enable rapid assessment of the situation on the ground: where buildings were damaged, where debris was located, and where roads were impassable.
AWhere’s “Mosquito Abatement Decision Information System (MADIS)” crunches petabytes of satellite imagery to locate the spectral signature of water primed for breeding mosquitoes. It combines this with location intelligence algorithms and models of weather and mosquito biology to identify nascent outbreaks of mosquitoes even before they hatch.
United Nations University ran a project that combined satellite rainfall data with qualitative data sources and agent-based modeling to understand how rainfall variability affects migration, as well as food and livelihood security, in South and Southeast Asia, Sub-Saharan Africa, and Latin America.
Internet | Search Queries
Google searches for “unemployment”, for example, were found to correlate with actual unemployment data.
Research has shown that trends in increasing or decreasing volumes of housing-related search queries in Google are a more accurate predictor of house sales in the next quarter than the forecasts of real estate economists.
Similar search data revealed changes in the swine flu epidemic roughly two weeks before official US Centers for Disease Control and Prevention data reflected them.
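The idea that search volume correlates with, and sometimes leads, official figures can be illustrated with a lagged Pearson correlation. This is a sketch of the general technique, not the method used in the studies above; the function names and the choice of lag are my own assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def lagged_correlation(leading, official, lag):
    """Correlate leading[t] with official[t + lag].

    A high value at lag > 0 suggests the leading series (e.g. search
    volume) moves ahead of the official one by `lag` periods.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(leading, official)
    return pearson(leading[:-lag], official[lag:])
```

Comparing the coefficient across several lags gives a rough sense of how far ahead the search signal runs. Correlation alone does not establish that it is a reliable predictor.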
Internet | Text
PriceStats uses software to crawl the internet daily and collect product prices from thousands of online retailers, which lets it calculate daily inflation statistics. Academic partners use these statistics for economic research, and public institutions use them to improve policy decision-making and anticipate the effect of commodity shocks on vulnerable populations.
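To give a feel for how scraped prices could become an inflation figure, here is a minimal sketch that computes a Jevons-type index (the geometric mean of price ratios) between two daily snapshots. I do not know the method the company actually uses; the function name and the choice of index formula are assumptions for illustration only.

```python
from math import exp, log


def jevons_index(base_prices, current_prices):
    """Geometric mean of price ratios, scaled so that 100 means no change.

    Both arguments map a product identifier to a positive price.
    Only products present in both snapshots are compared.
    """
    common = [p for p in base_prices if p in current_prices]
    if not common:
        raise ValueError("no products in common between the two snapshots")
    log_ratios = [log(current_prices[p] / base_prices[p]) for p in common]
    return 100 * exp(sum(log_ratios) / len(log_ratios))
```

If every matched product rose by 10%, the index would come out at about 110. Products that appear in only one snapshot are ignored in this sketch.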
Sentiment analysis can be used to gauge favorable and unfavorable views of a policy.
One group analyzed the concept of honor in the Middle East, for example, and found how it differed by region and changed over time in response to the events of September 11th. Such analysis could inform the appropriate selection of language in, say, diplomacy or educational materials. Further applications in this regard could include, for example, developing a contextual lexicon on financial literacy in order to tailor micro-lending by region.
Social Media | Tweets
As with the search queries above, social media data such as tweets can serve as an early indicator of a rise in unemployment or be used to evaluate crisis-related stress.
Another case used tweets to detect a cholera outbreak in Haiti up to two weeks before official statistics did.
Both of these cases demonstrate the potential to reduce reaction time and improve the processes for dealing with crises.
Financial | Credit Card Transactions
Companies use purchase data to identify unusual behavior in real time and quickly address potential fraud.
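One simple way to picture "unusual behavior" is a purchase that sits far from a cardholder's usual spending. The sketch below uses a z-score against past amounts. Real fraud systems are far more elaborate, and the function name and the threshold of 3 standard deviations are my own assumptions.

```python
from statistics import mean, stdev


def is_unusual(history, amount, z_threshold=3.0):
    """Flag `amount` if it lies more than `z_threshold` sample standard
    deviations from the mean of the cardholder's past amounts."""
    if len(history) < 2:
        return False  # too little history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount != mu  # any deviation from a constant history stands out
    return abs(amount - mu) / sigma > z_threshold
```

With a history of purchases around 20 to 25, a new purchase of 23 would pass, while one of 500 would be flagged for review.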
I find these examples useful for building mental models of what is currently being attempted in big data for development.