Reifying New Connotations of Data

This perspective is an attempt to find a way to effectively apply Machine Learning / Statistics / Analytics towards poverty reduction. It is best seen through two hypotheses: the first is GiveWell’s null hypothesis, and the second is based on a criticism of Information and Communication Technology for Development (ICT4D).

GiveWell’s null hypothesis is that charities “fall far short of what they promise donors, and are very likely failing to accomplish much of anything (or even doing harm). This doesn’t mean we think all charities are failing – just that, in the absence of strong evidence of impact, this is the appropriate starting-point assumption.” This hypothesis is useful for filtering for charities that have already integrated an evidence ethos, and therefore it may be a useful heuristic for finding those that can leverage Machine Learning / Statistics / Analytics.

The second hypothesis is that technology is only a magnifier of human intent and capacity, not an additive (or a substitute). “If you have a foundation of competent, well-intentioned people, then the appropriate technology can amplify their capacity and lead to amazing achievements. But, in circumstances with negative human intent, as in the case of corrupt government bureaucrats, or minimal capacity, as in the case of people who have been denied a basic education, no amount of technology will turn things around.” This critique is aimed mostly at ICT4D. I don’t think data science for good is limited to this hypothesis, but I do think it supports the idea that magnifying the abilities of researchers and designers of interventions in developing countries is a high-impact area of work.[1]

tech * (research + design) 4 good

I think it is important to point out that data science for good (DS4G) is not limited to this second hypothesis. It currently seems to me that the largest potential for future impact of DS4G lies where it deviates from this hypothesis. It deviates by keeping the magnifier effect while decoupling it from human capacity. That is to say, it still magnifies the capacity of the data scientist, but it is not limited in scale by what can be perceived.

This means that there is an opportunity for technology to have a high impact in a low-capacity environment.[2] This seems possible by leveraging the scalable collection of rich, unbiased data that carries societal value. This seems to be the premise of Premise Data.

Why is this different?
This is different because it is the basis of a new connotation of data and its uses, one that goes far beyond the traditional incentives of Evaluation & Monitoring (E&M). For context, below is a quote that I view as a criticism of the limitations of quantification-based solutions to social problems (I view these solutions as mostly using data for the purposes of E&M):

“Guys like you are in what we call the ‘reality-based community’ of people who believe that solutions emerge from your judicious study of discernible reality. That’s not the way the world really works anymore. We’re an empire now, and when we act, we create our own reality. And while you’re studying that reality judiciously, as you will – we’ll act again, creating other new realities, which you can study too, and that’s how things will sort out. We’re history’s actors… and you, all of you, will be left to just study what we do.” – Bush aide to a journalist

However, I think the new connotations of data4development are not confined by this criticism. As a result, their impact has a state space that can generalize beyond GiveWell’s hypothesis while maintaining its evidence ethos. The CEO of Jana summarizes these new connotations well:

“Data-driven development represents an opportunity to transcend observational science, enabling us not only to learn more about the underlying dynamics driving behaviour, but to be able to use these insights to design better mechanisms, better systems, better tools that can improve the lives of these billions of people generating this data and the societies in which they live. Rather than the passive, observational roles that scientists have played in these other fields, there is an opportunity to take an active role – not collecting data – but designing more appropriate interventions.”

This view still needs to evolve.

[1] Therefore, I think this provides support for the impact of a future feature in Innovations for Poverty Action’s “business model.” They currently have Principal Investigators with PhDs, usually economists, who design their studies and handle publishing. In the future, they plan to build out an “internal PI” model, with staff who handle aspects of study design (e.g., data science) even without PhDs, provided they have the relevant expertise; however, this is still a ways off.

Therefore, I think it would be interesting to find the data science/engineering equivalent of SurveyCTO. I’m skeptical that survey design can be more than a way to augment other, more scalable data sources in the social sciences, but I’m also very ignorant of the newer capabilities in this space.

[2] I do not think this means that this kind of technology could have a high impact in a strictly low-capacity environment. This is because of the pitfalls of incomplete metadata, which lead to unknown biases. This is similar to saying that it is important for datasets to be interpreted by people with local knowledge. This becomes less true as we gain the ability to recreate the local context with calibrated confidence.

