How To Solve the President’s Big Data Challenge

Publication Type: Other Writing
Publication Date: January 31, 2014

Cross-posted from Privacy

In his recent remarks on the NSA and surveillance, President Barack Obama grabbed the Big Data bull by the horns. We commend the president’s decision to task the President’s Council of Advisors on Science and Technology (PCAST) with reaching out to privacy experts, technologists and business leaders to examine the challenges inherent in Big Data. Government surveillance raises distinct civil liberties concerns that commercial and scientific use of Big Data does not; still, it is appropriate to address the profound impact of new technologies on Big Data business opportunities.

Big Data was all the rage in privacy circles in 2013, and now it is receiving appropriately broad policy attention. It implicates modern-day dilemmas that transcend privacy and affect a variety of delicate balancing acts at the core of free market democracy. The examination requires engagement not only by privacy professionals but also by ethicists, scientists and philosophers to address what may very well be the biggest public policy challenge of our time.

In his focus on Big Data, the president has recognized that some of the privacy and civil liberties issues implicated by NSA surveillance could presage a wave of similar policy dilemmas. These debates will pit compelling interests such as national security, public health and safety and sustainable development against grave risks of inequality, discrimination, narrowcasting and filter bubbles. They will require us to answer the quintessential Big Data questions: whether we need to create huge data haystacks to find precious needles and at what price.

Innovation enthusiasts argue that fundamental privacy principles such as notice and choice and purpose limitation have lost their relevance in a world of Big Data. Conversely, privacy traditionalists worry that Big Data may become an excuse to override individual rights in order to facilitate intrusive marketing or ubiquitous surveillance. We caution against disposing of sound principles that have guided privacy policy for more than 40 years. We propose a more nuanced approach, based on sophisticated benefit-and-risk analysis to determine the legitimacy and fairness of innovative data practices. And we urge a practical application of fair information principles that accounts for modern-day realities of collection and use.

To help perform such complex calculations and make such weighty value choices, we support exploring Ryan Calo’s proposal for engaging ethical review boards to vet and clear Big Data projects. Such boards, which Calo calls “Consumer Subject Review Boards,” although they need not be restricted to the consumer context, would draw on the experience gained by institutional review boards (IRBs) that currently safeguard research involving human subjects. They would operate in both the public and private sectors and carefully conduct cost-benefit analyses that assess, prioritize and, to the extent possible, quantify Big Data projects’ rewards and associated risks. They would specialize in crafting ethical choices that account for data innovation while remaining attentive to individuals’ privacy and dignity concerns. They would employ privacy risk mitigation techniques, including newly developed methods such as Privacy by Design, accountability and data featurization. Whether, when and how to compose and operate such boards requires careful consideration, lest research and innovation become mired in formalistic bureaucracy.

Consider, for example, the decision by the UK’s National Health Service not only to establish a massive database collecting health information about the entire population but also to make such data available to researchers, insurance companies and pharmaceutical manufacturers. Supporters point to the potential breakthroughs in medical research unleashed by the data deluge as well as improvements in the performance of hospital units and healthcare bureaucracy. Privacy experts warn there will be no way for the public to work out who has their medical records or to what use their data will be put. Additionally, the new initiative could affect the insurability, employability and overall equality of patients and susceptible individuals.

The tradeoff between the promise of better healthcare and risks inherent to collecting the medical history of an entire nation, digitized and stored in one place, recurs in Big Data projects in areas ranging from urban planning and disaster recovery to sustainable development and education technologies (ed tech).

In education, for example, schools increasingly deploy Big Data analytics to enhance student performance, evaluate teachers, improve education techniques, customize programs, devise financial assistance programs and better leverage scarce resources to optimize education results. Educational institutions traditionally have collected a broad variety of information as a matter of course, including demographics, grades, references, disciplinary details, and financial and health information. But students now leave a larger data footprint, with schools increasingly logging granular location information, online browsing habits, biometric identifiers, dietary choices and even electricity usage in dorm rooms. Further, schools and ed tech vendors now have an opportunity to understand not just the answers provided by students but also how they arrived at those responses; for example, how long it took them to answer, on what portion of a question they spent the most time, what supplementary materials they accessed and more.

The benefits of ed tech abound and include enabling schools to customize programs for individual students, make education more collaborative and engaging through social media and “gamification,” and facilitate access to education for anyone with an Internet connection in remote or underprivileged areas of the world. At the same time, the confluence of enhanced data collection with highly sensitive, sometimes fateful, information about children and teens could, if misused, limit students’ future choices and further stratify society into “cans” and “cannots.”

It is only through careful, measured analysis that we will ensure Big Data progress does not become a free-for-all data rampage. As the president said in his speech, “When you cut through the noise, what’s really at stake is how we remain true to who we are in a world that is remaking itself at dizzying speed.”