Big Data for All: Privacy and User Control in the Age of Analytics

Much has been written over the past couple of years about “big data” (see, for example, here and here and here). In a new article, Big Data for All: Privacy and User Control in the Age of Analytics, which will be published in the Northwestern Journal of Technology and Intellectual Property, Jules Polonetsky and I try to reconcile the inherent tension between big data business models and individual privacy rights. We argue that going forward, organizations should provide individuals with practical, easy-to-use access to their information, so they can become active participants in the data economy. In addition, organizations should be required to be transparent about the decisional criteria underlying their data processing activities.

The term “big data” refers to advances in data mining and the massive increase in computing power and data storage capacity, which have expanded by orders of magnitude the scope of information available for organizations. Data are now available for analysis in raw form, escaping the confines of structured databases and enhancing researchers’ abilities to identify correlations and conceive of new, unanticipated uses for existing information. In addition, the increasing number of people, devices, and sensors that are now connected by digital networks has revolutionized the ability to generate, communicate, share, and access data.

Data creates enormous value for the world economy, driving innovation, productivity, efficiency and growth. In the article, we flesh out some compelling use cases for big data analysis. Consider, for example, a group of medical researchers who were able to parse out a harmful side effect of a combination of medications, which were used daily by millions of Americans, by analyzing massive amounts of online search queries. Or scientists who analyze mobile phone communications to better understand the needs of people who live in settlements or slums in developing countries.

At the same time, the “data deluge” presents formidable privacy concerns. Protecting privacy becomes harder as information is multiplied and shared ever more widely among multiple parties around the world. As more information regarding individuals’ health, finances, location, electricity use and online activity percolates, concerns arise about profiling, tracking, discrimination, exclusion, government surveillance and loss of control. From a more technical legal angle, big data challenges some of the most fundamental concepts of privacy law, including the definition of “personally identifiable information”, the role of individual control, and the principles of data minimization and purpose limitation.

In our article, we make the case for providing individuals with usable access to their data. The call for transparency is not new, of course. Rather, the emphasis is on access to data in a usable format, which can work to create value for individuals. Transparency and access alone have not emerged as potent tools because individuals do not care for, and cannot afford to indulge in, transparency and access for their own sake (see one oft-cited counterexample here). The enabler of transparency and access is the ability to use the information and benefit from it in a tangible way. This will be achieved through “featurization” or “app-ification” of privacy. Organizations should build as many dials and levers as needed for individuals to engage with their data.

We expect that “featurization” of big data, harnessing its immense force for not only organizational but also individual benefit, will unleash a wave of innovation and create a market for personal data applications. The technological groundwork has already been completed with mash-ups and real-time APIs making it easier for organizations to combine information from different sources and services into a single user experience. Regardless of lingering questions concerning who – if anyone – “owns” the information, we think that fairness dictates that individuals enjoy beneficial use of the data about them.

Our second proposal would require organizations to disclose the decisional criteria underpinning their data analytics machinery. In a big data world, it is often not the data but rather the inferences drawn from them that give cause for concern. Inaccurate, manipulative or discriminatory conclusions may be drawn from perfectly innocuous, accurate data. Much like in quantum physics, the observer in big data analysis can affect the results of her research by defining the data set, proposing a hypothesis or writing an algorithm. At the end of the day, big data analysis is an interpretative process, in which one’s identity and perspective inform one’s results. Like any interpretative process, it is subject to error, inaccuracy and bias. Louis Brandeis, who together with Samuel Warren “invented” the legal right to privacy in 1890, also wrote that “[s]unlight is said to be the best of disinfectants”. We trust that if the existence and uses of databases were visible to the public, organizations would be more likely to avoid unethical or socially unacceptable uses of data.

(Re-posted with permission from Concurring Opinions blog. See original post here). 


Guess this is the next big fight right there, alongside the much talked-about 'fight for water / food / land'. I think there are 3 issues to be tackled here:
(a) a choice/option for the user to either share or not share his digital data;
(b) even when a user agrees to share personal data, how much or how deep such data is to be tracked/shared, and to whom, by data harvesting companies;
(c) the transparency of data harvesting and conclusions, as you have rightly pointed out.
Each one of these points is of concern to everyone. And breaching the limits (whatever are set) is also going to bring in big bucks.
