Ira Rubinstein & Woodrow Hartzog, Anonymization and Risk, 91 Wash. L. Rev. (forthcoming 2016), available on SSRN.
In the current Age of Big Data, companies are constantly striving to make better use of the data at their disposal. And it seems that the only thing better than big data is more data. However, the data used is often personal in nature and thus linked to specific individuals and their personal details, traits, or preferences. In such cases, sharing and use of the data conflict with privacy laws and interests. A popular remedy for sidestepping privacy-based concerns is to render the data no longer “private” by anonymizing it. Anonymization is achieved through a variety of statistical measures. Anonymized data, so it seems, can be sold, shared with researchers, or even possibly released to the general public.
Yet the Age of Big Data has turned anonymization into a difficult task, as the risk of re-identification seems to be constantly looming. Re-identification is achieved by “attacking” the anonymous dataset, aided by the existence of vast datasets (or “auxiliary information”) from various other sources available to the potential attacker. It is, therefore, difficult to establish whether anonymization was achieved, whether privacy laws pertain to the dataset at hand, and if so, how. In a recent paper, Ira Rubinstein and Woodrow Hartzog examine the pressing policy and legal aspects of this issue. The paper does an excellent job of summarizing how the current academic debate in this field is unfolding. It describes recent failed and successful re-identification attempts and provides the reader with a crash course on the complicated statistical methods of de-identification and re-identification. Beyond that, it provides both theoretical insights and a clear roadmap for confronting the challenges of properly releasing data.
Read the full post on Cyberlaw Jotwell.
- Date Published: 11/30/2015
- Original Publication: Cyberlaw Jotwell