There Is No Such Thing as “Public” Data

Publication Type: Other Writing
Publication Date: May 19, 2016

Are you an OkCupid user? Would you consider the data on your profile public—fair game for anyone to download and share with the rest of the world?

That’s the argument made by a group of Danish researchers who released a data set on nearly 70,000 users of the popular dating website. The researchers used an automated tool called a “scraper,” which captures parts of a webpage, possibly in violation of the website’s terms of use. These users had answered questions on intimate topics like drug use and sexual preferences. The researchers took no steps to deidentify the data set before releasing it, even though many of the profiles could be reidentified. When the researchers were called out on Twitter for this lapse, one of them shrugged it off with the flip statement “Data is already public.”

I hear arguments like this all the time. Websites that post mug shot photos to shame people say they’re just using public records. Harassers who take “upskirt” photos of women say they are blameless because their activities occurred “in public.” Police say they are free to use powerful technologies to surveil anyone for as long as they like, as long as they are “in public.”

It’s time to abandon the misguided notion that public information is fair game.

This justification is fundamentally wrong. That is not only because we should be able to expect a certain amount of privacy in public, but also because, despite the term’s frequent use and seeming self-evidence, we don’t actually know what public means. It has no set definition in privacy law or policy. I often ask people to define the term for me. Common responses include “where anyone can see you” or “government records.” But by far the most common response I get is “not private.” Fair enough. But thinking of publicness this way only leads us to the equally difficult question of defining privacy.

Read the full piece at Slate.