“Tool Without a Handle”: 21st Century Data Privacy – A Quantum Puzzle – Part 2
In part 1 of my observations on quantum principles and privacy, I noted that policy discussions about privacy should take into account that metaphors that apply mechanical principles (i.e., data as particles) insufficiently describe the modern information economy. In this part 2, I describe a quantum principle – the familiar “uncertainty principle” – and how it applies to privacy law and policy considerations.
Quantum paradoxes in privacy principles
Here, the paradox relates not to the properties of personal data, but to the principles by which personal data should be governed. One common principle looks to data “ownership,” starting with the belief that personal data is the property of the data subject. But “ownership” is an incomplete solution, as we lack a clear understanding of data “ownership” beyond some obvious cases.[1] Solving problems through such a property rights regime is likely to prove unsatisfactory in the current technology and business environment for at least two reasons:
1) It insufficiently recognizes the property rights of parties other than the data subject. These include data compilers, analysts, and interpreters, many of whom invest capital, creativity and labor in producing worthwhile products from data. They also include legal rights of third parties to access personal data – e.g., the right of a potential home buyer to see personal data associated with a registered title, land use permit and the like. Nor does a property rights regime account for the rights of citizens in a democratic society to learn of personal data, such as financial disclosures by candidates for office. Both commerce and journalism would be much worse off with an absolutist view of data ownership by data subjects;
2) It doesn’t fully define principles that privacy professionals actually follow (or those that privacy advocates want to be sure professionals do follow). Such principles include when a data processor should provide some level of transparency and disclosure. These principles don’t turn on the ownership of the data but on a sense of fairness – integrating productive uses of data with a degree of respect for the party to whom the data pertains, or from whom it was collected. Even data that doesn’t “belong” to a customer still needs to be governed responsibly.
In that light, privacy discussions have come to consider the concept of data stewardship – the respectful management of information. “Data stewardship” is preferable to a property rights metaphor because it better integrates the interests of the variety of parties involved in contemporary data processing and analysis. That said, the process of managing data respectfully creates something of a quantum paradox, which I turn to next. In short, the paradox is a form of an “uncertainty” principle, whereby to better afford privacy for certain data, one in fact needs to know additional information about the data subject.
Heisenberg’s “uncertainty” principle
The “uncertainty principle” in quantum physics posits that there is a fundamental limit to the precision with which certain pairs of properties of a particle can be measured simultaneously. The limit is commonly illustrated by the fact that the very act of measuring one property affects the measurement of the other property. So, for example, sending photons at a particle to identify its position necessarily adds energy, which changes the particle’s momentum. This creates a limit on the extent to which position and momentum can be measured simultaneously. This is a fundamental quality of any quantum system.[2] The principle was introduced in the early 20th century by physicist Werner Heisenberg; other physicists later derived a formula to express this principle:
Δx Δp ≥ ℏ/2
Where uncertainty in position (Δx) times uncertainty in momentum (Δp) is at least a known constant (here, half of ℏ, the “reduced Planck constant,” which is Planck’s constant divided by 2π). So a high degree of precision in position means low precision in momentum, and vice versa.
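To give a sense of scale, here is a rough worked example using the standard form of the relation, Δx Δp ≥ ℏ/2. The scenario is my own illustrative assumption, not something drawn from the sources cited here: an electron whose position is known to within about one ten-billionth of a meter, roughly the width of an atom.

```latex
\Delta x \,\Delta p \ge \frac{\hbar}{2}
\quad\Longrightarrow\quad
\Delta p \;\ge\; \frac{\hbar}{2\,\Delta x}
= \frac{1.055\times10^{-34}\ \mathrm{J\,s}}{2\times\left(1\times10^{-10}\ \mathrm{m}\right)}
\approx 5.3\times10^{-25}\ \mathrm{kg\,m/s}

\Delta v \;\approx\; \frac{\Delta p}{m_e}
= \frac{5.3\times10^{-25}\ \mathrm{kg\,m/s}}{9.11\times10^{-31}\ \mathrm{kg}}
\approx 5.8\times10^{5}\ \mathrm{m/s}
```

Pinning the position down ten times more tightly would raise the floor on the momentum uncertainty tenfold, which is the trade-off the rest of this post borrows as a metaphor.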
A privacy “uncertainty” principle
In considering privacy principles for personal data, a similar “uncertainty principle” can be found, one which is core to the concept of effective data stewardship. In short, it is that you need to know a person’s preferences in order to respect them, and to obtain a higher degree of accuracy in understanding such preferences, it is important to know more information about the person. Moreover, to respect certain preferences at all, one needs to identify the subject uniquely and associate them with a particular set of preferences (e.g., opts out of certain marketing, prefers text messaging over email for other marketing and transactional messages, prefers “safe search” browsing mode, etc.).
In other words, it is impossible to fully know the privacy preferences of a truly anonymous person, who prefers to be neither identified uniquely nor associated with a given set of preferences. Similarly, true anonymity is not a technical possibility in an environment where there is interaction via networked information technologies – mobile carriers must know which tower you are connected to, which number is associated with you, and which device type you are using in order to deliver voice and data to it, for example. Web service providers must know which IP address to send bits to, which browser type will render them, etc. The more personal the interaction, the less anonymity is possible - for example, e-commerce, which requires payment and delivery details, affords less privacy than anonymous browsing.
So there will, of necessity, always be an asymptotic relationship between anonymity and data stewardship. For data stewardship, to know that “user XYZ prefers not to have history recorded,” or “Jane Doe prefers not to receive email marketing” is to know something about user XYZ and about Jane Doe. The less a provider knows about user XYZ and Jane Doe, the less the provider can respect their privacy preferences, and vice versa. This, also, can be expressed as a formula:
Δpref Δpriv ≥ℏ
Where uncertainty in preferences for handling of personal data, times uncertainty in the overall level of privacy and anonymity, is at least a given constant. The limits of anonymity are driven by both technical and commercial factors. As noted above, networked information technologies simply cannot function without some understanding of which device seeks to connect, and some understanding of the properties of that device.
Additionally, personalized preferences simply cannot be delivered without some knowledge of those preferences, and some association of those preferences with a unique individual. That unique individual may, of course, represent a device, not a person, or it may be pseudonymous, and one certainly can (and often should) create both technical and contractual barriers to re-identification. But every online actor must be, to some extent, unique, identifiable and thus trackable over time.[3]
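For readers who think in code, the point can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not a description of any real system: the names Preferences, PreferenceStore, DEFAULTS and the identifier "user-xyz" are all hypothetical. It shows that a stated preference can only be honored when the request carries some identifier to look it up by; an anonymous request can only receive a service-wide default.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Preferences:
    """Hypothetical per-subject privacy choices."""
    email_marketing: bool
    record_history: bool


# Hypothetical service-wide fallback, applied when nothing is known about the subject.
DEFAULTS = Preferences(email_marketing=False, record_history=False)


class PreferenceStore:
    """Maps an identifier (possibly pseudonymous) to stated preferences."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Preferences] = {}

    def save(self, subject_id: str, prefs: Preferences) -> None:
        # Recording a preference necessarily records a fact about the subject.
        self._by_id[subject_id] = prefs

    def resolve(self, subject_id: Optional[str]) -> Tuple[Preferences, str]:
        # Returns the preferences to apply and where they came from.
        if subject_id is not None and subject_id in self._by_id:
            return self._by_id[subject_id], "stated"
        # No identifier, or an unknown one: only the default can be applied.
        return DEFAULTS, "default"


store = PreferenceStore()
store.save("user-xyz", Preferences(email_marketing=False, record_history=True))

known = store.resolve("user-xyz")   # the subject's own choices
anonymous = store.resolve(None)     # the service's guess, not the subject's choice
```

The identifier here need not be a name; a pseudonym or device ID works equally well. What cannot be removed is the association itself.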
As a commercial matter, delivering services and tracking preferences is of course not the only incentive to assign a unique identifier. Few popular websites can survive while remaining completely indifferent to who visits them, how often, and what visitors do there, particularly in an advertising-supported business model (which is, of course, a more privacy-friendly model than a paid subscription one).
Even non-profit institutions have an interest in that data, and neither a non-profit nor a commercial institution can long justify the expense of maintaining a website absent at least some measurement of its value. Particularly for a business accountable to public shareholders, measuring performance of various assets and strategies is essential. Many online services simply cannot sustain themselves without advertising revenue, and every ad publisher and its clients who buy ad placements on such services have valid interests in knowing if their cash is well spent.
Many services can, of course, accomplish these tasks along with good “data stewardship” - through the use of pseudonymous identifiers and profiles, de-identified data, and brokerage arrangements where the personalization factors are known only to the entity placing the ad and never to the party whose product or service is advertised. These methods still require, though, some knowledge of what is transpiring during an interaction with an app or website and how frequently it transpires. There is no escaping the paradox.
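One way to picture a pseudonymous identifier is a keyed hash of the raw identifier. The sketch below is only an illustration under stated assumptions: the function names pseudonym and count_visits, the example addresses, and the keys are hypothetical, and real deployments involve key management and other safeguards not shown here. It uses HMAC-SHA256 from Python's standard library to show that visit frequency can still be measured without storing the raw identifier, and that rotating the key breaks linkage between periods.

```python
import hashlib
import hmac
from typing import Dict, Iterable


def pseudonym(raw_id: str, key: bytes) -> str:
    """Derive a stable pseudonym from a raw identifier using a secret key."""
    return hmac.new(key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()


def count_visits(raw_ids: Iterable[str], key: bytes) -> Dict[str, int]:
    """Count visits per pseudonym; raw identifiers are never stored."""
    counts: Dict[str, int] = {}
    for raw_id in raw_ids:
        p = pseudonym(raw_id, key)
        counts[p] = counts.get(p, 0) + 1
    return counts


# Hypothetical keys for two reporting periods (illustrative values only).
key_period_1 = b"hypothetical-key-period-1"
key_period_2 = b"hypothetical-key-period-2"

events = ["alice@example.com", "bob@example.com", "alice@example.com"]
counts = count_visits(events, key_period_1)

# Same person, same key: the same pseudonym, so frequency is measurable.
same_period = pseudonym("alice@example.com", key_period_1)
# Same person, rotated key: a different pseudonym, so periods cannot be linked.
next_period = pseudonym("alice@example.com", key_period_2)
```

Even in this form, the service still learns that some visitor returned twice. The identity is obscured, but the knowledge of the interaction remains.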
Neither, though, is this a paradox that must be feared. In a following post, I’ll illustrate how knowledge of personal preferences and/or personal characteristics of the party interacting with a service is often rather desirable for social and policy goals – the issues of children online being one clear example.
[1] See, e.g., http://www.forbes.com/sites/techonomy/2015/06/24/data-data-everywhere-but-not-a-bit-you-own/ (Microsoft’s Horacio Gutierrez: “There are some aspects of data that [let you] clearly say, ‘this is customer data,’ and then there is data you derive from customer data and it gets complicated after that.”)
[2] Tony Hey and Patrick Walters, The New Quantum Universe (Cambridge Univ. Press 2003), p. 22.
[3] That is not to say it is technically necessary to have trackability for all time – systems can, do, and should rotate unique identifiers for a variety of reasons (dynamic IP address allocation for efficiency, for example). Systems also can and should allow users to change the IDs used to identify them, or even to limit functionality in exchange for greater privacy (e.g., by blocking cookies).
It’s a good practice for technology providers to provide disclosures and explanations as to the effect of certain settings and setting changes. See, e.g., http://www.windowsphone.com/en-US/how-to/wp8/settings-and-personalization/recommended-settings-in-phone-setup.