Stanford CIS

Client-Side Scanning and Winnie-the-Pooh Redux (Plus Some Thoughts on Zoom)

By Riana Pfefferkorn

Return to Pooh Corner

Seven months ago, I published this blog post about the idea of fighting child sexual abuse material (CSAM) in an end-to-end encrypted world through what's called "client-side scanning." Client-side scanning is a proposal for retaining the ability to detect CSAM on messaging systems where the messages are end-to-end encrypted. With E2EE, the provider of the app (e.g. Facebook or Signal) cannot intelligibly read the text or images being transmitted; only the "ends" of the conversation (the people communicating with each other) can do so. If what's being sent is CSAM, E2EE makes that harder for the service provider to detect. How to keep finding that material as more and more communications become E2EE? One proposal is the client-side scanning idea: before the image is transmitted, while it's still on the sender's device, the app would take the image's hash value and compare it to a database of hash values of known CSAM; if there's a match, that triggers a report to the provider for review.
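To make the proposal concrete, here is a minimal sketch of the matching step described above. It is an illustration under stated assumptions, not any provider's actual design: the function names and the blocklist are hypothetical, and SHA-256 stands in for whatever hash a real system would use (deployed CSAM detection generally relies on perceptual hashes that tolerate resizing and re-encoding, which an exact cryptographic hash does not).

```python
import hashlib
from typing import Set


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def client_side_scan(image_bytes: bytes, known_hashes: Set[str]) -> bool:
    """Hypothetical on-device check, run before the image is encrypted.

    Returns True if the image's hash appears in the blocklist, which in
    the proposal would trigger a report to the provider for review.
    """
    return sha256_hex(image_bytes) in known_hashes


# Hypothetical blocklist shipped to the device. The client sees only
# opaque digests; it cannot tell what content they correspond to.
blocklist = {sha256_hex(b"bytes of a known prohibited image")}

flagged = client_side_scan(b"bytes of a known prohibited image", blocklist)
not_flagged = client_side_scan(b"bytes of a holiday photo", blocklist)
```

Note that nothing in the matching logic knows or cares what the hashes represent. Whoever controls the contents of the blocklist controls what gets flagged.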

My critique of this idea, last October, was: "There is no way in hell that Facebook or anyone else could introduce content moderation for end-to-end encrypted messaging without it inevitably sliding into abuse. It would start with CSAM, but it would not stop there. The predictable result is surveillance and censorship, a chill on privacy and free speech." As an example, I pointed to Chinese app WeChat's practice of blocking Chinese users from sending each other phrases and images the Chinese government has deemed verboten, such as "Tiananmen Square" and pictures of Winnie-the-Pooh. (WeChat is not E2EE.) Those messages are interdicted and censored; they're never received by the intended recipient. I speculated that China would love an E2EE messaging system that incorporated client-side scanning and reporting, as it would further enable China's clampdown on speech.

Well, the venerable Citizen Lab in Toronto has just published a report about its research on WeChat, which discovered that WeChat is surveilling the messages sent by users outside China to train its censorship system for users inside China. (From the report, it sounds like this scanning happens server-side, not on users' devices.) As Zack Whittaker summarized it in his excellent weekly security newsletter, "Using file hash surveillance, the messaging app takes a digital signature from files or photos it thinks are politically sensitive inside China and blocks those messages from going through to Chinese users." Millions of people who aren't living directly under China's thumb are unwittingly aiding in Chinese repression. "China is much closer than we think," I wrote last fall. I was more right than I knew.
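The flow summarized above can be sketched in a few lines. This is a rough illustration based only on that summary, not on WeChat's actual code: the class, its method names, and the `is_sensitive` classifier are all hypothetical, and SHA-256 is an assumed stand-in for whatever file hash the real system uses.

```python
import hashlib
from typing import Callable, Set


class HashBlocklist:
    """Hypothetical server-side blocklist fed by surveilled traffic."""

    def __init__(self, is_sensitive: Callable[[bytes], bool]) -> None:
        # is_sensitive is a stand-in for whatever analysis decides
        # that a file is politically sensitive.
        self._is_sensitive = is_sensitive
        self._hashes: Set[str] = set()

    def observe_international(self, file_bytes: bytes) -> None:
        """Inspect a file sent between accounts outside China.

        The file is not blocked here; if it is judged sensitive,
        its hash is simply recorded for later use.
        """
        if self._is_sensitive(file_bytes):
            self._hashes.add(hashlib.sha256(file_bytes).hexdigest())

    def allow_for_china(self, file_bytes: bytes) -> bool:
        """Return False if a file bound for a China-based user is blocklisted."""
        return hashlib.sha256(file_bytes).hexdigest() not in self._hashes


# Toy classifier for demonstration only.
blocklist = HashBlocklist(is_sensitive=lambda b: b"pooh" in b)

before = blocklist.allow_for_china(b"pooh meme")   # not yet on the list
blocklist.observe_international(b"pooh meme")      # seen abroad, hash recorded
after = blocklist.allow_for_china(b"pooh meme")    # now blocked inside China
```

The users abroad never see anything blocked, which is why they have no reason to suspect their messages are feeding the list.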

In that blog post, I wrote, "it is staggeringly naive to believe that, even in the United States of America, client-side pre-encryption 'content moderation' would stop at CSAM." Now, thanks to Citizen Lab's tireless work, a real-world non-E2EE app that does (what sounds like) server-side scanning (i.e., WeChat) has provided a clear illustration of the risks of building client-side scanning into a hypothetical E2EE chat app. China's example shows how a client-side scanning system originally built only for CSAM could and would be suborned for censorship and political persecution -- potentially without recognizing any country borders. Even if Country A (say, the U.S.) never demanded the system be used for anything besides CSAM, once the system was built for that CSAM purpose, Country B (China, etc.) could demand that the system be repurposed to scan for speech and images that Country B's government doesn't like. (Thanks to WeChat censorship, China already has a hash database of such content teed up and ready to go.) And, as with WeChat, the repressive government of Country B might even induce the provider to train the system on the communications of users in Country A, even if Country A really did limit the client-side scanning system only to CSAM.

If the app provider refused to do client-side scanning for politically verboten content for repressive Country B, Country B's government would have multiple levers it could pull to induce compliance. If the provider has assets in-country, the government (China in this example) could arrest and imprison in-country employees or seize in-country servers and other property. If there's nobody and nothing in-country to seize, the government could still pressure the provider economically by cutting the provider off from the country's market until the provider complied.

Those levers aren't just theoretical things that could hypothetically apply to imaginary client-side scanning systems, of course. They're already regularly deployed when U.S.-based companies refuse to (or simply can't) comply with governments' demands for surveillance and censorship, as when Google and Facebook executives have been arrested in Brazil or when Twitter gets blocked for everyone in an entire country (a phenomenon so frequent that it has its own Wikipedia page).

WWZD? (What Would Zoom Do?)

Which leads me to Zoom. These tried-and-true levers for pressuring tech companies are the reason why it was both heartening and a bit worrisome to see the promises made by video communications platform Zoom last week in a company blog post. In the post, Zoom vowed that "Zoom has not and will not build a mechanism to decrypt live meetings for lawful intercept purposes," will not secretly insert ghost users into conversations, and "will not build any cryptographic backdoors to allow for the secret monitoring of meetings." I was pretty happy to see these promises, which come amidst a flurry of activity at Zoom to respond to the many security and privacy flaws in the service (and just plain misleading statements about end-to-end encrypting Zoom calls) that have been identified since (and before) the COVID-19 pandemic turned Zoom into the de facto replacement for face-to-face interactions.

Here's the thing: the majority of Zoom's engineers are located in China; as of last January, the company had over 500 R&D employees there. (According to its IPO prospectus from last year, it has a data center there, too -- and in Australia, Brazil, and India, all countries whose governments have become increasingly hostile to encryption.) Zoom's China-centric employment roster has raised suspicions about whether Zoom can be trusted with Americans' sensitive business or government information. Last week's blog post (which was mostly about announcing Zoom's acquisition of Keybase) seems designed in part to allay those suspicions. However, we don't know what Zoom's plan is for protecting its employees (and other company assets) in China if/when the Chinese government does lean on Zoom to build exactly what it has just vowed it will never build.

Sure, that's not something you'd publish the details of in a blog post. But does Zoom have a plan? Security protocols for protecting employees and property in non-U.S. countries have for years been standard for companies with significant international operations, such as Google, Facebook, and Twitter. So you might assume Zoom has one too. But then, responding to abuse on the platform has also long been a standard part of the business at Google, Facebook, and Twitter, and yet Zoom's CEO admitted the company simply never thought much about abuse until its user base suddenly ballooned two months ago, leaving the company scrambling to respond to harassment such as "Zoombombing."

If Zoom was caught flat-footed by abuse, is it similarly unprepared for the possible arrest of its employees and/or seizure of its servers in China (or Brazil, etc.) in the event it refuses to obey government demands to secretly sit in on your Zoom calls? The IPO prospectus is written in the wordy, circumspect legalese typical of such documents, but if you read between the lines and translate from securities lawyer-speak into normal-people language, the prospectus suggests that Zoom has indeed considered the threat the Chinese government poses to Zoom's Chinese employees and assets.

The prospectus acknowledges that colocating data centers around the world poses the risk of an "interruption in service" or outright closure of a data center due to a variety of factors (though it doesn't expressly name "dawn raid by the police" among them). Likewise, it recognizes that operating internationally poses risks to the business "if we are not able to satisfy certain government- and industry-specific requirements," "including laws and regulations with respect to privacy, telecommunications requirements, [and] data protection, [...] and the risk of penalties to us and individual members of management or employees if our practices are deemed to be out of compliance." Plus, it contemplates the possibility that the company might need "to relocate our product development team from China to another jurisdiction," hinting that the company will not rule out abandoning China if it has to.

So what happens when the rubber meets the road? If the Chinese government leans on Zoom, will the company bend on last week's promises not to build in lawful-intercept, ghost user, or backdoor mechanisms? What if it's not China, but rather, India or Australia, where Zoom also has data centers? The wording of Zoom's promise sounded pretty unequivocal. But then, it's hard to trust what the company says; it has to rebuild lost public trust after its misleading statements about calls' being "end-to-end encrypted" when in fact they weren't. (And you'll note those promises are only about live meetings; the post makes no promises about whether or not Zoom hands over to law enforcement the stored recordings of meetings after they've concluded.)

There's a sense in which this kind of suspicion and paranoia actually plays right into the hands of repressive regimes. If Zoom abides by its promise and doesn't covertly enable police access, but people believe it does, they may switch to a service that actually allows government access to their communications -- whether intentionally (albeit without telling users so) or unwittingly (because the regime has been able to hack it without the service provider's knowledge). Confusion and fear are fruitful feelings for such regimes to exploit.

It is dangerous for people to mistakenly believe they have more security than they actually do. That's why it was wrong for Zoom to misrepresent that video calls were E2EE. It's why it's messed-up for WeChat to surveil users outside of China, who surely didn't suspect they were exacerbating Chinese state censorship. But it's also problematic for people to believe a service offers less security than it actually does. Yet that's what people may suspect of Zoom going forward, despite last week's bold promises (and its ongoing real, sincere efforts to clean up its act), thanks in part to the loss of trust its past misrepresentations and privacy/security flubs engendered.

And both of these are why proposals for client-side scanning of messages on E2EE platforms are so insidious. If implemented, such systems would give users less security and privacy than they think they are getting when they choose to end-to-end encrypt their conversations. But when the very idea of client-side scanning -- for CSAM or any other purpose -- is put forward and talked about seriously, among democratic government representatives, academics, technologists, and industry actors, it normalizes the notion that we cannot trust our electronic communications platforms. That leads to the chilling effect of self-censorship I discussed last October. And that plays right into the hands of repressive regimes.

In that October post, I acknowledged that maybe everyone would think I was just making a slippery-slope argument that could easily be dismissed as hyperbole. But the publication of Citizen Lab's WeChat report provides me with some vindication, thank you very much. Client-side scanning proposals must not be treated as though they are serious ideas worthy of serious consideration. The ratchet of surveillance still goes only one way. You don't have to help crank it.