Some Humility About Transparency

I am a huge fan of transparency about platform content moderation. I’ve considered it a top policy priority for years, and written about it in detail (with Paddy Leerssen, who also wrote this great piece about recommendation algorithms and transparency). I sincerely believe that without it, we are unlikely to correctly diagnose current problems or arrive at wise legal solutions.

So it pains me to admit that I don’t really know what “transparency” I’m asking for. I don’t think many other people do, either. Researchers and public interest advocates around the world can agree that more transparency is better. But, aside from people with very particular areas of interest (like political advertising), almost no one has a clear wish list. What information is really important? What information is merely nice to have? What are the trade-offs involved?

That imprecision is about to become a problem, though it’s a good kind of problem to have. A moment of real political opportunity is at hand. Lawmakers in the US, Europe, and elsewhere are ready to make some form of transparency mandatory. Whatever specific legal requirements they create will have huge consequences. The data, content, or explanations they require platforms to produce will shape our future understanding of platform operations, and our ability to respond – as consumers, as advocates, or as democracies. Whatever disclosures the laws don’t require may never happen.

It’s easy to respond to this by saying “platforms should track all the possible data, we’ll see what’s useful later!” Some version of this approach might be justified for the very biggest “gatekeeper” or “systemically important” platforms. Of course, making Facebook or Google save all that data would be somewhat ironic, given the trouble they’ve landed in by storing similar not-clearly-needed data about their users in the past. (And the more detailed data we store about particular takedowns, the likelier it is to be personally identifiable.)

For any platform, though, we should recognize that the new practices required for transparency reporting come at a cost. That cost might include driving platforms to adopt simpler, blunter content rules in their Terms of Service. That would reduce their expenses in classifying or explaining decisions, but presumably lead to overly broad or narrow content prohibitions. It might raise the cost of adding “social features” like user comments enough that some online businesses, like retailers or news sites, just give up on them. That would reduce some forms of innovation, and eliminate useful information for Internet users. For small and midsized platforms, transparency obligations (like other expenses related to content moderation) might add yet another reason to give up on competing with today’s giants, and accept an acquisition offer from an incumbent that already has moderation and transparency tools. Highly prescriptive transparency obligations might also drive de facto standardization and homogeneity in platform rules, moderation practices, and features.

None of these costs provides a reason to give up on transparency – or even to greatly reduce our expectations. But all of them are reasons to be thoughtful about what we ask for. It would be helpful if we could better quantify these costs, or get a handle on which kinds of transparency reporting are easier or harder to do in practice.

I’ve made a (very in the weeds) list of operational questions about transparency reporting, to illustrate some issues that are likely to arise in practice. I think detailed examples like these are helpful in thinking through both which kinds of data matter most, and how much precision we need within particular categories. For example, I personally want to know with great precision how many government orders a platform received, how it responded, and whether any orders led to later judicial review. But to me it seems OK to allow some margin of error for platforms that don’t have standardized tracking and queuing tools, and that as a result might modestly miscount TOS takedowns (whether in absolute numbers or as a percentage).

I’ll list that and some other recommendations below. But these “recommendations” are very tentative. I don’t know enough to have a really clear set of preferences yet. There are things I wish I could learn from technologists, activists, and researchers first. The venues where those conversations would ordinarily happen – and, importantly, where observers from very different backgrounds and perspectives could have compared the issues they see, and the data they most want – have been sadly reduced for the past year.

So here is my very preliminary list:

Transparency mandates should be flexible enough to accommodate widely varying platform practices and policies. Any de facto push toward standardization should be limited to the very most essential data.
The most important categories of data are probably the main ones listed in the DSA: number of takedowns, number of appeals, number of successful appeals. But as my list demonstrates, all of those can become complicated in practice.
It’s worth taking the time to get legal transparency mandates right. That may mean delegating exact transparency rules to regulatory agencies in some countries, or conducting studies prior to lawmaking in others.
Once rules are set, lawmakers should be very reluctant to move the goalposts. If a platform (especially a smaller one) invests in rebuilding its content moderation tools to track certain categories of data, it should not have to overhaul those tools soon because of changed legal requirements.
We should insist on precise data in some cases, and tolerate more imprecision in others (based on the importance of the issue, platform capacity, etc.). And we should take the time to figure out which is which.
Numbers aren’t everything. Aggregate data in transparency reports ultimately just tell us what platforms themselves think is going on. To understand what mistakes they make, or what biases they may exhibit, independent researchers need to see the actual content involved in takedown decisions. (This in turn raises a slew of issues about storing potentially unlawful content, user privacy and data protection, and more.)

It’s time to prioritize. Researchers and civil society should assume we are operating with a limited transparency “budget,” which we must spend wisely – asking for the information we can best put to use, and factoring in the cost. We need better understanding of both research needs and platform capabilities to do this cost-benefit analysis well. I hope that the window of political opportunity does not close before we manage to do that.