Empirical Evidence of “Over-Removal” by Internet Companies under Intermediary Liability Laws

The "Over-Removal" Issue

Most intermediaries offer legal “Notice and Takedown” systems – tools for people to alert the company if user-generated content violates the law, and for the company to remove that content if necessary.  Twitter does this for tweets, Facebook for posts, YouTube for videos, Google for search results, local news sites for user comments, etc.  National law varies in terms of what content must be removed, but some version of Notice and Takedown exists in every major market.  Companies receive a remarkable mix of requests – from those identifying serious and urgent problems, to those attempting to game the Notice and Takedown system as a means to silence speech they disagree with, to those stating wildly imaginative claims under nonexistent laws.
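
To make the workflow concrete, here is a minimal, purely illustrative sketch of how an intermediary might triage incoming notices. Everything in it (the Notice fields, the triage function, the set of recognized legal bases) is a hypothetical model, not a description of any real company's system.

```python
# Purely illustrative sketch of a generic notice-and-takedown flow.
# All names and fields are hypothetical; no real company's system is described.
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    REJECTED = auto()            # e.g. a claim under a nonexistent law
    NEEDS_HUMAN_REVIEW = auto()  # plausible claim; a person must weigh it
    REMOVED = auto()             # a reviewer found the claim valid


@dataclass
class Notice:
    content_url: str       # the user content the notice targets
    legal_basis: str       # e.g. "copyright", "defamation", "privacy"
    jurisdiction: str      # national law varies on what must come down
    sworn_statement: bool  # some regimes (like the DMCA) require attestations


def triage(notice: Notice, recognized_bases: set[str]) -> Outcome:
    """Route an incoming notice: screen out claims with no recognized
    legal basis, and escalate everything else for human review."""
    if notice.legal_basis not in recognized_bases:
        return Outcome.REJECTED
    return Outcome.NEEDS_HUMAN_REVIEW


if __name__ == "__main__":
    n = Notice("https://example.com/post/123", "copyright", "US", sworn_statement=True)
    print(triage(n, {"copyright", "defamation", "privacy"}))  # NEEDS_HUMAN_REVIEW
```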

What do companies do with these removal requests?  Many of the larger companies make a real effort to identify bad faith or erroneous requests, in order to avoid removing legal user content.  (I worked on removals issues for Google for years, and can attest to the level of effort there.)  But mistakes are inevitable given the sheer volume of requests – and the fact that tech companies simply don’t know the context and underlying facts for most real-world disputes that surface as removal requests.

And of course, the easiest, cheapest, and most risk-avoidant path for any technical intermediary is simply to process a removal request and not question its validity.  A company that takes an “if in doubt, take it down” approach to requests may simply be a rational economic actor.  Small companies without the budget to hire lawyers, or those operating in legal systems with unclear protections, may be particularly likely to take this route.
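
The economics here can be made concrete with a toy expected-cost comparison. All of the numbers below are invented purely for illustration; the point is only that when legal review is expensive and a wrong refusal carries liability risk, removing on request dominates.

```python
# Toy expected-cost model of the "if in doubt, take it down" incentive.
# Every figure here is invented for illustration only.

def cost_of_reviewing(review_cost: float, p_claim_valid: float,
                      liability_if_wrong: float) -> float:
    # Reviewing means paying for legal analysis, plus bearing liability
    # risk if the claim is valid and the content is wrongly left up.
    return review_cost + p_claim_valid * liability_if_wrong

def cost_of_removing(processing_cost: float) -> float:
    # Removing on request carries near-zero legal risk for the intermediary;
    # the cost of lost lawful speech falls on the user, not the company.
    return processing_cost

review = cost_of_reviewing(review_cost=500, p_claim_valid=0.3, liability_if_wrong=50_000)
remove = cost_of_removing(processing_cost=5)
print(f"expected cost of review: ${review:,.0f}, of removal: ${remove:,.0f}")
# -> review: $15,500 vs. removal: $5; removal wins unless review is cheap
#    or the law protects good-faith refusals.
```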

Much of the publicly available information about over-removal by intermediaries is anecdotal.  But empirical evidence of over-removal – through error or otherwise – keeps trickling in from academic studies.  This data is important to help policy-makers understand what intermediary liability rules work best to protect the free expression rights of Internet users, as well as the rights of people with valid claims to removal.  This post lists the studies I have seen.

These studies were mostly conducted by academics or advocates with a particular interest in protecting user free expression and ensuring that legal content remains available online.  One day I hope we will see more data from the other side – advocates for rightsholders, defamation plaintiffs, or other groups harmed by online content that violates their legal rights.  That could help build a more complete picture of the over-removal issue as well as any related under-removal problem – intermediaries failing to remove content when notified, even though applicable law requires removal.   


The Studies

  • Urban et al.’s new 2016 research, “Notice and Takedown in Everyday Practice”: This report is a treasure trove of qualitative and quantitative information on DMCA operations. A key finding is the divergence, documented and quantified in the study, between “classic” DMCA practice and newer tools like robonotices and ContentID. The new tools are used by major players and dominate public discussion, but manual DMCA processing by small rightsholders and OSPs didn’t go away. The report is long and well worth reading; my summary of key findings is here.
  • Jennifer Urban and Laura Quilter’s 2006 review of copyright-based removals from Google’s services under the US Digital Millennium Copyright Act (DMCA):  Relying on information released to the Chilling Effects (now called Lumen) database by the company about processed removals (i.e. the ones where the company agreed to remove, not the ones it declined), the authors found that 55% of notices involved disputes between competitors, and 31% presented significant issues regarding the validity of the copyright infringement claim.  (Daniel Seng’s more recent work with a similar but much larger data set has great detailed statistics on DMCA removal trends, but his published conclusions do not include analysis of the validity of the claims processed.)
  • The 2004 Brennan Center study on removals and free expression:  Reviewing a data set of 320 copyright and trademark-based removal requests, the authors concluded that 47% stated weak claims or involved speech with important fair use or free expression legal defenses.
  • Rishabh Dara’s detailed experiment and study of over-removals by Indian intermediaries: Dara submitted increasingly unreasonable removal requests to various intermediaries, and carefully documented the responses.  His results show considerable over-removal, including removal based on clearly invalid legal claims and removal of content not targeted by the requests.
  • The 2004 Bits of Freedom study of Dutch ISPs:  The group created accounts with ten Dutch ISPs and used them to post copies of a famous, public domain, 19th century political essay.  It then used different contact information to send copyright “infringement notices” to the ISPs, under Dutch law implementing the eCommerce Directive.  Of the ten ISPs, seven removed the content despite its age and public domain status.
  • Oxford Programme in Comparative Media Law and Policy’s smaller experiment with UK and US ISPs: Researchers posted John Stuart Mill’s 1859 discussion of media freedom from “On Liberty” – which is in the public domain.  They then used different accounts to request its removal via UK and US ISPs.  The UK ISP removed the essay without question, while the US ISP responded by requiring the requester to comply with the more formal requirements of the US DMCA, including “good faith belief” and “penalty of perjury” attestations.
  • Company transparency reports: Transparency reports from Twitter, Google, Yahoo, Facebook, Microsoft, and others offer some data about removal requests. The data is valuable for other purposes, but usually not great for sussing out the validity or even volume of complaints. Most reports list only requests from government sources, which represent a small minority of legally based content removals. Some show the overall percentage of requests accepted and rejected, or include anecdotal examples.  In some cases, particularly for Google’s “Right to Be Forgotten” (RTBF) removals, this data is supplemented by news reports. For example, coverage last summer suggested that most RTBF claims come from non-public-figure requesters. It is also reported that relatively few requesters take their claims to Data Protection Agencies when Google rejects their removal requests, and that when they do, the Agencies often agree with Google’s decision.
  • The Lumen database and other research drawing on it: Lumen, formerly known as Chilling Effects, maintains a remarkable database containing millions of removal requests made public by companies and other contributors.  Significant academic research has been carried out using the database, much of it relevant to over-removal questions.  An overview of this literature as of 2010 appears in the Chilling Effects amicus brief from Perfect 10 v. Google.
  • Judith Townend’s research on removals by journalists and bloggers: This survey concerns removals by publishers rather than intermediaries, but contains interesting data on the frequency of requests and compliance, as well as the availability of legal counsel to those receiving removal requests.

More studies and data sources surely exist, or will exist.  I have heard in particular of one from Pakistan, but have not found it so far.  If you know of other sources, please let me know or post them in the Comments section so this page can become a more useful resource for people seeking this information.

NOTE: I periodically update this post.  Last update was April 21, 2016. 
