Tracking the Trackers: To Catch a History Thief

Last week we reported some early results from the Stanford Security Lab's new web measurement platform on how advertising networks respond to opt outs and Do Not Track. This week we're back with a new discovery in the online advertising ecosystem: Epic Marketplace,[1] a member of the self-regulatory Network Advertising Initiative (NAI), is history stealing.

Many thanks once again to research assistants Akshay Jagadeesh and Jovanni Hernandez.

Background

A link can be styled differently based on whether you've been to the page it points to. You may recall, for example, that in the early days of the web, links you hadn't visited were blue and links you had visited were purple. History stealing is a practice that exploits link styling to learn a user's web browsing history. The approach is simple: to test whether the user has visited a link, add it to a page and check how it's styled.[2]
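As a rough illustration of the check described above, the TypeScript sketch below adds a link, reads its computed color, and compares it with the color the page assigns to visited links. The names `LinkEnv`, `browserEnv`, `fakeEnv`, and `detectVisited`, along with the two colors, are assumptions made for this example; none of it is taken from a script discussed in this post. Browsers that ship the fix report the unvisited style to scripts, so the browser-backed variant only distinguishes links in unpatched browsers.

```typescript
// What the probe needs from its surroundings. Kept abstract so the sketch
// also runs outside a browser.
interface LinkEnv<L> {
  addLink(url: string): L;   // put a link to `url` on the page
  colorOf(link: L): string;  // computed text color of that link
  remove(link: L): void;     // take the link off the page again
}

// Colors assumed to come from the page's own stylesheet, e.g.
//   a { color: rgb(0, 0, 238); }  a:visited { color: rgb(85, 26, 139); }
const UNVISITED_COLOR = "rgb(0, 0, 238)";
const VISITED_COLOR = "rgb(85, 26, 139)";

// Browser-backed environment. `doc` and `win` stand for `document` and
// `window`; they are typed loosely so the file compiles without DOM typings.
function browserEnv(doc: any, win: any): LinkEnv<any> {
  return {
    addLink(url: string): any {
      const a = doc.createElement("a");
      a.href = url;
      doc.body.appendChild(a);
      return a;
    },
    colorOf(a: any): string {
      return win.getComputedStyle(a).color;
    },
    remove(a: any): void {
      a.parentNode.removeChild(a);
    },
  };
}

// Stand-in environment backed by a made-up history list, for experimenting.
function fakeEnv(history: string[]): LinkEnv<string> {
  return {
    addLink: (url: string): string => url,
    colorOf: (url: string): string =>
      history.indexOf(url) !== -1 ? VISITED_COLOR : UNVISITED_COLOR,
    remove: (_link: string): void => {},
  };
}

// Return the subset of `urls` whose links are styled as visited.
function detectVisited<L>(urls: string[], env: LinkEnv<L>, visitedColor: string): string[] {
  const visited: string[] = [];
  for (const url of urls) {
    const link = env.addLink(url);
    if (env.colorOf(link) === visitedColor) {
      visited.push(url);
    }
    env.remove(link);
  }
  return visited;
}
```

In an unpatched browser the call would be `detectVisited(urls, browserEnv(document, window), VISITED_COLOR)`. With `fakeEnv(["https://example.com/a"])` it returns only the URLs in the made-up history.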

Members of the computer security community have long considered history stealing a serious privacy vulnerability. The risk goes beyond leaking individual tidbits about past browsing; history stealing can be used to track or even identify a user. Mozilla finally implemented a fix in Firefox 4, and the other major browser vendors quickly followed. According to browser usage statistics, however, roughly half of users remain vulnerable to history stealing.

About a year ago, researchers at UCSD conducted the first comprehensive study of history stealing in practice. They found that a few popular adult sites were history stealing to learn whether users had visited their competitors. The UCSD team also discovered history stealing by several advertising networks, including Interclick (another NAI member). Class action litigation is ongoing.

Technical Findings - History Stealing

While testing the JavaScript instrumentation in our new web measurement platform, we stumbled across Epic Marketplace history stealing on Flixster and Charter.net. We reverse engineered the Epic Marketplace history stealing script and found a number of features:

  • The script is fast. Thousands of links are tested per second.
  • Links are added in an invisible iframe; there is no apparent effect on the page layout.
  • The script dynamically loads lists of URLs and associated interest segments using JSONP.
  • Progress is stored in a cookie so the script can resume where it left off.
  • The script sets a cookie indicating when it was last run; it will not history steal more than once every twenty-four hours.
  • If history stealing is still in progress when the window is closed (e.g. the user navigates to another page) the script sends its findings before ending execution.
  • The script slows down if a URL list takes over two seconds to process.
  • To prevent multiple history stealing attempts in parallel, the script uses a mutex cookie.
  • The script does not directly report the URLs that it detects the user has visited; it sends a deduplicated list of the interest segments associated with the visited URLs.
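Two of the behaviors listed above, the once-per-day limit guarded by a mutex and the deduplicated segment report, can be sketched in TypeScript as follows. This is an illustration under assumptions, not the reverse-engineered script: the `RunState` fields stand in for values the script keeps in cookies, and `UrlEntry`, `mayStart`, and `segmentsToReport` are hypothetical names.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Stand-in for the state the script keeps in cookies (field names assumed).
interface RunState {
  lastRunMs: number | null; // when the script last ran, or null if never
  locked: boolean;          // mutex: another instance is already running
}

// Allow a new pass only if no other instance holds the mutex and at least
// twenty-four hours have passed since the last run.
function mayStart(state: RunState, nowMs: number): boolean {
  if (state.locked) {
    return false;
  }
  return state.lastRunMs === null || nowMs - state.lastRunMs >= DAY_MS;
}

// One row of a URL list: a URL and the interest segment it maps to.
interface UrlEntry {
  url: string;
  segment: number;
}

// Collect the segments of visited URLs, each segment at most once, in
// first-seen order. The visited URLs themselves are not part of the result.
function segmentsToReport(entries: UrlEntry[], isVisited: (url: string) => boolean): number[] {
  const segments: number[] = [];
  for (const entry of entries) {
    if (isVisited(entry.url) && segments.indexOf(entry.segment) === -1) {
      segments.push(entry.segment);
    }
  }
  return segments;
}
```

For example, if two visited URLs both map to a made-up segment 7 and an unvisited URL maps to segment 9, `segmentsToReport` yields `[7]`: the recipient learns the interest category but not which pages produced it.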

(For the technically inclined reader, here are an example iframe, script, and URL list.)

We also examined a series of URL lists (spreadsheet) that contain 15,511 entries. The URLs and interest segments vary widely. Some URLs are for a landing page; others are for a specific page. Some interest segments are broad; others are fine-grained. A few example segments:

Several interest segments are highly sensitive:

  • Segment 760: pages about getting pregnant and fertility, including at the Mayo Clinic
  • Segment 2640: pages about menopause, including at the NIH and the University of Maryland
  • Segment 2014: pages about repairing bad credit, including at the FTC
  • Segment 2265: pages about debt relief, including at the FTC and the IRS

 
Technical Findings - Opt Out

We applied the methodology from last week's study to examine Epic Marketplace's opt-out practices. (Epic Marketplace was one of the eleven NAI members not included in that study.) We found that Epic Marketplace leaves its tracking cookies in place after both opting out with the NAI mechanism and enabling Do Not Track. We also found that history stealing continues after using either choice mechanism.
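One way to express the check "cookies remain in place after opting out" is to compare a browser's cookies for the network before and after using a choice mechanism. The TypeScript sketch below is an assumption about how such a comparison could look, not the measurement platform's code. `Cookie` and `survivingCookies` are hypothetical names, and deciding whether a surviving cookie is actually a tracking identifier still takes manual inspection.

```typescript
interface Cookie {
  domain: string;
  name: string;
  value: string;
}

// Return the cookies from `before` that are still present, with the same
// value, in `after`.
function survivingCookies(before: Cookie[], after: Cookie[]): Cookie[] {
  const survivors: Cookie[] = [];
  for (const b of before) {
    for (const a of after) {
      if (a.domain === b.domain && a.name === b.name && a.value === b.value) {
        survivors.push(b);
        break;
      }
    }
  }
  return survivors;
}
```

A cookie that holds the same high-entropy value before and after the opt out is a candidate identifier that the opt out did not clear. A cookie whose value changed or that disappeared is not returned.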

Privacy Representations

The 2008 NAI Code of Conduct requires member companies to receive express consent from a user before collecting "Sensitive Consumer Information," defined as:

  • Social Security Numbers or other Government-issued identifiers
  • Insurance plan numbers
  • Financial account numbers
  • Information that describes the precise real-time geographic
    location of an individual derived through location-based services
    such as through GPS-enabled devices
  • Precise information about past, present, or potential future health
    or medical conditions or treatments, including genetic, genomic,
    and family medical history

(The Code of Conduct includes the unhelpful footnote, "[t]his provision is to be further developed in a distinct implementation guideline.")

The Epic Marketplace privacy policy contains the following paragraph under the headings "Information We Collect" and "Non-Personally Identifiable Information":

Epic Marketplace also automatically receives and records anonymous information that your browser sends whenever you visit a website which is part of the Epic Marketplace Network. We use log files to collect Internet protocol (IP) addresses, browser type, Internet service provider (ISP), referring/exit pages, platform type, date/time stamp, one or more cookies that may uniquely identify your browser, and responses by a web surfer to an advertisement delivered by us. This information may be stored on our systems for about one year.

The privacy policy also claims that:

Web surfers may elect not to provide non-personally identifiable information by following the cookie opt-out procedures set forth below.

As with our prior work, we leave it to the reader to assess whether Epic Marketplace is complying with its privacy representations.
 
 


Thanks to Gordon Franken for reviewing this post.

1. Epic Marketplace was, until recently, named Traffic Marketplace. It hosts its third-party content on trafficmp.com.
2. Other forms of history stealing, beyond the scope of this post, rely on page layout, background images, and user interaction.

Comments

I predict we will soon see a response something like the following.

Hello Epic Marketplace. We are Anonymous...

Then I imagine their corp will suffer greatly.

To comment on a violation of privacy, you're sending us to some web site hosted in Libya? Really? I guess the laws of the USA don't suit your purpose.

I don't think I'm going to visit their link to get my history stolen... Shame, shame, shame.

Technology evolves much, much faster than regulation (industry or government). This is a great example of a new (albeit shady) technique of exploiting new technology which has not been broadly used, hence is not regulated. Furthermore, self-regulation bodies simply lack the teeth to prevent this sort of behavior - membership in these groups is eye candy to make government regulators and noisy consumer interest groups "feel ok". Unfortunately, the only ways to prevent this are to over-regulate (which no one wants) or be 100% honest and forthcoming (there will always be a bad apple in the bunch to ruin it for all).

So what's the adblock pattern for this script? trafficmp/* ?

Oh the horror... They might display ads to me that are relevant to my interests - how dare they!

Whiner.

Damn right I'm whining lol

When I go shopping (in the "real world" of brick buildings) I don't expect to be followed around.

Why should I accept it in the internet? Followed by someone who wasn't even with me at the time I went to certain shops. It's wrong, plain and simple. If those with the power to do this want to do it, then they can ask me first if I mind them doing it. My choice, informed and clear, not sneaky and all about them and their monetising of me.

When I want something I'll go looking for it. They can stick their targeted adverts where the sun doesn't shine.

Yes, I'm a whiner in your terms. Do I care? No. I'll stuff them with whatever means I can and slow their systems down to a crawl. My little war, which I will win, for the benefit of those who have no idea what's happening.

beinphormed... It was Phorm which made me look at what was happening.

If you use a "club" card at your supermarket they track you. In fact every rewards card tracks your spending/shopping habits. Or there's Wal-Mart, which reverse engineers how you walk through their store by the way you unload your cart at checkout so they can be dastardly and put products their consumers want in easy to get to places.
There's a lot of tracking going on associated with giving consumers better offers and most people see the value when they get the discounted items they're looking for. What I have actually never seen is anyone prove real harm other than to play up this drama about big bad corporate America spying on me.

How about we put together a list of domain names that they're using for tracking so that we can all add them to /etc/hosts as:

0.0.0.0 cdn1.trafficmp.com

Did you determine how much bandwidth this process chews up? In this day and age of usage caps and bandwidth limitations anything like this is seriously unwelcome.

Thank goodness for NoScript.

So, does anyone have a list of domains/ip addresses operated by epic to add to my hosts and/or adblock list?

The only way to punish these people is to make sure no information about your computer ever reaches theirs.

Two follow up questions that it would be great to know the answer to after reading this post:
1) You mention that Mozilla published a fix for history stealing - what about other browsers? Safari? Chrome?

2) Does Ghostery block this from happening?

We all should take the URL from their tracking javascript, http://i.pixel.trafficmp.com/a/bpix? and submit garbage to it. I'm sick of the excuses, trash their dataset by submitting garbage. It's not like it's not obvious how it works.

It would be too bad if they accidentally stole little Bobby tables' history.

Spike:

Can people comment on spike's post:

"Bandwidth wastage
Comment by Spike (not verified), posted July 22, 2011 - 10:23am
Did you determine how much bandwidth this process chews up? In this day and age of usage caps and bandwidth limitations anything like this is seriously unwelcome."

How much bandwidth is used for hx sniffing? Based upon the study the url's

Well, take a look at http://cdn1.trafficmp.com/prod/ig/110701-130258_adv_0.html

This is by far the longest URL list used by any history sniffing site, and it is something like 80 kilobytes. It's hard though to add up to meaningful amounts of data using text alone ...

Are there any studies about bandwidth use for hx sniffing and how does that equate to the user's limited data plan?

The links for the iframe, script, and URL list are no longer accessible because cdn1.trafficmp.com is no longer listed in the DNS servers. Neither is trafficmp.com. The domain is still owned by Epic Media Group, according to Network Solutions' WhoIs service. Their main domain, theepicmediagroup.com, is not listed either. Have they gone out of business?
