Virgin Media to Monitor Copyright Infringement

Late last week The Register reported that Virgin Media is going to be trialling Detica’s Deep Packet Inspection (DPI) appliances to measure the level of copyright-infringing file sharing occurring on Virgin Media’s networks. It’s important to note a few things right up front:

  1. I have a request in to the company manufacturing these appliances, Detica, and have been promised responses to my questions. In light of this, I’m not accusing Detica or Virgin Media of engaging in any ‘privacy invasive’ uses of DPI, at least not at the moment.
  2. The information that I’ll be drawing on is, largely, from a consultation paper that Detica presented in late September of 2009.
  3. This post is largely meant as a ‘let’s calm down, and wait to hear about the technology’s details’ before suggesting that a massive campaign be mounted against what might be a relatively innocuous surveillance technology.

With that stated…

Detica describes itself as a “business and technology consultancy specialising in helping clients collect, manage and exploit information to reveal actionable intelligence. As the digital revolution causes massive amounts of data to converge with a new generation of threats, many of our clients see this as one of their greatest challenges.” Their CView DPI system is meant to let ISPs better identify the amount of copyright-infringing work that is coursing across their networks, in an effort to give ISPs better metrics as well as to determine whether arrangements between ISPs and content providers have a significant, measurable effect on the transfer of copyright-infringing files.

The consultancy piece that is provided by Detica maintains that their DPI system is meant to preserve customer privacy, though the lack of technical insight in the paper itself means that many individuals and groups are deeply concerned about the actual instantiation of privacy-protective measures (e.g. Twitter’s #virginmedia hashtag is filled with concerns and complaints). What I’ll do is outline the most relevant parts of the consultancy piece, and follow each part with the questions that were provided to Detica. I’ll also include one question set that I forgot to ask, and will be sending it along once I get my response from Detica. First, what exactly is this DPI appliance, anyway?

CView™ is a pre-built, secure service providing a statistically-significant sample of all illegal file sharing activity across an ISP network. CView™ resides within the secure environment of an ISP network, operating in a “lights out” environment (i.e. without human intervention). It aggregates all of the individual “in-network” file sharing activity and performs analysis on this dataset with pre-defined statistical models to present ISPs and CPs with detailed reports of the volume and nature of the P2P activity by subscriber groups (achieved by clustering similar behaviour types).

That the technology operates without human intervention is a no-brainer; the company is selling a device that is meant to massively aggregate and analyze data traffic. If an individual human, or team of network specialists, had to watch the logs and then run their own calculations based on what the log revealed then the product would be a dud for the purposes that it’s being sold for. Something that I expect is unsettling for many, especially those concerned with the impact of imposing statistical behaviour sets on users, is that there are pre-defined statistical models that will report on users by clustering similar behaviour types. What I failed to ask in relation to this includes:

  • What are these kinds of behavioural types?
  • Can the ISP with a CView product in their network infrastructure alter how behaviour types are collated?
  • When Detica suggests (earlier in the consultancy piece) that they will be creating a ‘piracy index’, is this index meant to be a standard that is controlled by Detica, or is it adjustable so that ISPs can configure it to suitably engage with their own customer base and customer habits? If the latter is true, then doesn’t this suggest that an industry-standard index is immediately jeopardized?

There are a set of four principles that are ingrained with the development of the CView devices themselves. The first:

anonymous data collection — all records collected from the network have their IP addresses strongly anonymised such that no reference to an individual can be made, even in conjunction with other ISP systems. No content data is recorded (e.g. URLs).

As pertains to records being ‘strongly anonymised’, I want to know what, exactly, this entails. While Google claims to have ‘strong’ anonymization for the IP address information that they collect, they only remove the last 8 bits of the IP address in their logs. Given that this comprises the last octet only, and an octet can take any value from 0 to 255, Google’s technique lets a computer user hide amongst at most 255 other addresses. Google’s approach is juxtaposed against, say, Microsoft’s, which deletes cookies and full IP addresses along with other identifiable information after 18 months. What, exactly, is entailed in Detica’s ‘strong anonymisation’ process?
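To make the arithmetic behind last-octet truncation concrete, here is a minimal Python sketch. It illustrates the general technique only: it is not Google’s or Detica’s actual code, and the function names (truncate_last_octet, anonymity_set_size) are my own, chosen for illustration.

```python
import ipaddress


def truncate_last_octet(ip: str) -> str:
    """Zero the final 8 bits of an IPv4 address (illustrative only)."""
    addr = int(ipaddress.IPv4Address(ip))
    # Keep the top 24 bits and clear the bottom 8.
    return str(ipaddress.IPv4Address(addr & 0xFFFFFF00))


def anonymity_set_size(masked_bits: int) -> int:
    """Number of distinct addresses that collapse to one truncated value."""
    return 2 ** masked_bits


if __name__ == "__main__":
    # Two different subscribers on the same /24 become indistinguishable...
    print(truncate_last_octet("192.0.2.77"))
    print(truncate_last_octet("192.0.2.201"))
    # ...but only among the addresses sharing those first 24 bits.
    print(anonymity_set_size(8))
```

Whatever ‘strong’ is supposed to mean, masking 8 bits caps the crowd a user can hide in at the size of a single /24 block, which is why the details of Detica’s process matter.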

The second:

proportional to right to privacy — traffic is inspected to establish what the content is and the application being used, with no persistence of traffic data or identity information.

I presume that this means that there is simply an inspection of the content, a record or log kept concerning what is (and what isn’t?) identified, and then no efforts to store content streams offline. Is traffic inspected inline with the Virgin network, or is content being offloaded and subsequently analyzed ‘offline’? I fully expect that CView examines known protocols (which DPI appliances are generally capable of doing) but wonder what method is used to identify content. Is Detica using a file hash-based identification process or a fingerprinting system? I ask for two reasons. First, treating all P2P traffic as inherently infringing on the basis of protocol identification alone would be problematic, given that P2P is also used for legitimate file transfers (in Canada, our national news station, film board, and other government bodies are using P2P for the dissemination of public content, as an example). Second, there are substantial differences between fingerprinting and hash-based systems (fingerprints might catch mash-ups, whereas hash-based systems target full files). Further, when it is suspected that encrypted P2P traffic is crossing a network, does this constitute infringing traffic, non-infringing traffic, or place a user in an entirely separate behavioural category?
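The limitation of hash-based identification can be sketched in a few lines of Python. This is an assumption-laden illustration, not a description of how CView works: the choice of SHA-256, the KNOWN_HASHES set, and the function names are all hypothetical.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Cryptographic digest of a complete file (SHA-256 chosen for illustration)."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical reference list of digests for known infringing files.
KNOWN_HASHES = {file_hash(b"bytes of some complete copyrighted file")}


def matches_known_hash(data: bytes) -> bool:
    """True only when the data is byte-for-byte identical to a listed file."""
    return file_hash(data) in KNOWN_HASHES


if __name__ == "__main__":
    original = b"bytes of some complete copyrighted file"
    altered = original + b"!"  # a single extra byte, e.g. a re-encode or mash-up
    print(matches_known_hash(original))
    print(matches_known_hash(altered))
```

An exact hash recognizes only complete, unmodified files, so any re-encoding or remix slips past it. A fingerprinting system compares perceptual features instead, which is how it might catch a mash-up, and also why the two approaches raise different accuracy and privacy questions.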

The third and fourth:

closed system — no traffic data or identity information is ever made available to a person. Traffic application data is produced by an entirely closed and automated “lights out” system. Appropriate hardware, software and process controls prevent intentional or accidental breaches of privacy (e.g. preventing access to the live system when data is being processed).

no feedback loop — none of the behavioural data collected can ever be attributed back to a person or drive action against an individual.

This appears to be a positive maneuver, though I would wonder how much access ISPs actually have to these devices: are they prevented from reconfiguring these devices, or offloading information to a SAN for their own analysis? Should an ISP demand it, is it even possible for these devices to disclose the traffic data or identity information of ISP subscribers and their related data traffic? How are updates performed to the device, and what would such updates comprise (e.g. would they update the protocols/files that are detected, or go so far as to modify the ‘piracy index’ as well and extend the ability to discretely associate infringing content with particular IP addresses)? Finally, given that different categories of users are being established, while the device cannot use behavioural data to target individuals, can it be used to target the groups that the device identifies?

As stated at the head of this post, I haven’t heard back from Detica, and until I do I’m refraining from decrying (or praising) this technology, in part because I’ve been aware of this kind of technology for a while: despite Detica’s suggestions, DPI manufacturers such as iPoque have included this in some of their devices for some time (also, and contrary to their consultancy piece, Canadian ISPs have long been tracking P2P use). Identifying and preventing the distribution of copyright-infringing files, while certainly a problem for the P2P movement, would likely be read (in the UK) as complying with the recent ‘Digital Britain’ initiatives. If the technology genuinely provides some significant level of anonymity, and presuming that the ‘piracy index’ isn’t rigged in some manner (perhaps have it open-sourced?), then this could just be a manifestation of a company selling a very particular product to address a particular need for network intelligence in compliance with British law. This isn’t something that is strongly desired by some parties (the ‘dumb pipe’ position is commonly adhered to by network neutrality advocates) but perhaps speaks to the real need to address the misconceptions about information services, and how they legally (at least in Canada and the US) differ from telephone services.

I’ll end this by noting that I’m less familiar with UK law and regulations; I don’t know RIPA inside and out, and I’m not trying to justify the Detica appliance. Instead, I’m just suggesting that, until more data is released, privacy advocates and network neutrality advocates alike should take a step back, take a deep breath, and wait for a little more information before letting loose the dogs of war.