According to the UK’s draft Investigatory Powers Bill (aka the Snooper’s Charter), introduced by Theresa May on 4th November 2015, the police and security services will have access to your web browsing data. This data will contain, we’re told, the sites we go to, but not the actual pages we visit. Recording just the sites, rather than the complete browsing data, is a concession to critics who thought that the bill gave the government too much power. However, it’s not quite as simple as that, because the web doesn’t work in pages and sites. It works in individual requests that come together to build up a page.
Take, for example, the page on the BBC news website about the Google Street View car being stopped for driving too slowly (http://www.bbc.co.uk/news/technology-34808105). If you visit this, the Home Secretary wants us to believe that the government will only store the visited site (www.bbc.co.uk). However, like almost all web pages, it’s made up of a number of scripts, images and styles that are each separate items. Your web browser fetches these from a variety of different domains, and the domain is the ‘site’ that ISPs would have to record and hand over to the government. If you open the developer tools (Ctrl-Shift-I, then the Network tab, in both Firefox and Chrome) and reload the page, you’ll see that what appears to you as a single page is actually about seventy requests. The government will have access to the site of each and every one of these.
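To make the distinction between a request and its site concrete, here is a minimal Python sketch of what a site-level log might keep. It is only an illustration under our own assumptions: the function name sites_only is ours, and every URL except the BBC article address is a made-up placeholder rather than a real request from that page. The draft bill does not specify a log format.

```python
from collections import Counter
from urllib.parse import urlsplit


def sites_only(request_urls):
    """Reduce full request URLs to host names, counting requests per host.

    Paths and query strings are discarded, mirroring the idea that only
    the 'site' of each request is recorded.
    """
    return Counter(urlsplit(url).hostname for url in request_urls)


# Hypothetical requests for one page load. Only the first URL comes from
# the article; the rest are illustrative placeholders.
requests = [
    "http://www.bbc.co.uk/news/technology-34808105",
    "http://static.example-cdn.net/js/app.js",
    "http://static.example-cdn.net/css/news.css",
    "http://images.example-cdn.net/car.jpg",
    "http://stats.example-analytics.com/hit?page=news",
]

site_log = sites_only(requests)
for host, count in sorted(site_log.items()):
    print(host, count)
```

The article path never appears in the log, but the set of hosts and the number of requests to each one survives, and that is the raw material for the fingerprinting described next.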
The list of requests for the BBC news article on the Google Car.
By our interpretation of the bill, the ‘site’ of each of these requests will be stored. This list of sites creates a fingerprint of each page you visit. Not every page’s fingerprint is unique, so it won’t give complete details of your browsing history, but it can reveal far more than the Home Secretary suggests. On the BBC site, for example, this fingerprinting is enough to reveal whether you’ve visited a news story, a live news stream, a sports news page, a weather page, etc. Other sites are more revealing. Our initial research suggests that on Reddit, the many subreddit pages have quite different fingerprints because of the different sets of images displayed around the news feed. On many sites, including Ars Technica, it would allow the government to narrow down the pages you may have visited based on the number of images in each page. This could be a powerful approach, since they would be able to see whether you entered via the home page (which has a distinctive fingerprint), after which there is a smaller number of possible next pages based on the links from the home page. A spy (or, more likely, an automated spying computer) could see which of the possible routes through the site match the patterns of requests, and build up a detailed map of your browsing history.
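The matching step described above can be sketched in a few lines of Python. This is a simplified illustration, not a description of any real system: the host names, request counts, page types and link table are all invented, and it assumes a fingerprint is simply the number of requests per host and that matches are exact. A real attempt would need to tolerate caching, adverts and other noise.

```python
from collections import Counter

# Hypothetical fingerprints: page type -> number of requests per host.
KNOWN_FINGERPRINTS = {
    "home page": Counter({"www.example.org": 1, "static.example.org": 15, "img.example.org": 20}),
    "news story": Counter({"www.example.org": 1, "static.example.org": 12, "img.example.org": 6}),
    "sports story": Counter({"www.example.org": 1, "static.example.org": 12, "img.example.org": 6}),
    "weather page": Counter({"www.example.org": 1, "static.example.org": 12, "maps.example.org": 4}),
}

# Hypothetical link table: which page types can be reached from which.
LINKS = {
    "home page": ["news story", "sports story", "weather page"],
    "weather page": ["home page", "news story"],
}


def candidate_pages(observed, allowed=None):
    """Return the page types whose fingerprint equals the observed host counts.

    If `allowed` is given, only those page types are considered, which
    models following links from the previously identified page.
    """
    names = KNOWN_FINGERPRINTS if allowed is None else allowed
    return sorted(name for name in names if KNOWN_FINGERPRINTS.get(name) == observed)


# Host counts recovered from a site-level log for one page load.
observed = Counter({"www.example.org": 1, "static.example.org": 12, "img.example.org": 6})

matches = candidate_pages(observed)                                   # two page types share this fingerprint
narrowed = candidate_pages(observed, allowed=LINKS["weather page"])   # previous page was the weather page
print(matches, narrowed)
```

In this toy data the fingerprint alone is ambiguous between two page types, but knowing that the previous page was the weather page leaves only one candidate, which is the route-matching idea in miniature.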
This technique won’t work perfectly for every site. Sometimes it will allow the government to pinpoint the actual page a user is visiting. Other times it’ll just allow them to narrow it down to one of a group of pages on a site. Either way, it’ll allow the government to build up a far more detailed picture of our browsing history than the Snooper’s Charter first appears to allow.