Bennett Haselton writes this week with a dissection of the effects of one well-known, long-known problem with so-called Internet filters. "The New Braunfels Republican Women, the Weston Community Children's Association, and the Rotary Club of Midland, Ontario are among the sites categorized as 'pornography' by Blue Coat, a California-based Internet blocking software company. While the product may not be much worse than other Internet filtering programs in that regard, it reinforces the point that miscategorization of sites as 'pornographic' is a routine occurrence in the industry, and not just limited to a handful of broken products." Read on below for the rest.
On Monday I released a blog post through the Citizen Lab at the University of Toronto, listing some of the sites that we had found to be blocked by Blue Coat's Internet filtering program. Previously we had released a similar report on sites that were miscategorized as "pornography" by Smartfilter. We ran some of the same URL lists through both programs, and found that some unfortunate sites were even blocked as "pornography" by both products, including Barenboim-Said (a youth orchestra featuring musicians from Israel, Palestine, and different Arab nations), and the aforementioned New Braunfels Republican Women.
The full list of sites we said were "miscategorized" is at the end of the Citizen Lab blog post. As far as I know we didn't miss any porn hidden on any of the sites that were in the list. The closest we came was a photo on performancespace.org/ showing what appears to be a model taking one for the team by lying on the floor of a grungy art exhibit. There was also the other borderline case of http://safe-sex.org/, which does include articles on topics like "Safe Sex with Expensive London Escorts." But Blue Coat's own working definition of 'pornography' defines it as "Sites that contain sexually explicit material for the purpose of arousing a sexual or prurient interest," and the articles on Safe-Sex.org do not appear intended to arouse ("The heartwarming fact about having safe sex with expensive London escorts is that they usually present a clean bill of health to clients."), so it gets counted as a miscategorization. The overwhelming majority of miscategorized sites were completely G-rated fare like the Kiddie Kollege Nursery School (which, by the way, would probably have grounds for a lawsuit against Blue Coat, if parents trying to access their website were greeted with a message that it had been blocked for containing "pornography").
Anyone can play the parlor game of examining blocked websites looking for signs of what caused them to be blocked. Is the website of the New Braunfels Republican Women blocked by both Blue Coat and Smartfilter because it has the word "women" in the title? (Tempting to think so, but unlikely, since there are so many other sites with "women" in the name which were not blocked by either product.) One of the blocked websites, http://www.foundations4betterliving.org/, until recently contained statistics such as "A growing variety of sexual behaviour is being practiced by teens 15- to 19-year-old... 53% admit to masturbating; 49% have participated in oral sex; 11% have had anal sex," all of which you could read on their front page while Bette Midler's 'From A Distance' auto-played in the background. (I was hoping to introduce you to that sublime experience, but unfortunately the domain apparently expired right after the report was published. When you list 150 domain names in a report, that's bound to happen with some of them.) And there's neobit.org/, the homepage of a manufacturer of emulators for dongles. While many Americans probably heard the term for the first time when Amy Poehler asked the Best Buy salesman "Can I use a dongle with this? Does it make you uncomfortable when I use the word 'dongle'?", the eggheads at Blue Coat should know what a dongle actually is. 'Dongle' has never been generally accepted anatomical slang, one rogue entry at the Urban Dictionary notwithstanding.
On the other hand, most websites in the report are not only not pornographic, they don't even seem to contain any content that could have triggered an accidental block. So it's quite possible that Blue Coat simply blocks a certain number of sites as a result of some pseudo-random process, and just by chance, some of those sites happen to contain content which looks like it might have caused the block, but the content actually had nothing to do with it.
Still, that leaves open the question of why so many sites turned up blocked by both Blue Coat and Smartfilter. Out of about 150 sites miscategorized by Smartfilter and about 150 sites miscategorized by Blue Coat, 8 sites showed up on both lists, or about 6%. (That group of 8 is listed in the middle of the blog post, beginning with balticsail.org.) Now if either Smartfilter or Blue Coat were blocking non-pornographic sites completely at random, then the percentage of overlap should be about the same as the percentage of non-pornographic sites that the product blocks generally. (For example: Suppose Blue Coat blocked 1% of non-pornographic sites completely at random. Out of 150 non-pornographic sites blocked by Smartfilter, we would therefore expect 1% of them -- about 1 or 2 sites -- to also be blocked by Blue Coat.) But despite the huge number of errors made by both products, neither of them comes close to blocking 6% of all non-pornographic websites as "pornography"; the percentage of overlap is much higher than we would expect if the blocking were random.
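To make the arithmetic in the paragraph above concrete, here is a rough sketch of the comparison. The 1% random-blocking rate is the purely hypothetical figure from the example, the list size of 150 is the approximate count given above, and the function names are made up for illustration; none of this reflects how either product actually works. The sketch treats each of the 150 sites as an independent coin flip and asks how likely an overlap of 8 or more would be under that assumption.

```python
from math import comb

def expected_overlap(n_sites, p_block):
    """Expected number of sites on one list that the other product
    would also block, if it blocked at random with probability p_block."""
    return n_sites * p_block

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more
    overlapping sites if every block were an independent random event."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

N_SITES = 150         # approximate size of each miscategorized list
OBSERVED = 8          # sites that appeared on both lists
ASSUMED_RATE = 0.01   # hypothetical random-blocking rate, not a measured value

print("expected overlap:", expected_overlap(N_SITES, ASSUMED_RATE))
print("chance of 8 or more by luck:", prob_at_least(OBSERVED, N_SITES, ASSUMED_RATE))
```

Under the assumed 1% rate, the expected overlap is about 1.5 sites, and the chance of seeing 8 or more purely by luck comes out well under one in a thousand. A different assumed rate changes the numbers, but the conclusion only reverses if the random-blocking rate is close to the observed overlap rate itself.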
So this suggests that some factor is at work that caused the 8 sites in that list to be more likely than average to be blocked, such that they ended up blocked by both products. Did any of the domain names used to be registered to a porn site? It seems hard to imagine that balticsail.org or barenboimsaidusa.org/ could have ever been in demand as domain names used to advertise porn. moriah.org/ sounds like it possibly could have been (many domain names consisting solely of female first names are registered to porn sites), but according to the Wayback Machine, a previous owner was a Christian band, before the domain expired and was bought by its present-day owner, a Jewish boarding school. Perhaps the IP addresses of these sites used to be held by porn companies, but then why would the products block the sites by their domain name as well? So I really don't know.
The good news is that, unlike Smartfilter, at least Blue Coat's blacklist doesn't appear to be used by any countries for nationwide Internet censorship. Citizen Lab had previously discovered installations of Blue Coat Internet blocking software in 19 "countries of interest" with poor human rights records, but none of them appeared to be set up to filter Internet traffic in and out of the country. In the one country where the product was being used for statewide Internet filtering, the United Arab Emirates, the Blue Coat software was being used in conjunction with Smartfilter's blacklist, so the sites that are mis-blocked by Blue Coat are not blocked in that country (unless of course they also happen to be mis-blocked by Smartfilter).
For the time being, it is not against U.S. law for a company to sell Internet censoring software to foreign governments, even with the knowledge that the tools are being used to restrict freedom of speech in a manner that would be considered a human rights violation by international standards, so both companies have made it a core part of their business.
What a bunch of dongles.