Hugo No Go: Why Copyright Net Bots Just Don't Work [Updated!]

Three days ago, fans of science fiction television, film, and literature were watching the online broadcast of the Hugo Awards on video streaming service Ustream. Midway through the broadcast – just as novelist Neil Gaiman was accepting an award for an episode of Doctor Who he had scripted – the broadcast went dark and viewers were confronted with the notice “banned due to copyright infringement.” The program never returned, and a torrent of anger and frustration filled the twittersphere. You can read about the whole situation at io9 (thanks to Adam Doyle for the link). So why did this happen?

Because a net bot employed by Ustream determined that the Hugo Awards broadcast violated several copyrights and did what it was programmed to do: shut it down. See, like all awards ceremonies, the Hugos used brief clips of the nominated shows and films. The net bot (just a fancy word for “software”) determined that these clips infringed on the clip owners’ copyrights and cut off the feed. As it later turned out, no infringement had actually occurred. The clips had been provided by the rights holders for use in the ceremony and, according to Ustream’s CEO, the net bot hadn’t been calibrated properly to filter out works that were permitted.
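For the technically curious, here is a minimal sketch of what "filtering out works that were permitted" could look like. It is not Ustream's actual system; the names `Match`, `should_cut_feed`, and the identifiers in the example are all hypothetical, and it assumes the bot has already decided that a clip matches a copyrighted work.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """A hypothetical detection result: the bot believes a work appears in a stream."""
    work_id: str    # identifier of the copyrighted work that was matched
    stream_id: str  # identifier of the broadcast in which it was matched


def should_cut_feed(match: Match, licensed: set[tuple[str, str]]) -> bool:
    """Return True only when the matched work is NOT licensed for this stream.

    `licensed` is an assumed allowlist of (work_id, stream_id) pairs that the
    rights holders have cleared in advance.
    """
    return (match.work_id, match.stream_id) not in licensed


# Illustrative allowlist: the clip was provided for use in this one ceremony.
licensed_uses = {("doctor-who-clip", "hugo-awards-2012")}

cleared = Match("doctor-who-clip", "hugo-awards-2012")
uncleared = Match("doctor-who-clip", "some-other-stream")
```

With that allowlist in place, `should_cut_feed(cleared, licensed_uses)` is false and the ceremony stays on the air, while the same clip in an uncleared stream is still caught. The hard part in practice is not this lookup but getting the permissions into the list before the broadcast starts.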

Maybe this was a simple case of bad programming, or maybe the bot was doing exactly what it was supposed to do. Either way, this little technical hiccup illustrates a growing problem in the online world: major copyright holders (often corporations), in trying to prevent the unlawful use of their copyrighted material, are turning to net bots to automatically find and disrupt websites that display that material. Unfortunately, the net bots are taking down lawful works because they can’t tell the difference between an infringing use and a lawful one, such as a licensed clip or a work in the public domain.

In 2007, Cory Doctorow of The Guardian warned of this impending problem. His main objection to these net bots is that it is difficult to program them to take down genuinely pirated works while leaving lawful uses alone. Because these programs cannot tell the difference, they have the potential to disrupt how we communicate. According to Doctorow, the bots “would [] have to be nearly perfect in regards to false positives - every time it misidentified a home movie of your kids' first steps or your gran's 85th birthday as Police Academy 29 or Star Wars: Episode 0, Jedi Teen Academy, your own right to use the Internet to communicate with your friends and family would be compromised.”
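Doctorow's point about false positives is easier to see with some back-of-the-envelope arithmetic. The sketch below uses made-up numbers purely for illustration (they are not statistics from any real service), and the function name `expected_wrongful_takedowns` is my own.

```python
def expected_wrongful_takedowns(
    uploads_per_day: int,
    infringing_share: float,
    false_positive_rate: float,
) -> float:
    """Expected number of lawful uploads wrongly flagged per day.

    infringing_share: fraction of uploads that really are infringing.
    false_positive_rate: chance the bot flags an upload that is lawful.
    """
    lawful_uploads = uploads_per_day * (1.0 - infringing_share)
    return lawful_uploads * false_positive_rate


# Hypothetical inputs: one million uploads a day, 1% of them infringing,
# and a bot that wrongly flags only one lawful upload in a thousand.
wrongful = expected_wrongful_takedowns(
    uploads_per_day=1_000_000,
    infringing_share=0.01,
    false_positive_rate=0.001,
)
```

Under those assumptions, a bot that is right about lawful uploads 99.9% of the time would still wrongly flag roughly 990 of them every day. That is why "nearly perfect" is the bar: at Internet scale, a small error rate adds up to a lot of birthday videos.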

Five years later, Doctorow has been proven right. Last month, YouTube briefly removed video of the Curiosity Mars rover landing because it was flagged as an infringing use of copyrighted work by YouTube’s Content ID system, even though the video was uploaded by NASA and was in the public domain. Something similar happened this past February when a man received a bogus takedown notice from YouTube because his nature video contained singing birds in the background, content that Rumblefish had deemed to be copyrighted.

These articles at Ars Technica and Motherboard explain how YouTube’s Content ID system works, and have nice breakdowns as to why the system is so flawed. Part of the reason appears to be a simple technical inability to program the bots to tell the difference between lawful and unlawful uses of copyrighted works. But there's also an element of corruption: automated systems like YouTube’s Content ID heavily favor large copyright holders because those holders pressured YouTube into adopting an automated system that goes beyond the notice-and-takedown regime established under current law (the Digital Millennium Copyright Act).

Because of this, Motherboard claims that the system encourages copyright holders to cast a wide net over a vast array of works, some of which are only tangentially related or completely unrelated to their copyrights. “YouTube’s system is [] heavily biased in favor of claimants, and [] is increasingly controlling of content that has serious educational or scientific value... many of Martin Luther King, Jr.’s speeches are no longer available on YouTube thanks to automatic and manual copyright claims by the owner of King’s speeches, the British music giant EMI Publishing.”

Obviously the benefits of an automated system are cost effectiveness and speedy removal of infringing works from a website. But because there is no human being at the other end deciding whether or not a work is actually being infringed, the little guy often suffers. Forget fair use of someone else's copyright; some people have had their own copyrighted work taken down because someone else claimed to own it. Just two months ago, comedian Brian Kamerer had his video taken down from YouTube because NBC claimed it owned the copyright, despite the fact that Kamerer had made the video and it had been used (without his permission) on The Tonight Show.

I think that copyright law is one of the things that make America truly great. Seriously. It sends a message that we as a society prize originality and expression. It says that we understand the value of independence and innovation. It shows that we are willing to allow the fruits of your work to be protected from theft. Certainly, we can all quibble over the specifics of the law, such as the duration of a copyright, how much in damages an infringer should pay, or the types of works protected; there are legitimate policy discussions to be had about these things. Even with all that room to argue, I think copyright protections are good for artists, good for innovators, and good for our country as a whole. But when a system isn’t monitored properly or at all, it can start to lay waste to everything it should be protecting. Copyright law is no exception.

[Update: 9:30am, September 5, 2012] YouTube has taken down Michelle Obama's DNC speech from last night because it triggered YouTube's copyright infringement filters. Ugh.