Do you remember a time when CAPTCHA was easy? For those who don’t know, CAPTCHA is that silly-looking word test that you have to complete on a website in order to verify that you are not a robot. I understood at the time that this simple test was highly effective in fighting spam and helped websites to function properly.
Then, a few years after CAPTCHA, came reCAPTCHA. I remember being annoyed when the new two-word test was introduced. The reCAPTCHA test words were more difficult to decipher, and I started to fail the simple test because the words were often illegible. I soon found out why. These new words were actually scanned images from old newspapers.
In 2009, I read an article that reported that the New York Times archive had almost been totally digitized through reCAPTCHA. This is an amazing feat and I was totally intrigued. The article went on to inform me that I may have helped with this process. Really?
After reading about reCAPTCHA I was instantly disturbed by a few facts:
1. I had never agreed to “work” for the New York Times or the people who own reCAPTCHA
2. I was not being paid for my “work” as a translator
3. I was not being credited for my participation in the New York Times project
4. I actually have to pay in order to access the New York Times archive that I apparently helped to translate for free
5. There was no way for me to opt out of this project
In theory, I get what the creator of reCAPTCHA, Luis von Ahn, is trying to do. He claims that he was motivated to come up with this system so that millions and millions of human brain cycles were not wasted on fruitless quizzes (estimates are 100 million CAPTCHAs every day), but essentially what he did was come up with a way to exploit and profit from the masses. He claims that he has research that shows that reCAPTCHA does not take any longer than CAPTCHA, but anyone who has ever done a reCAPTCHA and a CAPTCHA knows better. I literally just went to the reCAPTCHA website to get some screenshots for this article and this reCAPTCHA test popped up:
Seriously? You can’t even read that second word! Why? Because it was probably scanned from a document from 1882 and the OCR reader could not decipher it, because it is not a word and a computer is too stupid to know that. This bum word (and countless others like it) undoubtedly slows down human users, because you have to request a second set of test words in order to pass the test. This rarely happens with computer-generated CAPTCHAs, as they are usually legible.
Few people may realize that reCAPTCHA is actually a form of crowdsourcing. And as far as I’m concerned, if you are not paying for crowdsourcing, then you are exploiting the masses. With rates of unemployment near 10% in the United States, I think that most Americans would want their pennies delivered. Not to mention people around the world who work for pennies each day. I would even be up for an option to donate my pennies to people who need them. Really, anything is preferable to outright exploitation.
I sent the infamous Richard Stallman (American software freedom activist) the email I sent to von Ahn in 2009 complaining about his reCAPTCHA system. I asked him for his thoughts and he replied: “I don’t know what reCAPTCHA is except what I saw in your message. I think I agree that they should make the results public [meaning free access to the NYT archives].” I’m not surprised that he had never heard of reCAPTCHA (I don’t think he really uses websites that would use it), but as the founder of the free software movement, I was pretty sure that he would support open access to the New York Times archive, especially since the general public was responsible for translating it.
To give you an idea of how successful reCAPTCHA has become, Google recently purchased reCAPTCHA to assist them with their massive Google Book Search project, because they realize that humans are needed and that there would be nothing to “Google” if humans weren’t doing the data entry for the internet. It looks like they also realized that reCAPTCHA was a great way to get the data entry done “for free.” All I know is that people who are smart enough to invent reCAPTCHA are smart enough to figure out how to properly compensate people for their time. They just aren’t being forced to and don’t care enough to do so. To give you a sense of the mentality of von Ahn, here is a quote from his white paper on reCAPTCHA: “Unfortunately, human transcribers are expensive, so only documents of extreme importance are manually transcribed.” But aren’t all documents important at the end of the day? The internet could not function without human translation and data entry. I know that there are other forms of crowdsourcing and exploitation, but reCAPTCHA is different. reCAPTCHA relies on real keystrokes; this is work (not tracking, surveys, etc.).
Thankfully, humans are smarter (so far) than robots and computers, and we are still needed. Information needs to be digitized, tagged, translated, etc. in order to be used by programmers and computers. So, if it weren’t for us humans, then computers, software and the internet would be useless. I just ask that we not be exploited in the process.
I support services like Amazon’s Mechanical Turk, a crowdsourcing internet marketplace that allows people to get paid for this type of work. Some claim that it is a “virtual sweatshop,” but at least the workers are being paid. Here is an example of a project completed using paid crowdsourcing: the Sheep Market. Each person was paid $0.02 USD to draw a sheep. The average wage was $0.69/hr. Clearly there may be minimum wage issues etc., but at least there is not outright exploitation. More information on paid crowdsourcing can be found at Smart Sheet.
I’m not happy with being exploited, and there are not a lot of options for me, but I’d like to share one solution that I’ve come across: it’s quite easy to fail the reCAPTCHA test and still get through. Failing the test is the best way I’ve found to essentially opt out of participating in reCAPTCHA and the various projects that are linked to it.
reCAPTCHA’s two-word test works by pairing a control word with a random word that has not been verified yet. If you get the control word right and a few other people agree on the unknown word, then it is assumed that the unknown word has been deciphered correctly. To opt out, you first have to tell the control word from the unknown word, which is pretty simple. The control word is usually something like “very” while the unknown word is something like “wricaule.” Really, it’s so easy to fail these, you’d almost think that a robot could do it. So, you spell the control word correctly and then just come close on the unknown word (e.g., type “wricaul” instead of “wricaule”), and it will usually let you through. Essentially, a one-word CAPTCHA is all that is needed to verify that you are not a robot; the unknown word is the one that they’ve tacked on in the hopes that you’d translate it for free. To me, failing the test means that I have not contributed to translating the unknown word. It doesn’t save me any time (it actually takes me longer sometimes), but at least this process allows me to opt out in some way.
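For the programmers reading, here is a rough Python sketch of the two-word scheme as I’ve described it above. To be clear, this is my own toy model and not reCAPTCHA’s actual code: the class name, the method names and the threshold of three matching answers are all assumptions of mine (all I really know is that “a few” people have to agree).

```python
from collections import Counter

# Assumed value: the real number of matching answers required is not known to me.
AGREEMENT_THRESHOLD = 3


class TwoWordChallenge:
    """Toy model of a two-word test: one known control word, one unverified word."""

    def __init__(self, control_word):
        self.control_word = control_word
        # Answers for the unknown word, collected from users who passed.
        self.votes = Counter()

    def submit(self, control_guess, unknown_guess):
        """Return True if the user passes. Only the control word decides that."""
        if control_guess.strip().lower() != self.control_word.lower():
            return False
        # The answer for the unknown word is recorded as a vote, not graded.
        self.votes[unknown_guess.strip().lower()] += 1
        return True

    def accepted_reading(self):
        """Return the reading of the unknown word once enough users agree, else None."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= AGREEMENT_THRESHOLD else None


# Hypothetical usage with the example words from this post.
challenge = TwoWordChallenge("very")
passed = challenge.submit("very", "wricaul")  # deliberately "close" on the unknown word
reading = challenge.accepted_reading()        # one lone vote is not enough agreement
```

In this model my deliberately wrong answer still gets me through, because only the control word is checked. It simply becomes a lone vote that never reaches the agreement threshold, which is the whole point of opting out.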
Maybe you think I’m crazy after reading this and think that I should not care about frittering away a few seconds of my life on performing reCAPTCHA, but this is exactly the same argument that von Ahn uses to justify reCAPTCHA. I just know that deciphering illegible OCR words is not my favorite pastime. I would rather decide how my time and “brain cycles” are spent instead of leaving it up to a computer scientist entrepreneur.