reCAPTCHA Exploits the Masses

Do you remember a time when CAPTCHA was easy?  For those who don’t know, CAPTCHA is that silly-looking word test that you have to complete on a website in order to verify that you are not a robot.  I understood at the time that this simple test was highly effective in fighting spam and helped websites function properly.

Then a few years after CAPTCHA came reCAPTCHA.  I remember being annoyed when the new two word test was introduced.  The reCAPTCHA test words were more difficult to decipher and I started to fail the simple test because the words were often illegible.  I soon found out why.  These new words were actually scanned images from old newspapers.

In 2009, I read an article that reported that the New York Times archive had almost been totally digitized through reCAPTCHA.  This is an amazing feat and I was totally intrigued.  The article went on to inform me that I may have helped with this process.  Really?

After reading about reCAPTCHA I was instantly disturbed by a few facts:

1. I had never agreed to “work” for the New York Times or the people who own reCAPTCHA.

2. I was not being paid for my “work” as a translator.

3. I was not being credited for my participation in the New York Times project.

4. I actually have to pay in order to access the New York Times archive that I apparently helped to translate for free.

5. There was no way for me to opt out of this project.

In theory, I get what the creator of reCAPTCHA, Luis von Ahn, is trying to do.  He claims that he was motivated to come up with this system so that millions and millions of human brain cycles were not wasted on fruitless quizzes (estimates are 100 million CAPTCHAs every day), but essentially what he did was come up with a way to exploit and profit from the masses.  He claims that he has research that shows that reCAPTCHA does not take any longer than CAPTCHA, but anyone who has ever done a reCAPTCHA vs a CAPTCHA knows better.  I literally just went to the reCAPTCHA website to get some screen shots for this article and this reCAPTCHA test popped up:

Seriously? You can’t even read that second word!  Why? Probably because it came from a document from 1882 that was scanned, and the OCR software could not decipher it: it is not a word, and a computer is too stupid to know that.  This bum word (and countless others like it) undoubtedly slows down human users, because you have to request a second set of test words in order to pass the test.  This rarely happens with computer-generated CAPTCHAs, as they are usually legible.

Few people may realize that reCAPTCHA is actually a form of crowd sourcing.  And as far as I’m concerned, if you are not paying for crowd sourcing, then you are exploiting the masses.  And with rates of unemployment near 10% in the United States, I think that most Americans would want their pennies delivered.  Not to mention people around the world that work for pennies each day.  I would even be up for an option to donate my pennies to people that need it.  Really, anything is preferable to outright exploitation.  I sent the infamous Richard Stallman (American software freedom activist) the email I sent to von Ahn in 2009 complaining about his reCAPTCHA system.  I asked him for his thoughts and he replied: “I don’t know what reCAPTCHA is except what I saw in your message.  I think I agree that they should make the results public [meaning free access to the NYT archives].”  I’m not surprised that he had never heard of reCAPTCHA (I don’t think he really uses websites that would use them), but as the founder of the free software movement, I was pretty sure that he would support open access to the New York Times archive, especially since the general public was responsible for translating it.

To give you an idea of how successful reCAPTCHA has become, Google recently purchased reCAPTCHA to assist them with their massive Google Book Search project, because they realize that humans are needed and that there would be nothing to “Google” if humans weren’t doing the data entry for the internet.  It looks like they also realized that reCAPTCHA was a great way to get the data entry done “for free.”  All I know is that people who are smart enough to invent reCAPTCHA are smart enough to figure out how to properly compensate people for their time; they just aren’t being forced to and don’t care enough to do so.  To give you a sense of the mentality of von Ahn, here is a quote from his white paper on reCAPTCHA: “Unfortunately, human transcribers are expensive, so only documents of extreme importance are manually transcribed.”  But aren’t all documents important at the end of the day?  The internet could not function without human translation and data entry.  I know that there are other forms of crowd sourcing and exploitation, but reCAPTCHA is different.  reCAPTCHA relies on real keystrokes; this is work (not tracking, surveys, etc.).

Thankfully, humans are smarter (so far) than robots and computers and we are still needed.  Information needs to be digitized, tagged, translated etc. in order to be used by programmers and computers.  So, if it wasn’t for us humans, then computers, software and the internet would be useless.  I just ask that we not be exploited in the process.

I support companies like Amazon’s Mechanical Turk, which is a crowd sourcing internet marketplace that allows people to get paid for this type of work.  Some claim that it is a “virtual sweatshop,” but at least they are being paid.  Here is an example of a project completed using paid crowd sourcing: the Sheep Market.  Each person was paid $.02 USD to draw a sheep.  The average wage was $.69/hr.  Clearly there may be minimum wage issues etc., but at least there is not outright exploitation.  More information on paid crowd sourcing can be found at Smart Sheet.

I’m not happy with being exploited, and there are not a lot of options for me, but I’d like to share one solution that I’ve come across—it’s quite easy to fail the reCAPTCHA test and get through.  Failing the test is the best way I’ve found to essentially opt out of participating in reCAPTCHA and the various projects that are linked to it.

How to Opt Out of reCAPTCHA

reCAPTCHA’s two-word test works by pairing a control word with a random word that has not been verified yet.  If you get the control word right and a few other people agree on the unknown word, then it is assumed that the unknown word has been deciphered correctly.  In order to opt out, it’s pretty simple to distinguish the control word from the unknown word.  The control word is usually something like “very” while the unknown word is something like “wricaule.”  Really, it’s so easy to fail these, you’d almost think that a robot could do it.  So, you spell the control word correctly and then just come close on the unknown word (e.g., type “wricaul” instead of “wricaule”), and it will usually let you through, since, essentially, a one-word CAPTCHA is all that is needed to verify that you are not a robot.  The unknown word is the one that they’ve tacked on in the hopes that you’d translate it for free.  To me, failing the test means that I have not contributed to translating the unknown word.  It doesn’t save me any time (it actually takes me longer sometimes), but at least this process allows me to opt out in some way.
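
For anyone who wants to see that logic spelled out, here is a minimal Python sketch of the two-word scheme as I have described it.  To be clear, this is not reCAPTCHA’s actual code: the class name, the method names, and the agreement threshold of 3 are all my own assumptions for illustration, and the real service’s matching rules are surely more involved.

```python
from collections import Counter

# Hypothetical value: how many matching answers count as "a few people agree".
AGREEMENT_THRESHOLD = 3


class TwoWordChallenge:
    """Illustrative model of a control word paired with an unverified word."""

    def __init__(self, control_word, threshold=AGREEMENT_THRESHOLD):
        self.control_word = control_word
        self.threshold = threshold
        self.votes = Counter()  # guesses for the unknown word, by frequency

    def submit(self, control_guess, unknown_guess):
        """Return True if the user passes the test.

        Passing depends only on the control word. The unknown-word guess
        is merely recorded as a vote, and only when the user passes.
        """
        passed = control_guess.strip().lower() == self.control_word.lower()
        if passed:
            self.votes[unknown_guess.strip().lower()] += 1
        return passed

    def accepted_reading(self):
        """Return the unknown word once enough users agree, else None."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.threshold else None


challenge = TwoWordChallenge("very")
# Opting out: correct control word, deliberately wrong unknown word.
print(challenge.submit("very", "wricaul"))
print(challenge.accepted_reading())
```

In this sketch, my misspelled “wricaul” still gets me through, but it only adds a stray vote that never reaches the threshold unless other people happen to type the same thing.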

Maybe you think I’m crazy after reading this and think that I should not care about frittering away a few seconds of my life on performing reCAPTCHA, but this is exactly the same argument that von Ahn uses to justify reCAPTCHA.  I just know that deciphering illegible OCR words is not my favorite pastime.  I would rather decide how my time and “brain cycles” are spent instead of leaving it up to a computer scientist entrepreneur.

-jen grygiel



  5 comments for “reCAPTCHA Exploits the Masses”

  1. Jay
    November 16, 2010 at 7:50 pm

    I think you are too quick to dismiss whatever “study” it is that von Ahn claims to demonstrate that reCAPTCHA takes no longer than CAPTCHA; and your argument relies crucially on that point. Could the same argument be made if, hypothetically, one could prove that they take the same amount of time? (Not a rhetorical question).

    Also, I don’t think RMS uses anything other than Emacs.

  2. admin
    November 17, 2010 at 9:35 am

    Jay, I’ve read the study, have you? Last time I checked in, he had admitted that he had no “significant” research showing that it does not take any longer to do recaptcha vs captcha. So, he has not proven it…with his millions of dollars of research funds. I don’t think I’m quick to dismiss–he had no hard evidence. I think my “opinion” is as good as his and I have my own personal experience to back up my beliefs. And regardless, if they are using you to produce work in any capacity (physical key strokes) and don’t give you credit or pay, then you are being exploited.

    Also, I don’t understand your comment about emacs and how it is related to my comments about Stallman. He does use the internet sparingly (when he has free access and a borrowed computer) and has also devised his own way to surf: “To look at page I send mail to a demon which runs wget and mails the page back to me. It is very efficient use of my time, but it is slow in real time.”

  3. Jay
    November 17, 2010 at 1:59 pm

    No!, I haven’t read the study. It’s not linked here, and it sounded like he was maybe being intentionally vague about it, and that you hadn’t read it either: “He claims that he has research…” Adding that you have read it and that he has admitted that he has no significant evidence makes it a different story altogether (something you didn’t mention initially). I would also be interested to hear why specifically you think the research is bunk (genuinely — that’s not a challenge).

    On top of that, I’m in no way trying to claim that it takes the same amount of time — only (a) that I’m open to the idea that it does (though, not having read this elusive “study”/”research,” I have no way of knowing), (b) that personal experience is not an adequate substitute for tests involving many different people (whether those tests prove your point or his is another matter — the issue is that anecdotal evidence from one person is hardly a basis for serious argument), and (c) I think your case is dramatically weakened if it could be shown (again, I’m not claiming that it has been) that reCAPTCHA takes no more time than ordinary CAPTCHAs, and I was looking for your take on that. It seems that you think your claim is just as strong when accounting for that possibility; I’m inclined to disagree, but on potentially shaky ground. Here’s another question: If I were to show that reCAPTCHA was significantly faster than CAPTCHA on average, would you make the same case?

    My Stallman comment was meant to be an off-hand joke about his love of plain text and his arcane way of doing just about everything, bless his heart.

  4. April 17, 2011 at 3:20 pm

    Yeah, I am doing exactly the same thing with stupid reCAPTCHA.

    Also, I don’t understand why they apply the “wavy” transformation to both words. So they not only ask you to work for free, but even make this work intentionally harder than it could be. This is simply wrong.

  5. Dion
    May 11, 2011 at 6:15 am

    1. I am sure that the admins would prefer to spend their time checking your response to articles to make sure that you are human. You of course would expect this for free or would you be willing to pay for it?

    2. I am sure that you would be happy to spend hours sifting through websites riddled with SPAM rather than spend a few seconds using your brain.

    3. Please make sure that the programmer who made the open source CAPTCHA code at the bottom of the article available to you in this blog is compensated. No? Then using your logic, you are exploiting the programmer.

    4. Please make sure you pay for all the bandwidth you use on every part of the web that you travel. No? Then you are exploiting those that provide it. You most certainly did not ask them.

    5. You have my email with this submission. Please make sure I get paid for this submission. You don’t want to exploit me and my time. While you are at it, please pay me for my time to read this misguided drivel.

    6. The sites that use reCaptcha do it by choice. They can continue to use Captcha or they could actually write their own. Instead they chose to “exploit” a piece of code written by someone else to do this work for them.

    7. The sidewalk that you walk on, using your logic, came from other people’s tax dollars – not yours. Many were probably laid before you were born… get over it dude… seriously…
