Speech Technology Group: Hacking and defeating Google's reCAPTCHA with a 99% accuracy

Speech recognition and captcha solving......

One exciting thing about this: the entire model of reCaptcha (at least the text ones; I assume the audio ones are similar) is to make people do useful work when solving captchas by having them complete tasks that they consider too hard for computers to do well (in the text reCaptcha case, OCR). If someone writes software that can defeat the captcha, it does mean the security model is broken, but it also means the state of OCR technology (or audio recognition or whatever) has been advanced, and the digitization of books that had previously required human intervention can now be accomplished by automated means. In other words, spammers are incidentally creating the tools to expand the scope of digital human knowledge. Win-win, really.

fuelfive 1 day ago | link

Unfortunately, this attack does nothing to advance the state of the art in OCR (or audio recognition). It's basically the same story as every other CAPTCHA attack to date: take advantage of some accidental statistical regularity in the generation function. As soon as this kind of flaw is discovered, it only takes a few hours for the generation code to be patched in such a way that completely prevents this sort of attack from working.

fghh45sdfhr3 1 day ago | link

So either the code is easy to patch, or we DO advance. Win/Win?

xibernetik 23 hours ago | link

Not really... Even if the code is difficult to patch, speech/audio recognition doesn't advance much when an attacker figures out how to remove the (non-random) noise added by a machine over the sound file. Actual speech recognition relies on the ability to filter out background noise - which is a lot more complex/random - added by surroundings, not a machine.

It's very difficult to generate some sort of noise via algorithm that a) humans can filter out and b) can't be removed by some algorithm. As a result, audio captchas are a huge vulnerability and the weakest link in almost any captcha system, although you can't get rid of them by law.

Hypotheticals aside, the code was easy to patch - note the footnote: > In the hours before our presentation/release, Google pushed a new version of reCAPTCHA which fully nerfs our attack.

d2vid 19 hours ago | link

Could one take real recorded noise and add that rather than noise generated via algorithm? Wouldn't that force attackers to solve a real problem (removing background noise from a speech sample)?

xibernetik 12 hours ago | link

It's not really solving the "real" problem... If I'm just mashing two audio files together, that's going to be different from someone talking in the middle of a train platform, and there will likely be algorithmically determinable differences between the artificially generated words and the naturally generated noise.

All of this aside, removing background noise is not a huge issue anymore. We have pretty decent noise-cancellation technology. Speech recognition - the other big component - has advanced a lot in recent times and is actually pretty good, although not for every company/product.

Even if it would be helpful, you'd have to record an incredible amount of noise in the first place, seeing as you're getting millions of hits a day and if you have a small sample set, the attackers will just figure out the solutions to that sample set and be done.

I'm not saying it's impossible, but I am saying it's probably not worth it at this point. Captchas (in their traditional forms) don't make sense as a long-term strategy anyways.

robryan 14 hours ago | link

Yeah, you would think they could record thousands of hours of real world noise then randomly use sections of it on each audio captcha.

A1kmm 5 hours ago | link

If the attacker manages to obtain all the random noise, they could index every window in the noise in a k-d tree and perform an efficient nearest-neighbour search for the exact background from the CAPTCHA audio, and then simply subtract the background, giving perfect segmentation in O(log N) asymptotic average time complexity for N windows (at 64 kHz and 2000 hours of audio, with one window per sample, N = 460,800,000,000 and ln N ≈ 26.86).
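A rough sketch of that lookup-and-subtract idea, in plain Python. Everything here is an assumption for illustration: the window size, the toy "noise" and "speech" signals, the helper names, and the leading silence that lets the first window match the noise exactly. Counting one window start per sample is also an assumption; a coarser stride gives a smaller N.

```python
import random

WINDOW = 8  # samples per indexed window (hypothetical size)

def build_kdtree(items, depth=0):
    """Build a k-d tree over (vector, offset) pairs as nested tuples."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda p: p[0][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    """Return (squared_distance, (vector, offset)) for the closest window."""
    if node is None:
        return best
    point, axis, left, right = node
    dist = sum((a - b) ** 2 for a, b in zip(point[0], query))
    if best is None or dist < best[0]:
        best = (dist, point)
    diff = query[axis] - point[0][axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    if diff * diff < best[0]:  # the other side could still hold a closer point
        best = nearest(far, query, best)
    return best

def window_count(sample_rate_hz, hours, stride=1):
    """Number of window start positions in the noise corpus."""
    return sample_rate_hz * hours * 3600 // stride

def strip_background(mixed, noise, tree):
    """Find the noise segment under `mixed` and subtract it sample by sample."""
    _, (_, offset) = nearest(tree, mixed[:WINDOW])
    return [m - n for m, n in zip(mixed, noise[offset:offset + len(mixed)])]

# Toy data: the attacker is assumed to hold the full noise corpus.
random.seed(0)
noise = [random.uniform(-1, 1) for _ in range(600)]
tree = build_kdtree([(noise[i:i + WINDOW], i)
                     for i in range(len(noise) - WINDOW + 1)])

speech = [0.0] * WINDOW + [0.5, -0.5, 0.25, -0.25]  # silence, then "speech"
true_offset = 137
mixed = [s + n for s, n in zip(speech, noise[true_offset:])]
recovered = strip_background(mixed, noise, tree)
```

In real audio the first window would rarely be pure noise, so the match would be approximate rather than exact, and the subtraction would need alignment and gain handling that this sketch leaves out.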

jopt 23 hours ago | link

That kind of Win-Win is also a Lose-Lose. It's just a glass-half-empty thing.

TeMPOraL 20 hours ago | link

> If someone writes software that can defeat the captcha, it does mean the security model is broken, but it also means the state of OCR technology (or audio recognition or whatever) has been advanced, and the digitization of books that had previously required human intervention can now be accomplished by automated means.

No it doesn't. reCaptcha only checks one of two words it displays (the other one being what OCRs can't handle themselves), so naturally you only need to crack one and input garbage as the other, thus actually making the world a worse place.

apendleton 20 hours ago | link

Probably not substantially worse. You would need to be able to tell which was which, since they're not consistently ordered and the "good" one is deliberately obfuscated/smudged/whatever, and, since recaptcha depends on multiple users agreeing on the right answer, you and other attackers would need to be consistent in your garbage in order for it to make it into the canonical book transcript.
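To make the agreement point concrete, here is a minimal sketch of a consensus rule for the unknown word. The function name and both thresholds are hypothetical; this is not reCAPTCHA's actual rule, only an illustration of why uncoordinated garbage rarely wins.

```python
from collections import Counter

def consensus(transcriptions, min_votes=3, min_share=0.6):
    """Return the agreed transcription, or None if agreement is too weak."""
    counts = Counter(t.strip().lower() for t in transcriptions)
    if not counts:
        return None
    word, votes = counts.most_common(1)[0]
    # Accept only if enough users, and a large enough share, gave this answer.
    if votes >= min_votes and votes / len(transcriptions) >= min_share:
        return word
    return None

agreed = consensus(["morning", "Morning", "morning", "asdf"])
scattered = consensus(["asdf", "qwer", "zxcv", "morning"])
```

Under these assumed thresholds, random junk from different attackers never reaches agreement, so it would take consistent garbage to pollute the transcript.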

anonymoushn 20 hours ago | link

For ReCAPTCHA, you only have to type the known word almost-correctly. You can just type anything for the word they don't already know. Even if someone had tech that could pass visual ReCAPTCHA reliably, it wouldn't need to be capable of doing useful work.
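A small sketch of what "almost-correctly" could look like, assuming the check is an edit-distance tolerance on the known word only. The tolerance of one error and the function names are made up for illustration; the real acceptance rule is not stated in this thread.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def passes(known_word, typed_known, typed_unknown, max_errors=1):
    """Pass/fail depends only on the known word; typed_unknown is ignored."""
    return edit_distance(known_word.lower(), typed_known.lower()) <= max_errors

ok = passes("wagon", "wagn", "zzzz")    # one missing letter, garbage second word
rejected = passes("wagon", "van", "x")  # too far from the known word
```

The point of the sketch is that the second argument never affects the outcome, so a solver can pass without doing any useful transcription.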

aptwebapps 1 day ago | link

Are the audio versions sourced from recordings that need transcription as the visual versions are sourced from scanned documents? I assumed that was just to improve access.

If not, then it isn't useful work, really.

vhf 1 day ago | link

Looking at how they hacked it, we can safely deduce the audio version is only to improve access.

Useless for transcription, because it 'validates' phonetically. This is part of the attack: whether the audio is "wagon" or "van", the 'word' "wagn" validates. Same for "Spoon" and "Teaspoon", which can both be validated by entering "Tspoon". (Since "Ts" can be said "Ss" as in "Tsunami" (Sunami) and "T S" as in T-Rex.)

apendleton 20 hours ago | link

That's a shame. I would imagine that a productively similar audio task could be devised around humans helping to transcribe hard-to-auto-transcribe audio recordings, and that it could use a similar strategy to the visual one: a known-good clip and an ambiguous one, requiring the user to transcribe both.

vhf 16 hours ago | link

Well, I'm not a linguist or anything like that, but I'm not sure it would work so well, if it was useful at all.

1/ Homophony. Knowing how to write a word you hear often requires context. Giving the user two whole sentences to listen to is too long and takes too much of their time. 'Right, but there's no need for a whole sentence, a few words suffice.' Right, but if the computer knows where to cut these sentences, it can also transcribe them itself.

2/ You have to assume good spelling from the user.

3/ Is it useful? I mean, 100M reCAPTCHAs are solved every day; what fraction of these 100M are audio reCAPTCHAs? 0.0005%? Less? Transcribing 2 sentences every week makes no sense. Keep in mind that homophony and bad spelling are two factors that hugely increase the number of times the same 'unknown clip' has to go through 'human validation' before we can assume with a certain level of confidence that it has been transcribed.

Funny 4/, take a look at the [cc] button on some YouTube videos: on-the-fly transcription. Thanks Google. :) Oh, and also on-the-fly translation of the on-the-fly transcription, btw. Google even said they were working on voice-to-voice translation for Google Voice: an English speaker calls a Chinese speaker, the English voice is transcribed, then translated, then synthesized, and the same the other way. :)

throwaway1979 1 day ago | link

Google's captcha system is horrid. I've mentioned this to people on the accessibility team but to no avail. They used to have a wheel chair icon next to the bloody scrambled text. I taught a computer class to seniors and it was painful watching them deal with the account sign up process (also, I thought it was insulting asking a mobile senior to click on the wheel chair icon ... to the designer ... FU!). Clicking on the wheel chair would give audio that barely made any sense to me. The whole process was stupid.

Like many others, I can barely get through their captcha service. I'm actually happy people circumvented it. Maybe someone will think it through this time around.

aidenn0 23 hours ago | link

http://en.wikipedia.org/wiki/International_Symbol_of_Access

TazeTSchnitzel 1 day ago | link

OK. Show me a CAPTCHA that is easy for humans to read, and very difficult for computers to read.

rytis 5 hours ago | link

It shouldn't be about being able to "read". Ideally it should be something that only a human could "solve". Maybe a stupid example, but I can't come up with a better one (and if I could, I would be a lot wealthier by now ;) ): show a picture where a forest in a landscape is violet and ask the user to identify what's odd in that picture. Another example: show a face with two noses and ask them to identify the odd item. Things like that. Easy for humans, impossible for computers. The problem is that these tasks need to be generated by humans, as I cannot think of any (irreversible) way to do this automatically. Maybe Mechanical Turk to the rescue?

saraid216 21 hours ago | link

> Like many others, I can barely get through their captcha service. I'm actually happy people circumvented it. Maybe someone will think it through this time around.

Anytime you're ready, we're listening.

entropy_ 1 day ago | link

Well, chances are it'll get even harder now, not easier since they'll need to add further complexity to differentiate humans from bots.

tmh88j 1 day ago | link

I also thought that was insulting with the wheel chair icon. A person in a wheel chair has problems walking, not (necessarily) their vision. How about "Help" in text?...less confusion and possible anger

ceejayoz 20 hours ago | link

Apparently the wheel chair is actually an ISO standard.

http://en.wikipedia.org/wiki/International_Symbol_of_Access

DanBC 18 hours ago | link

But it's a symbol for mobility - not for all disabilities.

There are symbols for visual impairment, but they're not international.

(http://commons.wikimedia.org/wiki/File:Pictograms-nps-access...)

(http://3.bp.blogspot.com/-HfEx4Y_O_Gs/Tf0huVPZBXI/AAAAAAAAAC...)

excuse-me 1 day ago | link

Since the point of the audio version is not to be hit with lawsuits under the ADA - perhaps it should just be a little icon of a lawyer?

PassTheAmmo 23 hours ago | link

I imagine that whole sites could be designed using only your proposed lawyer icon, possibly with some additional icon representing political correctness.

Graphon 23 hours ago | link

What's the international symbol for a lawyer?

https://www.google.com/search?q=parasite+icon

omonra 1 day ago | link

This may be very interesting to crack, but who is responsible for Google making their CAPTCHA almost impossible for a human to decipher now? I seriously have to click 5 times before even seeing anything resembling letters I can parse.

joelthelion 20 hours ago | link

This simply means computers are getting really good at this game. And that Google, with all its power, hasn't found a better alternative to Captchas yet.

I find that pretty worrying for the future of the internet. An internet without working captchhas will probably be full of bots and spam.

Jach 19 hours ago | link

Naive Bayes spam classifiers are fast enough that I think they could be used as drop-in replacements, and they'd be well-trained too considering Google's existing success with spam in email. And Naive Bayes isn't the only solution, there are plenty of other heuristics. Even something as dumb as "first post from this user/IP/'identity' and full of links?" would catch a lot of common spam.
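As a rough illustration of both ideas, here is a tiny multinomial Naive Bayes with add-one smoothing, plus the "first post full of links" heuristic. The training sentences, class names, and link threshold are invented for the example; a real filter would be trained on far more data and is assumed to have seen at least one example of each label.

```python
import math
from collections import Counter

class NaiveBayes:
    """Minimal multinomial Naive Bayes over whitespace-split words."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def score(self, text, label):
        """Log prior plus smoothed log likelihood of the words under `label`."""
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total = sum(self.word_counts[label].values())
        logp = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in text.lower().split():
            logp += math.log((self.word_counts[label][word] + 1) / (total + vocab))
        return logp

    def is_spam(self, text):
        return self.score(text, "spam") > self.score(text, "ham")

def looks_like_link_spam(text, is_first_post, max_links=2):
    """Crude heuristic: a first post containing more than max_links links."""
    return is_first_post and text.lower().count("http") > max_links

nb = NaiveBayes()
nb.train("buy cheap pills now", "spam")
nb.train("cheap pills online buy now", "spam")
nb.train("the audio captcha is hard to hear", "ham")
nb.train("naive bayes is fast enough for this", "ham")
```

With this toy training set, `nb.is_spam("buy cheap pills")` comes out true and `nb.is_spam("the captcha is hard")` comes out false; scoring a comment is just a handful of dictionary lookups, which is the sense in which it is fast.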

Jach 6 hours ago | link

I really wish HN required a comment with a downvote on accounts with more than 100 karma (where one can reasonably assume the user isn't a newbie leaving empty comments like "I agree"). I'd ask that person to think it through: why doesn't gmail require a captcha every time you want to send an email? What about other mail providers? What about your own native client? An internet without working captchhas[sic] will probably be full of bots and spam. Is your inbox full of bots and spam? Mine isn't. Even my spam box is at less than 800 over the period of a month; before one of the big botnets was taken down a few years ago that was generating most of the world's spam I still had less than 4000 over a month.

masonlee 20 hours ago | link

Is this an argument that future web services may depend more heavily on identity/reputation services?

duiker101 1 day ago | link

I didn't go into the article/method in depth, but from what I've read it takes advantage of the audio function, not the images.

fibertbh 1 day ago | link

Agreed. They are unbearable. But name a better alternative!

Anything I can think of adding to the CAPTCHA, such as a half-line of letters above and below or extra noise, would probably make cracking them programmatically even easier.

verroq 1 day ago | link

You only have to type one word correctly, and they make sure one word is readable, so with some practice you can easily recognise which word is the key word.

btilly 1 day ago | link

With practice and decent eyes. Once your eyes go downhill, it is amazing how much changes.

Most software developers are under 40 and therefore have little to no appreciation for what happens to people's eyes after they are past 40.

ohgodthecat3 1 day ago | link

He isn't speaking of reCAPTCHA but of Google's nearly impossible-to-read captchas that take way too much time to decipher.

If you'd like to see it you can usually get it by putting in a bad password to a gmail account too many times (though I don't know if that has other consequences).

Edit: Here is an image example