21 April 2009

Google What's Up CAPTCHA

I heard about the new Google robot detector a.k.a. CAPTCHA from a security blog I've been reading, ha.ckers.org.

The captcha problem is a real favourite of mine and I think about it all the time.

Good old "RSnake" seems to dismiss this new system out of hand. I think it's great. Google's What's Up CAPTCHA makes several improvements over the previous warped-letter systems. Many of these are described in their report.

Briefly, the system shows randomly rotated pictures and prompts the user to indicate which way is up.

Firstly, captcha is only about robot detection. It won't detect large hordes of humans paid paltry sums or coerced to pass your detector. Don't blame captcha. This is not the seven letter acronym you are looking for.

The current letter recognition strategies all require a large amount of noise or distortion to be added. It doesn't help that computer scientists have been working for decades, for legitimate purposes, to improve automated letter recognition in the presence of noise. The reality is that, as with chess, computers have become pretty good at it. Excessive noise can throw them off; the trouble is that most people seem to have almost as much difficulty.

Warped letters, along with most if not all visual stimuli, leave blind people in the lurch. Often a parallel audio cue is used for them.

Another problem with letters is that they are not very international. Many cultures are comparatively unfamiliar with the English alphabet, which is invariably adopted for captcha, and will have an even harder time than native English speakers in recognising letters after they've been twisted.

RSnake takes a hurried swipe at the new system by running through the captcha acronym, coming up with the following categories of dismissal:

  • It leaves blind people in the lurch.
  • It is not Completely Automated, requiring Google staff to run through images to see if there is an obvious "up".
  • It requires JavaScript, Flash or other new-fangled technology which people like RSnake never use.

I think he's wrong on all three counts.

Firstly, a parallel audio alternative is no less an option than it is today. Blind people can use that. It can remain a completely parallel audio captcha effort with its own analogous problems. Perhaps people aren't writing audio captcha solvers. Perhaps it's still too hard.

Secondly, it can be completely automated. The fact that humans will reliably discern "up" in pictures that have a discernible up will show up in large numbers of responses. Robots will be random, systematically wrong or inaccurate across a greater range than humans. If you present a series of prompts and use the combined results to decide, then you can get whatever statistical probability you deem acceptable. The only critical factor is that humans find it easier than computers by a significant margin.
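To put rough numbers on the "series of prompts" idea, here is a minimal sketch. It assumes the simplest possible robot, one that guesses an angle uniformly at random on every prompt, and it assumes the prompts are independent. The tolerance of 30 degrees either side of true "up" is my own illustrative figure, not something taken from Google's report, and the function names are made up for this post.

```python
import math


def random_pass_probability(tolerance_deg, prompts):
    """Chance that a uniformly random guesser passes every prompt.

    Assumption: an answer counts as correct if it lands within
    +/- tolerance_deg of true "up", so one guess succeeds with
    probability (2 * tolerance_deg) / 360.
    """
    per_prompt = (2.0 * tolerance_deg) / 360.0
    return per_prompt ** prompts


def prompts_needed(tolerance_deg, target):
    """Smallest number of prompts that pushes the random guesser's
    pass rate to or below `target` (a probability between 0 and 1)."""
    per_prompt = (2.0 * tolerance_deg) / 360.0
    return math.ceil(math.log(target) / math.log(per_prompt))


if __name__ == "__main__":
    print(random_pass_probability(30, 3))
    print(prompts_needed(30, 1e-4))
```

A smarter robot that is right more often than chance needs more prompts for the same confidence, which is why the margin between humans and computers is the thing that matters.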

Without human intervention, you may get non-performant source images, such as a photo pointed straight down at the ground. Presumably humans will not be able to tell "up" uniformly and these can be weeded out of the source pool using only responses to the prompts.
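One way to weed out such images from responses alone is to measure how much the answers for each image agree with each other. The sketch below uses the mean resultant length from circular statistics: treat every answer as a unit vector and take the length of the average vector. It is close to 1 when people agree and close to 0 when answers point every which way. The thresholds, the function names and the data layout (answers in degrees, expressed in the source image's own frame) are all assumptions of mine for illustration.

```python
import math


def resultant_length(angles_deg):
    """Agreement score between 0 (scattered) and 1 (everyone agrees)."""
    n = len(angles_deg)
    if n == 0:
        return 0.0
    # Average the unit vectors so that 350 and 10 degrees count as close.
    mean_cos = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    mean_sin = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    return math.hypot(mean_cos, mean_sin)


def usable_images(responses, min_responses=20, min_agreement=0.8):
    """Keep images with enough answers that also agree on "up".

    `responses` maps an image id to the list of angles users gave.
    Both thresholds are illustrative guesses, not tuned values.
    """
    return {
        image_id
        for image_id, angles in responses.items()
        if len(angles) >= min_responses
        and resultant_length(angles) >= min_agreement
    }
```

In practice some of the answers will come from robots, so the agreement threshold needs enough slack to tolerate a scattering of bad answers on an otherwise good image.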

So, assuming "up detection" is still too hard to automate, this will make a nice captcha system. If this assumption starts to be violated, then it will be time to dream up a new system.

Lastly, I can say with some confidence that this system can be implemented with no new-fangled technology or plugins. Just images and HTML. Not as "2.0" as the AJAX version, but adequate.
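As a rough sketch of what I mean, the server could show the same picture at several rotations and ask the user to click the one that is the right way up. This is my own simplification, coarser than a rotating slider, and everything named here (the form action, the field name, the function) is hypothetical. The server would remember which option is upright; the page itself carries only opaque image URLs and an option number.

```python
import html


def render_challenge(image_urls, action="/captcha/check"):
    """Build a plain HTML form with one submit button per rotated image.

    `image_urls` are opaque URLs of the same picture at different
    rotations, already shuffled by the caller. The page contains no
    script and no plugin; clicking a picture submits its option number.
    """
    buttons = []
    for index, url in enumerate(image_urls):
        buttons.append(
            '<button type="submit" name="choice" value="{0}">'
            '<img src="{1}" alt="Rotated picture, option {2}">'
            "</button>".format(index, html.escape(url, quote=True), index + 1)
        )
    return (
        '<form method="post" action="{0}">\n'
        "<p>Click the picture that is the right way up.</p>\n"
        "{1}\n"
        "</form>"
    ).format(html.escape(action, quote=True), "\n".join(buttons))
```

With only a handful of options a random click succeeds fairly often, so this version leans even harder on presenting a series of prompts.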

There are some other awesome captcha systems that I have heard of over the years. Of particular note are KittenAuth and the Hot or Not captcha, where you prove you are human by identifying kittens and sexiness respectively.

My mind is bubbling with exciting ideas on this topic, so it's probably a good point to cut this post off and promise to put those to print another day.


  1. Interesting post, Chris, as much for the broad topic of captcha as this particular development.

    But with regard to the up-captcha, how do you see audio working for people with vision impairment? With a letter string, the audio reads out the letters, both triggered by code.

    What would the audio for up-captcha say, and how would that be tied to the visual representation?

  2. Hi Ricky,

    If audio prompts of letter sequences have been broken like visual cues, then there will need to be some change to the system used. However, if the audio captcha systems have not been broken, then they can continue being used as they are today.

    In other words, there would be two separate and unrelated ways to prove you are human, one an audio prompt asking you to type letters and the other a better visual prompt, in this case, asking you which way is up.

    I really should research the solving of audio captcha, but my central point is that these two modes can be completely distinct; they don't have to be alternate representations of the same prompt.