Computers get smarter, CAPTCHAs get harder

01 Mar, 2011

This week’s CBC tech column (CBC.ca version) is all about CAPTCHAs, and how some of them are getting harder. CUE OMINOUS VOICE. The robots are getting smarter.

[Audio to follow]

===

Tell me if this sounds familiar: you’re online, ready to buy some concert tickets or to sign up for a new email account. But before you’re allowed to proceed, you have to prove you’re a human being by deciphering a mess of distorted, squiggly letters and numbers, then typing them into a text box. This is what’s called a CAPTCHA, or completely automated public Turing test to tell computers and humans apart.
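For the curious, the check behind that text box can be sketched in a few lines of Python. This is only an illustration, not how any real site or reCAPTCHA does it: the alphabet, the six-character length and the function names `new_challenge` and `check_answer` are all assumptions, and the step that actually draws the squiggly image is left out.

```python
import secrets

# Hypothetical alphabet: look-alike characters (o/0, l/1/i, q) are left out
# so a visitor is never asked to guess between them.
ALPHABET = "abcdefghjkmnprstuvwxyz23456789"


def new_challenge(length=6):
    """Return a random string that a site would render as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_answer(expected, typed):
    """Compare the visitor's answer with the expected text.

    Case and surrounding spaces are ignored, so an uppercase "X" and a
    lowercase "x" count as the same answer.
    """
    cleaned = typed.strip().lower()
    return secrets.compare_digest(expected.lower().encode(), cleaned.encode())


challenge = new_challenge()
print(check_answer(challenge, challenge.upper()))  # a correct answer, typed in capitals
```

In this sketch the server keeps the expected string and shows the visitor only the distorted picture of it; the test is whether whoever is typing can read the picture.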

For a while now, I’ve had a sneaking suspicion that CAPTCHAs are getting harder. Increasingly, I’m left wondering, Is that an uppercase “x” or a lowercase “x”? An “o” or a zero? A “q” or an “o” with a squiggle through it? Sometimes, even though I’m 100 per cent sure I’ve typed exactly the right thing, the computer disagrees with me.

For months, I thought I was alone in this frustration. I worried that my increasing inability to pass these tests suggested that I’m not entirely human.

Then last week, I opened an email message from a colleague that read: “You know those distorted letters we have to type to pass security tests online? I notice they’re getting more and more distorted.”

According to Luis von Ahn, one of the computer scientists who coined the term “CAPTCHA,” the tests are getting harder.

“The thing about CAPTCHAs is that many people do their own implementations,” he told me. “Over time, some of these implementations have gotten a lot harder, because the really easy ones – essentially, the undistorted ones – can be broken by bots.”

Traditionally, identifying squiggly, distorted letters has been difficult for computers but comparatively easy for humans. But computers are getting better and better at it, and easy CAPTCHAs aren’t as effective as they once were.

Still, von Ahn says his own implementation of CAPTCHAs, called reCAPTCHA, isn’t getting any harder.

“It’s still the case, as it was three or four years ago, that a person who submits a solution [to reCAPTCHA] is going to be correct 96 per cent of the time,” von Ahn said. “That number remains the same.”

ReCAPTCHA, which was acquired by Google in 2009, generates more than 100 million CAPTCHA images a day for various websites for free. The CAPTCHA images it provides are also used to help decipher words that can’t be identified during the process of digitizing printed material.

Computers are getting better at solving CAPTCHAs because devising automated ways of bypassing the test is potentially lucrative. Imagine that you’re an email spammer. Wouldn’t it be great if you could automatically sign up for hundreds or thousands of bogus email accounts? Or, imagine you’re a ticket scalper. Wouldn’t it be terrific if you could write a computer program to automatically buy all the tickets for a concert? CAPTCHAs can help keep spammers and scalpers at bay.

Because there’s a lot of money to be made, software developers are actively writing code they say can crack CAPTCHAs. According to von Ahn, such software sells for $10,000, and he has even seen ticket scalpers advertise software they say can break reCAPTCHA for as much as $50,000.

According to von Ahn, it’s simply a matter of time before software will rival humans at solving CAPTCHAs, but it could take decades.

In the meantime, as easier, ineffective systems are phased out, people will continue to be frustrated by some CAPTCHAs. And as frustrating as CAPTCHAs are for the average user, they can be even more frustrating for people who are visually impaired or use screen-reader software. Some CAPTCHA implementations include an audio alternative, but accessibility will continue to be an issue.

Regardless of how they are implemented, CAPTCHAs are all built around the idea of creating a task that’s hard for computers and easy for humans. As computers get better and better at reading squiggly letters, we may be asked to prove our humanity by performing other types of tasks.

For instance, computers are still very bad at determining the contents of a photograph. It’s difficult for software to tell the difference between a photo of a cat and a photo of a dog. Microsoft Research built a CAPTCHA system called ASIRRA (Animal Species Image Recognition for Restricting Access) based on this idea. Companies like Solve Media and NuCaptcha have put their own twists on CAPTCHAs that require users to enter words from a text or video advertisement.

If von Ahn is right and computers will eventually be able to reliably solve text-based CAPTCHAs, that’s not necessarily a bad thing. Though CAPTCHA-busting technology could be used by spammers or ticket scalpers, it could also help decipher hard-to-read parts of digitized books or identify skewed and distorted text in photographs.

So, the next time you’re confounded by a mess of squiggly, distorted letters, don’t be too hard on yourself. Maybe it’s the CAPTCHA’s fault.

As von Ahn told me, “Sometimes, they’re really bad. Sometimes, they are so hard to read that I can’t read them myself.”

Comforting words from one of the people responsible for all those squiggly letters.
