Scientists have created a computer model that can “solve” CAPTCHAs.

CAPTCHA is a widespread text-based system used by websites to check whether a user is human, with very little training data. 

Much like the human brain, the researchers behind the new model say it has the ability to learn and generalise using relatively few examples, especially compared to current deep learning models.

The researchers say it is roughly 300 times more data-efficient than previous models.

CAPTCHAs are made to be uncrackable for computer algorithms by clustering many different letter combinations together in a million different styles.

While humans can naturally recognise an object even amidst layers of overlap or styles, computers have difficulty classifying each letter from the jumble.

Previous algorithms for solving CAPTCHA are data-intensive, requiring training on millions of labelled CAPTCHA image examples or coded rules on how to crack each type of image.

But researchers now have a more efficient model, dubbed the Recursive Cortical Network (RCN), which incorporates insights from neuroscience to “train” the computer to generalise beyond what it is initially taught.

The key to RCN’s success, the authors say, is that it is encoded with strong assumptions that it then uses to recognise inputs it never encountered in training.

With this, RCN can solve CAPTCHA text, identify handwritten digits, delineate complexly layered objects and recognise text in photos of real-world scenarios.

Compared to state-of-the-art deep learning approaches for reading text, RCN has comparable or higher accuracy while using around 5,000 times fewer training images.

The authors of a new study on the RCN model suggest the need for more robust spam-thwarting and human-checking techniques that go beyond what is encoded into today’s CAPTCHA system. 

The research is accessible here.