The End of CAPTCHA? Testing GPT-4V and AI Solvers vs. CAPTCHA | CHEQ


CAPTCHAs have been a go-to security measure for deterring bots and automated scripts for years. We’ve previously covered how bots and fraudsters are able to bypass CAPTCHAs, but now, a new, cheaper, and more effective tool has emerged: advanced AI.

In this article, we’ll not only delve into the history and vulnerabilities of CAPTCHAs, but we’ll also put them to a real-world test using cutting-edge AI tools: ChatGPT with multimodal capabilities, plus an AI CAPTCHA-solving plugin and API. With these, we were able to solve multiple challenges from hCAPTCHA and reCAPTCHA.

Does multimodal generative AI like GPT-4V mean the end of CAPTCHA?

OpenAI, arguably the highest profile generative AI company, has released the long-awaited GPT-4V, a multimodal update for GPT-4 that will allow the AI model to interact with images, sounds, plugins, and the wider web (via a Bing search integration). 

The tool is capable of answering questions about images, explaining the content and context of images, and performing optical character recognition (meaning it can read text in images). So, it stands to reason that it would now be relatively simple for GPT to solve CAPTCHA puzzles.

Indeed, back in March 2023, researcher Tanishq Mathew Abraham was allowed to use an alpha version of the update and was able to upload a screenshot of a typical “find the crosswalks” reCAPTCHA challenge. ChatGPT promptly identified the images containing crosswalks and provided instructions for solving the challenge.

Fast forward to the current release of GPT-4V, and it’s clear that the team at OpenAI has spent a lot of time evaluating and addressing the potential security ramifications of this capability. In the system card released for GPT-4V, the OpenAI team revealed that during extensive red-teaming in alpha, GPT-4V’s capability for CAPTCHA breaking was analyzed and trained against public datasets of CAPTCHA challenges. 

Ultimately, OpenAI determined that “a powerful, general purpose CAPTCHA breaker that’s easily accessible can have cybersecurity and AI safety implications,” noting that such capabilities could bypass anti-bot security measures and let AI systems interact with “systems intended for human use.” 

In light of that statement, I’d assumed that CAPTCHA solving would simply be against ChatGPT’s strict content policy. To my surprise, when I tried it for myself, I found that ChatGPT was easily able to solve a classic reCAPTCHA “find the crosswalks” puzzle.

In another test, ChatGPT was also able to solve a “find the dissimilar image” hCAPTCHA challenge. However, there were several failures along the way; overall, from my limited testing of five different CAPTCHA challenges, ChatGPT was able to solve four. 

GPT-4V Solves a reCAPTCHA puzzle.

An 80 percent success rate is still pretty impressive, but even if ChatGPT can solve the puzzle, we would still need to find a way to have ChatGPT communicate these instructions to a bot to make this an efficient CAPTCHA killer. And I suspect that’s why the OpenAI team is letting their tool solve CAPTCHAs: it’s a neat way to show off its capabilities, and without an API, there is little bad actors can do to abuse this capability in any efficient manner.

There are, however, plenty of other CAPTCHA solvers publicly available, from home-grown AI tools and GitHub repositories to full-scale CAPTCHA-solver-as-a-service tools.

Later in this article, we’ll test one of these tools and see just how easy it is to write a script to bypass reCAPTCHA and hCAPTCHA. But first, let’s take a look at the long back-and-forth battle between CAPTCHA and AI.

A brief history of the arms race between CAPTCHA and AI

It’s important to note that this is not CAPTCHA’s first rodeo with AI solvers. Researchers have been using AI-powered optical character recognition (OCR) techniques in attempts to solve CAPTCHA for over a decade, and CAPTCHA is frequently used as a tool to train AI models for better image recognition. Google, for example, has openly used reCAPTCHA challenges to digitize books and help its AI models interpret photographs. 

Early OCR technology was limited, requiring close human supervision for optimal outcomes, but the technology eventually evolved to the point that it was able to solve distorted texts with 99.8% accuracy by 2014, according to a Google study. 

This led to the development of behavioral-based CAPTCHA tools like Google’s reCAPTCHA v2 (now v3) and hCAPTCHA, which themselves use AI models to examine device fingerprints and behavioral data (i.e., how a user interacts with a web page) to decide whether a visitor is likely to be human.

For bot makers and other bad actors, that shift towards behavioral detection essentially moved the battlefield from flat-out CAPTCHA solving to CAPTCHA-bypassing, with many web scrapers and bot programmers opting to focus their efforts on avoiding detection altogether rather than looking for increasingly complicated (and potentially costly) methods to solve the actual challenges. 

In the years since the release of reCAPTCHA v2 in 2014, it’s become common practice for competent web scrapers and for more nefarious bad actors to utilize techniques like user agent spoofing, IP rotation, residential proxies, rate limiting, and the randomization of fingerprinting attributes, like canvas and WebGL data, to evade detection.

But against unsophisticated bots and scrapers, CAPTCHA remained a somewhat viable “last line of defense.” That was, until the explosion of machine learning and generative AI and the subsequent widespread democratization of powerful AI tools.

Machine learning and the spread of CAPTCHA-breaking AI

Conversations about CAPTCHA-breaking AI often center around academic research and massive advanced AI models, giving the impression that these tools are limited to those who can afford huge data sets and teams of programmers.

But in 2023, that’s simply not the case. The knowledge and toolsets necessary to build a passable CAPTCHA solver are publicly available, and the internet is full of tutorials that can guide those with the necessary coding skills through the process. 

Creating a CAPTCHA-solving machine learning model involves several steps, each simplified by modern libraries and frameworks. Data collection is the starting point, and open-source datasets offer an easy way to gather diverse CAPTCHA images. Preprocessing these images involves techniques like cropping, noise reduction, and grayscale conversion, tasks that libraries like OpenCV and TensorFlow have made more straightforward.
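To make the preprocessing step more concrete, here is a minimal sketch of the three operations named above (grayscale conversion, cropping, and noise reduction) in plain Python. It is an illustration only: a real pipeline would more likely lean on a library such as OpenCV, and the function names, the nested-list image format, and the choice of a 3x3 median filter for noise reduction are all assumptions made for this example.

```python
from statistics import median


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples into rows of 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def crop(image, top, left, height, width):
    """Return the height x width region whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def median_filter(image):
    """3x3 median filter; border pixels use only the neighbours inside the image."""
    h, w = len(image), len(image[0])
    filtered = []
    for y in range(h):
        new_row = []
        for x in range(w):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            # median() may return a float for even-sized border windows
            new_row.append(int(median(window)))
        filtered.append(new_row)
    return filtered


# Hypothetical 5x5 light-grey image with one dark "noise" pixel in the centre.
rgb = [[(200, 200, 200)] * 5 for _ in range(5)]
rgb[2][2] = (0, 0, 0)

gray = to_grayscale(rgb)
clean = median_filter(gray)
patch = crop(clean, top=1, left=1, height=3, width=3)
```

In this toy input, the median filter replaces the isolated dark pixel with the value of its neighbours, which is the kind of speckle removal the paragraph refers to.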

When it comes to model training, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are often the architectures of choice. These models are trained to adapt to the variability in CAPTCHA designs, often using data augmentation techniques like rotating and scaling to improve robustness. Modern pre-trained model families like MobileNet, ResNet, or YOLO have shown that you don’t need a massive dataset for a model to become effective; a few hundred images can be sufficient.
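Data augmentation itself is simple to picture. The sketch below generates extra variants of one image by rotating and scaling it, using plain Python lists. It is a simplified illustration rather than a training recipe: the helper names are hypothetical, the rotation is restricted to 90 degrees, and the scaling is nearest-neighbour, whereas real augmentation pipelines typically apply small random angles and interpolated resizing.

```python
def rotate90(image):
    """Rotate a 2D grid of pixel values 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def scale_nearest(image, factor):
    """Resize a 2D grid by `factor` using nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    new_h = max(1, round(h * factor))
    new_w = max(1, round(w * factor))
    return [
        [
            image[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
            for x in range(new_w)
        ]
        for y in range(new_h)
    ]


def augment(image):
    """Return the original image plus one rotated and one upscaled variant."""
    return [image, rotate90(image), scale_nearest(image, 2)]


# Tiny 2x2 example so the transformations are easy to follow by eye.
sample = [[1, 2],
          [3, 4]]
variants = augment(sample)
```

Each labelled training image yields several slightly different inputs this way, which is one reason a few hundred source images can go a long way.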

An easier way: AI CAPTCHA solving-as-a-service

So ML models are more accessible than ever, but let’s be honest, that’s all still pretty academic. Crafting a custom CAPTCHA-solving bot from scratch is probably outside of the realm of cost-effective effort for the average web scraper, let alone for a non-technical user like me.

Lucky for us, the grey market offers a more accessible solution: CAPTCHA solvers-as-a-service. These services have mushroomed in popularity, providing an easier and more cost-effective way to navigate CAPTCHA challenges. With them, anyone, even someone with no coding experience whatsoever, can easily bypass CAPTCHA. But seeing is believing, so let’s prove it.

Case study: Passing CAPTCHA challenges with ChatGPT and AI solvers

To prove my thesis, I wanted to create a rudimentary bot that would go through the account creation process on a popular social media website and solve any CAPTCHA challenge that was presented during that process. However, with zero knowledge of Python and no first-hand experience with web scraping, I wasn’t sure exactly where to start.

So, my first step was to consult ChatGPT. I simply told the AI what I wanted to achieve, and it happily walked me through the process, first helping me understand the basics of Python, then guiding me through the installation of Selenium, a popular open-source framework for automating browsers, and eventually helping me write and troubleshoot a script to perform my desired task. 

Finding a CAPTCHA solver

If writing a Python script is outside of my skill set, then building an ML model to solve CAPTCHAs was out of the question. To get past the reCAPTCHA challenge served during the account creation process, I needed to find an AI tool that could be integrated into my script, either via an API or via a plugin on the Selenium-controlled browser. So, I quickly set out to find a cost-effective (i.e., free) solution. Within minutes, I found a tool that was not only cost-effective (free to use for the first 100 challenges and then approximately $0.0001 USD per CAPTCHA challenge) but also incredibly popular, boasting nearly 2 million total users and approximately 130,000 active users online at the time of writing.
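To put that pricing in perspective, here is a quick back-of-the-envelope calculation. It assumes the quoted figures apply exactly (100 free challenges, then a flat $0.0001 per challenge), which is a simplification, since the price above is only approximate; the function and constant names are made up for this example.

```python
from decimal import Decimal

FREE_CHALLENGES = 100                     # free tier quoted above
PRICE_PER_CHALLENGE = Decimal("0.0001")   # approximate USD price quoted above


def estimated_cost(challenges):
    """Estimated USD cost of solving `challenges` CAPTCHAs under the assumed pricing."""
    billable = max(0, challenges - FREE_CHALLENGES)
    return billable * PRICE_PER_CHALLENGE


for volume in (100, 10_000, 1_000_000):
    print(f"{volume:>9,} challenges: about ${estimated_cost(volume):.2f}")
```

Under these assumptions, a million solved challenges would cost roughly one hundred dollars, which is why price is no real barrier for anyone operating at scale.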

The service offers multiple ways to solve CAPTCHAs. For those who prefer manual navigation but want to bypass CAPTCHAs, there’s a browser extension that automatically solves them. For a more automated but still browser-based solution, the service provides a Recognition API that gives you coordinates for where to click to solve the CAPTCHA. And for fully automated bots that don’t interact with a browser interface, the service offers pre-solved CAPTCHA tokens via a Token API. All of this, according to marketing materials, is achieved via “state-of-the-art deep learning algorithms” from tech giants like Google, Meta, Microsoft, and NVIDIA.

The scripting process

Once I’d selected my tool, I turned back to ChatGPT to help finalize a script. The process was simple, and I was able to easily troubleshoot issues through multiple iterations and even had the AI walk me through the process of finding appropriate identifiers for various actions, such as selecting and using suggested usernames and filling out form fields. Here’s a brief overview of the process:

  • Setting up the environment: With ChatGPT’s guidance, I installed the necessary Python packages and set up my Selenium environment.
ChatGPT guides me through the basic steps to start writing a bot.
  • Installing the browser extension: I configured Selenium to install the CAPTCHA-solving service’s browser extension automatically when launching the browser.
  • Navigating to the site: I wrote a Python script that used Selenium to open a web browser and navigate to the registration page of a popular social media site.
  • Filling out forms: The script automatically filled in the required fields like username, email, and password.
  • Solving the CAPTCHA challenge: Here’s where the browser extension came into play. As the CAPTCHA appeared, the extension automatically solved it for me. A more complex bot might have opted for the recognition or token API at this step. 
  • Finalizing account creation: Once the CAPTCHA was solved, the script clicked the ‘Submit’ button to complete the registration process, and I was redirected to a welcome page for my new account. 
The CAPTCHA solver easily solves a reCAPTCHA challenge.

The Verdict: Success!

The script worked flawlessly, and I was able to create an account without any manual intervention. Following this success, I created two more scripts that solved demo challenges from hCAPTCHA and reCAPTCHA to verify a high success rate for the CAPTCHA solver. 

Note: All screenshots demonstrate the solving of reCAPTCHA and hCAPTCHA demo pages. I was able to achieve my goal of account creation on a popular social media website, but for obvious reasons, I don’t want to show that process in detail.

hCAPTCHA challenges are also successfully solved.

While the script I created is rudimentary, it’s worth noting that this is just scratching the surface of what’s possible. In the hands of someone with more technical expertise, these tools can be used to create far more sophisticated and powerful bots capable of bypassing even more complex security measures.

My experiment serves as a basic example, but the implications are far-reaching: if even a beginner can bypass CAPTCHA so easily, imagine what a more skilled individual could accomplish.

Have bots surpassed humans at solving CAPTCHA?

In light of these developments, it’s no surprise that recent research has found that AI bots are now outperforming humans in solving CAPTCHAs. A study conducted by a collaborative team from ETH Zurich, Lawrence Livermore National Laboratory, Microsoft, and the University of California, Irvine revealed that bots are not only more accurate but also faster at decoding CAPTCHAs, effectively outperforming humans by as much as 15%. 

According to the study’s findings, bots demonstrated an accuracy range of 85-100%, substantially outperforming human accuracy, which ranged from 50-85%. Moreover, bots solved these puzzles in significantly less time across the board, with the exception of reCAPTCHA, where human and bot-solving times were nearly identical.

The study underscores a critical shift in the CAPTCHA landscape. Unlike the early days when machine vision was not advanced enough to crack these puzzles, today’s pre-trained vision models can easily identify objects in CAPTCHA images. Some CAPTCHA providers like hCaptcha have tried to adapt by frequently changing the image categories, but this approach has proven to be futile. As noted earlier, pre-trained model families like MobileNet, ResNet, or YOLO require only a few hundred images to become accurate enough to solve these CAPTCHAs.

Given these developments, the utility of CAPTCHA as a security measure is increasingly questionable. The deterrent is no longer the complexity of the CAPTCHA but the computational expense of solving them at scale, which is dropping precipitously. In essence, today’s CAPTCHAs serve more like a proof-of-work system, akin to Hashcash, than a true security layer.
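For readers unfamiliar with the comparison, a proof-of-work scheme makes each request cost the sender some computation while staying cheap for the receiver to check. The sketch below shows the general idea in a heavily simplified, Hashcash-style form. It is not the actual Hashcash stamp format (which includes a version, date, and random field); the `resource:nonce` layout, the function names, and the 12-bit difficulty are assumptions chosen to keep the example short.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource, difficulty_bits):
    """Brute-force a nonce so that SHA-1(resource:nonce) has enough leading zero bits."""
    for nonce in count():
        stamp = f"{resource}:{nonce}"
        digest = hashlib.sha1(stamp.encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return stamp


def verify(stamp, resource, difficulty_bits):
    """Checking a stamp takes a single hash, however long minting took."""
    if not stamp.startswith(resource + ":"):
        return False
    digest = hashlib.sha1(stamp.encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


# "signup-form" is a hypothetical resource name; 12 bits needs ~4,000 hashes on average.
stamp = mint("signup-form", 12)
is_valid = verify(stamp, "signup-form", 12)
```

The asymmetry is the point: minting takes thousands of hashes, verifying takes one. A CAPTCHA that AI can reliably solve works the same way, imposing a small per-request cost on the sender rather than actually telling humans and bots apart.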

Moving past CAPTCHAs with advanced landing page security

While the evolution of CAPTCHAs and their interactions with AI have shaped much of the internet’s defense against bots, it’s time to acknowledge the limitations of relying solely on CAPTCHA in this advanced digital age. In our next post, we’ll talk about some of the other risks of using CAPTCHA, from privacy and compliance concerns to click farms dedicated to defeating CAPTCHA challenges. In the meantime, if you’re seeking a more effective, user-friendly, and compliant security solution for your website, consider exploring CHEQ’s Sign-Up and Lead Protection (SLP).

SLP secures web form processes in real-time, offering advanced user validation methods beyond typical CAPTCHAs. It provides flexible activation modes and enhances CRM quality by filtering out fake leads. Ready to fortify your website’s security and enhance user experience? Schedule a demo today.
