The "CAPTCHA" has infuriated web users for years: It's that login test that asks you to type in a hard-to-read sequence of letters or numbers in order to prove you are not a robot. Get one letter wrong and you'll be denied access.
In a bid to ease that irritation, Google launched what it dubbed the "No CAPTCHA reCAPTCHA" in December, which it claims can verify a human user by looking at things like their mouse movements and the way they type.
But device recognition company AdTruth believes it has found evidence Google's CAPTCHA killer is collecting far more information than mouse coordinates alone, and that it could use the security tool to inform its advertising services too. The new tool isn't overtly labeled as a Google service, yet anyone clicking through it "consents" to be tracked by Google's cookies, AdTruth found. And while the service is intended to do only one thing — determine whether you are a human or not — it is also able to identify a lot more information about which specific human you are.
All of this is poorly disclosed to users, AdTruth believes. Google told Business Insider the signals the new CAPTCHA examines are not used for ad targeting and that the tool does not analyze specific user interests or preferences.
The original CAPTCHA was designed to protect websites from spam and bots, but Google research found that artificial intelligence technology has now become so sophisticated it can solve even the most distorted of text at 99.8% accuracy.
That is why it created the "No CAPTCHA reCAPTCHA," which simply asks users to click a check box, or complete another task, such as selecting all the cats in a set of images, to confirm they are not a robot. Google says its risk assessment software uses behavioral cues, such as where users click, how long they linger over a checkbox, and their typing cadence, to work out whether they are human or not. Google created this video to explain how the process works, and you can try out a demo of No CAPTCHA reCAPTCHA for yourself here.
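To make those behavioral cues concrete, here is a minimal Python sketch of how two of them, typing cadence and time spent over the checkbox, might be turned into numbers. It is an illustration only: the function names and timestamps are hypothetical, and nothing here is taken from Google's code.

```python
from statistics import mean, pstdev


def typing_cadence(key_times_ms):
    """Return (mean gap, population std dev of gaps) between keystrokes, in ms."""
    gaps = [b - a for a, b in zip(key_times_ms, key_times_ms[1:])]
    if not gaps:
        return 0.0, 0.0
    return mean(gaps), pstdev(gaps)


def hover_duration(enter_ms, click_ms):
    """How long the pointer lingered over the checkbox before the click."""
    return click_ms - enter_ms


# Hypothetical timestamps: a script firing keys at a fixed rate vs. a person.
scripted = [0, 50, 100, 150, 200]
human = [0, 140, 230, 460, 540]

print(typing_cadence(scripted))
print(typing_cadence(human))
```

A script that types at perfectly even intervals shows zero variation between keystrokes, while a person's gaps vary. That variation is the kind of signal a risk engine could weigh, though how Google actually scores it is not public.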
No CAPTCHA reCAPTCHA seems so easy and reliable that companies such as Snapchat, BuzzFeed, WordPress, and Humble Bundle immediately signed up to adopt it.
But according to research from AdTruth, seen by Business Insider, Google's No CAPTCHA reCAPTCHA also appears to be collecting additional, personally identifiable data about users, beyond mere behavioral cues.
Here's all the data No CAPTCHA reCAPTCHA collects
AdTruth's lead engineer Marcos Perona was skeptical of Google's claim to look for "human behavior" to distinguish a real person from a bot and decided to investigate. He wanted to find out what Google actually "captures" from a machine with the No CAPTCHA to work out whether a user is a bot or not.
After taking a close look at the embedded code for the No CAPTCHA product, he found that the system uses a repurposed version of Google's Botguard technology, which was originally intended for anti-spam and bot detection within Gmail. On top of that, No CAPTCHA uses a level of encryption that hides what the mechanism is doing by constantly changing the No CAPTCHA code and encryption keys. That makes it difficult for bot makers to crack (and, as a by-product, difficult for researchers like Perona to uncover exactly how the No CAPTCHA works).
But Perona and other anonymous programmers from information security backgrounds believe they have decoded the new CAPTCHA system and the information it pulls from a browser when a user says they are not a robot. (AdTruth points out to Business Insider that it is not releasing any information that could help botmakers circumvent the No CAPTCHA reCAPTCHA.)
According to Perona, Botguard first takes a look at whether you already have a Google cookie on the machine. The No CAPTCHA reCAPTCHA then drops its own cookie from Google into your browser. It then takes a pixel-by-pixel fingerprint of the user's browser window at that time, pulling information such as:
- IP address
- CSS information from the page you are on
- A count of mouse and touch events
In addition, Google's new CAPTCHA will also make use of any cookies that have been set by other Google properties — like Gmail, Search, Analytics, and so on — in the last six months. The belief is that humans use Google's services in certain "human" ways, whereas bots do not, and those patterns can be detected.
All of this personally identifiable information gets encrypted and sent back to Google.
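As a rough illustration of the collection step Perona describes, the Python sketch below hashes a handful of browser signals into one identifier and carries the event counts alongside it. The field names, the sample values, and the choice of SHA-256 are all assumptions made for the example. AdTruth has not published the real payload format, and this is not Google's code.

```python
import hashlib
import json


def browser_fingerprint(signals):
    """Hash a dict of browser signals into a stable hex identifier."""
    # sort_keys makes the result independent of the order signals were added in.
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values; 203.0.113.7 is an address reserved for documentation.
stable_signals = {
    "ip": "203.0.113.7",
    "css": {"font-family": "Arial", "width": "1280px"},
}
event_counts = {"mouse": 42, "touch": 0}

payload = {
    "fingerprint": browser_fingerprint(stable_signals),
    "events": event_counts,
}
print(payload["fingerprint"][:16], payload["events"])
```

The point of the sketch is that a few individually unremarkable values, once combined, produce an identifier that stays the same from visit to visit for as long as those values do.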
The reCAPTCHA gives Google "a very high level of entropy when it comes to distinguishing an individual"
Perona told us: "The use of Google.com's domain for the CAPTCHA is completely intentional, as that means Google can drop long-lived cookies in any device that comes into contact with the CAPTCHA, bypassing third-party cookie restrictions [like ad blockers] as long as the device has previously used any service hosted on Google.com."
He added: "The mix of a fingerprint and first-party cookies is pervasive as Google can give a very high level of entropy when it comes to distinguishing an individual person."
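"Entropy" here is usually measured in bits: the rarer a trait, the more bits of identifying information it carries. The Python sketch below shows the arithmetic, using made-up shares for three hypothetical browser traits and assuming the traits are independent of one another, which real ones often are not.

```python
import math


def surprisal_bits(share):
    """Bits of identifying information in a trait held by `share` of users."""
    return -math.log2(share)


# Hypothetical shares, chosen only to keep the arithmetic easy to follow.
traits = {
    "browser_version": 1 / 16,
    "screen_size": 1 / 8,
    "font_list": 1 / 64,
}

# Adding the bits together is only valid if the traits are independent.
total_bits = sum(surprisal_bits(share) for share in traits.values())

# Bits needed to single out one person among roughly 7 billion (a round figure).
bits_to_single_out = math.log2(7_000_000_000)

print(total_bits, round(bits_to_single_out, 1))
```

Under those assumptions, each extra trait narrows the pool of candidates further, and pairing a fingerprint with a long-lived cookie would close most of the remaining gap.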
The way the new CAPTCHA works also seems to support this theory, as there appear to be at least three main CAPTCHA types, according to AdTruth's research:
- If Google cookies are present, and your fingerprint is obtained, you will often see the checkbox that asks you to prove whether you are a human.
- If you delete all your Google cookies, the CAPTCHA will likely ask you to fill in a two-word CAPTCHA.
- If you are using a form of anti-fingerprinting plugin, Google will likely ask you to fill in a two-word CAPTCHA, regardless of your cookies.
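The three cases above boil down to a simple rule, sketched here in Python. The function and its return values are hypothetical, and AdTruth's wording ("often," "likely") suggests the real behavior is probabilistic rather than a hard rule like this one.

```python
def choose_challenge(has_google_cookies, fingerprint_obtained):
    """Pick a challenge type from the two signals AdTruth describes."""
    if has_google_cookies and fingerprint_obtained:
        return "checkbox"
    # Missing cookies or a blocked fingerprint both lead to the harder test.
    return "two-word"


print(choose_challenge(True, True))
print(choose_challenge(False, True))
print(choose_challenge(True, False))
```

In this reading, the easy checkbox is reserved for visitors Google can already recognize through both cookies and a fingerprint.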
The implication is that Google isn't just looking to identify whether you're a human with its No CAPTCHA, but potentially exactly which human you are. The combination of first-party cookies and a browser fingerprint can be tied back to an individual — and most individuals simply clicking "I'm not a robot" won't know this is happening behind the scenes.
AdTruth EMEA managing director James Collier told us: "This is a way for Google to indirectly link activity outside of Google's properties - collected under the guise of security - to Google's knowledge of that individual, without providing the consumer an opt out for the security fingerprint. When they went to market with reCAPTCHA they spoke about humanity and transparency. But in reality, their intentions appear hidden, as was the case with the collection of location data for traffic maps. It's a question of trust: Google have developed a digital ecosystem that relies on them without question, and as the stakes get higher, consumers and industry alike should wake up to the risk of relying on companies that don't transparently handle clear conflicts of interests in relation to their data."
And it's also the same policy that refers to unique device identifiers and states: "We also use the information to offer you tailored content — like giving you more relevant search and ads."
It potentially means Google could be using the data collected from what is meant to be security software (which, remember, is also placed on sites other than its own), to improve services beyond anti-spam security, like advertising. It's unlikely that would be the case, but Google's policy doesn't say otherwise.
Google combined 60 of its privacy policies into one in 2012. Indeed, in January this year, the UK's Information Commissioner's Office ordered Google to sign a formal undertaking to improve the information it provides to people about how it collects personal data in the UK. The ICO's three-year investigation found Google was "too vague" when describing how it uses personal data gathered across its web services and products.
Business Insider contacted the ICO with AdTruth's findings on Google's No CAPTCHA product. The ICO provided us with this statement: "The Data Protection Act requires organizations to be clear and open about the way they are using people's information. We are currently looking into the information you have provided to establish full details."
Of course, the privacy concerns raised by No CAPTCHA are not limited to the UK or Europe; Google's products and services are used by hundreds of millions of users across the world.
No CAPTCHA reCAPTCHA raises "some legitimate privacy concerns"
You'll have noticed lots of "cans" and "coulds" in this story. It's extremely hard to verify how often Google is collecting fingerprinting data and how or if the company is using it. But two prominent privacy researchers told Business Insider they found AdTruth's preliminary conclusions "concerning."
Jeremy Gillula, staff technologist at the Electronic Frontier Foundation, told us: "It's definitely concerning that Google is conflating the privacy policies of their security systems like reCAPTCHA with their other products. Many websites relied on reCAPTCHA to prevent spam, and just because I want to post on one of those websites doesn't mean I want it connected to Google's profile on me."
He adds that if Google were to commit to not using data collected via No CAPTCHA reCAPTCHA for any purpose other than further developing No CAPTCHA reCAPTCHA, this aspect wouldn't be so bad.
But there would still be issues: "My bigger concern is that by over-identifying whether or not someone is a human by figuring out precisely which human they are, Google is contributing to the trend of making the web harder to use for people who value their privacy. In essence, Google is assuming you're only human if you're part of their system. If you choose not to use Google services, or if you choose to preserve your privacy, then you're essentially classified as a second class citizen."
Steven Murdoch, principal research fellow in the information security research group at University College London's department of computer science, agreed that AdTruth's research into the No CAPTCHA does raise "some legitimate privacy concerns."
But he emphasized that it's unlikely to be a conspiracy. Murdoch told us: "In terms of the way that the No CAPTCHA detector works, I think the reason it collects so much information is likely because the detection algorithm is machine-learning based rather than written by hand. Such systems are generally designed by collecting all information which might be of use then letting the machine learning system come up with an optimal decision."
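Murdoch's point can be illustrated with a toy classifier in Python: feed in every available signal and let training decide how much each one matters. The two features, the six data points, and the choice of logistic regression are all invented for this example and say nothing about the model Google actually uses.

```python
import math


def predict(weights, features):
    """Probability that the visitor is human, from a logistic model."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=2000, lr=0.5):
    """Fit weights (bias first) by plain gradient descent on log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            err = predict(weights, features) - label
            grad[0] += err
            for i, x in enumerate(features):
                grad[i + 1] += err * x
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Toy data: [has_google_cookie, timing_jitter]; label 1 = human, 0 = bot.
rows = [[1, 0.8], [1, 0.6], [0, 0.7], [0, 0.1], [1, 0.0], [0, 0.0]]
labels = [1, 1, 1, 0, 0, 0]

weights = train_logistic(rows, labels)
print([round(w, 2) for w in weights])
```

Nobody tells the model which signal matters. In this toy data the timing jitter separates humans from bots and the cookie does not, so training ends up weighting jitter heavily. That is the design pattern Murdoch describes: collect broadly and let the algorithm sort out what is useful.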
Google provided Business Insider with this statement: "As we've said before, the purpose of 'No CAPTCHA reCAPTCHA' is to fight spam by verifying that users are humans, and not dangerous spam bots. The signals that we examine are not used for ad targeting and the tool does not analyze specific user interests or preferences."
There is no evidence Google is doing, or is planning to do, anything nefarious with the information the new No CAPTCHA reCAPTCHA scans and collects — and it's unlikely Google ever would use the data scraped through the software for advertising purposes.
The software looks at engagement "before, during, and after" an interaction
However, as AdTruth's Collier pointed out, the key issue is a question of trust: Google's own marketing around the launch of the No CAPTCHA reCAPTCHA is scant on details about the user data the software assesses, although the company did acknowledge in a blog post in 2013 that the software looks at engagement "before, during, and after" a CAPTCHA interaction.
Business Insider could only find one article, from Wired, in which it was explained that Google also examines cookies and IP addresses alongside mouse movements and typing behavior (but nothing to do with a fingerprint) to determine whether that user "is the same friendly human Google remembers from elsewhere on the web."
Essentially, even if you are really interested in discovering more about the mechanics behind the No CAPTCHA reCAPTCHA, it's extremely difficult to find an explanation on the web.
The No CAPTCHA reCAPTCHA is an intelligent tool which will no doubt help cut through the deluge of spam and bots attacking sites across the web. But it may be in Google's interest to set out exactly — and more prominently — how that tool is so clever at telling the difference between bots and humans.