What is a CAPTCHA?


	This CAPTCHA of "smwm" obscures its message from computer interpretation by twisting the letters and adding a slight background color gradient.

A CAPTCHA (an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart") is a type of challenge-response test used in computing to determine whether or not the user is human.

The term was coined in 2003 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford. The most common type of CAPTCHA was first invented in 1997 by Mark D. Lillibridge, Martin Abadi, Krishna Bharat, and Andrei Z. Broder. This form of CAPTCHA requires that the user type the letters of a distorted image, sometimes with the addition of an obscured sequence of letters or digits that appears on the screen. Because the test is administered by a computer, in contrast to the standard Turing test that is administered by a human, a CAPTCHA is sometimes described as a reverse Turing test. This term is ambiguous because it could also mean a Turing test in which the participants are both attempting to prove they are the computer.

This user identification procedure has received many criticisms, especially from disabled people, but also from other people who feel that their everyday work is slowed down by distorted words that are illegible even for users with no disabilities at all. It takes the average person approximately 10 seconds to solve a typical CAPTCHA.

Origin and inventorship

Since the early days of the Internet, users have wanted to make text illegible to computers. The first such people could be hackers, posting about sensitive topics to online forums they thought were being automatically monitored for keywords. To circumvent such filters, they would replace a word with look-alike characters. HELLO could become |-|3|_|_() or )-(3££0, as well as numerous other variants, such that a filter could not possibly detect all of them. This later became known as leetspeak.

Subsequent to that work, two teams of people have claimed to be the first to invent the CAPTCHAs used throughout the Web today. The first team consists of Mark D. Lillibridge, Martín Abadi, Krishna Bharat, and Andrei Broder, who used CAPTCHAs in 1997 at AltaVista to prevent bots from adding URLs to their search engine. Looking for a way to make their images resistant to OCR attack, the team looked at the manual of their Brother scanner, which had recommendations for improving OCR's results (similar typefaces, plain backgrounds, etc.). The team created puzzles by attempting to simulate what the manual claimed would cause bad OCR.

The second team to claim inventorship of CAPTCHAs consists of Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford, who first described CAPTCHAs in a 2003 publication and subsequently received much coverage in the popular press. Their notion of CAPTCHA covers any program that can distinguish humans from computers, including many different examples of CAPTCHAs.

The controversy of inventorship has been settled by the existence of a 1998 patent by Lillibridge, Abadi, Bharat, and Broder, which predates other publications by several years. Though the patent does not use the term CAPTCHA, it describes the ideas in detail and precisely depicts the graphical CAPTCHAs used in the Web today.

Characteristics

CAPTCHAs are, by definition, fully automated, requiring little human maintenance or intervention to administer. This has obvious benefits in cost and reliability.

By definition, the algorithm used to create the CAPTCHA must be made public, though it may be covered by a patent. This is done to demonstrate that breaking it requires the solution to a difficult problem in the field of artificial intelligence (AI) rather than just the discovery of the (secret) algorithm, which could be obtained through reverse engineering or other means.

Modern text-based CAPTCHAS are designed such that they require the simultaneous use of three separate abilities—invariant recognition, segmentation, and parsing—to correctly complete the task with any consistency.

Invariant recognition refers to the ability to recognize the large amount of variation in the shapes of letters. There are nearly an infinite number of versions for each character that a human brain can successfully identify. The same is not true for a computer, and teaching it to recognize all those differing formations is an extremely challenging task.
Segmentation, or the ability to separate one letter from another, is also made difficult in CAPTCHAs, as characters are crowded together with no white space in between.
Context is also critical. The CAPTCHA must be understood holistically to correctly identify each character. For example, in one segment of a CAPTCHA, a letter might look like an “m.” Only when the whole word is taken into context does it become clear that it is a “u” and an “n.”

Each of these problems pose a significant challenge for a computer, even in isolation. The presence of all three at the same time is what makes CAPTCHAs difficult to solve.

Unlike computers, humans excel at this type of task. While segmentation and recognition are two separate processes necessary for understanding an image for a computer, they are part of the same process for a person. For example, when an individual understands that the first letter of a CAPTCHA is an “a”, that individual also understands where the contours of that “a” are, and also where it melds with the contours of the next letter. Additionally, the human brain is capable of dynamic thinking based upon context. It is able to keep multiple explanations alive and then pick the one that is the best explanation for the whole input based upon contextual clues. This also means it will not be fooled by variations in letters.

Accessibility

CAPTCHAs based on reading text — or other visual-perception tasks — prevent blind or visually impaired users from accessing the protected resource. However, CAPTCHAs do not have to be visual. Any hard artificial intelligence problem, such as speech recognition, can be used as the basis of a CAPTCHA. Some implementations of CAPTCHAs permit users to opt for an audio CAPTCHA. Other implementations do not require users to enter text, instead asking the user to pick images with common themes from a random selection.

For non-sighted users (for example blind users, or the color blind on a color-using test), visual CAPTCHAs present serious problems. Because CAPTCHAs are designed to be unreadable by machines, common assistive technology tools such as screen readers cannot interpret them. Since sites may use CAPTCHAs as part of the initial registration process, or even every login, this challenge can completely block access. In certain jurisdictions, site owners could become target of litigation if they are using CAPTCHAs that discriminate against certain people with disabilities. For example, a CAPTCHA may make a site incompatible with Section 508 in the United States. In other cases, those with sight difficulties can choose to identify a word being read to them.

While providing an audio CAPTCHA allows blind users to read the text, it still hinders those who are both blind and deaf. According to sense.org.uk, about 4% of people over 60 in the UK have both vision and hearing impairments. There are about 23,000 people in the UK who have serious vision and hearing impairments. According to The National Technical Assistance Consortium for Children and Young Adults Who Are Deaf-Blind (NTAC), the number of deafblind children in the USA increased from 9,516 to 10,471 during the period 2004 to 2012. Gallaudet University quotes 1980 to 2007 estimates which suggest upwards of 35,000 fully deafblind adults in the USA. Deafblind population estimates depend heavily on the degree of impairment used in the definition.

The use of CAPTCHA thus excludes a small number of individuals from using significant subsets of such common Web-based services as PayPal, GMail, Orkut, Yahoo!, many forum and weblog systems, etc.

Even for perfectly sighted individuals, new generations of graphical CAPTCHAs, designed to overcome sophisticated recognition software, can be very hard or impossible to read.

A method of improving the CAPTCHA to ease the work with it was proposed by ProtectWebForm and was called "Smart CAPTCHA". Developers advise to combine the CAPTCHA with JavaScript support. Since it is too hard for most of spam robots to parse and execute JavaScript, using a simple script which fills the CAPTCHA fields and hides the image and the field from human eyes was proposed.

One alternative method involves displaying to the user a simple mathematical equation and requiring the user to enter the solution as verification. Although these are much easier to defeat using software, they are suitable for scenarios where graphical imagery is not appropriate, and they provide a much higher level of accessibility for blind users than the image-based CAPTCHAs. These are sometimes referred to as MAPTCHAs (M = 'Mathematical'). However, these may be difficult for users with a cognitive disorder.

Other kinds of challenges, such as those that require understanding the meaning of some text (e.g., a logic puzzle, trivia question, or instructions on how to create a password) can also be used as a CAPTCHA. Again, there is little research into their resistance against countermeasures.

BACK

Knowledgebase

Categories

Categories

Origin and inventorship

Characteristics

Accessibility

Related Articles

Support

Knowledgebase

Categories

Categories

What is a CAPTCHA?

Origin and inventorship

Characteristics

Accessibility

Related Articles

Support

Generate Password