CAPTCHA Challenge Tradeoffs: Familiarity of Strings versus Degradation of Images

  • Authors:
  • Sui-Yu Wang; Jon L. Bentley

  • Affiliations:
  • Lehigh University, Bethlehem, PA, USA; Avaya Labs Research, Basking Ridge, NJ, USA

  • Venue:
  • ICPR '06: Proceedings of the 18th International Conference on Pattern Recognition - Volume 03
  • Year:
  • 2006


Abstract

It is well documented that, for human readers, familiar text is more legible than unfamiliar text. Current-generation computer vision systems are also able to exploit some kinds of prior knowledge of linguistic context: for example, many OCR systems use known lexica (word lists, such as lists of commonly occurring English words) to disambiguate interpretations. Interestingly, human readers can exploit various degrees of familiarity, including strings of characters which, while not found in dictionaries, resemble spelled words: e.g., "pronounceable" strings, or strings made up of frequently occurring character n-grams. In contrast, computer vision technologies for exploiting such poorly characterized constraints (absent an explicit, complete lexicon) are not yet well developed. This gap in ability may allow us to design stronger CAPTCHAs. We measure the familiarity of challenge strings generated by four methods (described by Bentley and Mallows), and we use the ScatterType CAPTCHA to degrade challenge images. We report the results of a human legibility trial that support the hypothesis that more familiar strings are indeed more legible in CAPTCHAs. Our measurements may enable the engineering of CAPTCHAs with a more uniform distribution of difficulty by balancing image degradation against string familiarity.
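To make the idea of "familiar but non-dictionary" challenge strings concrete, the sketch below samples strings from character bigram frequencies estimated from a small word list; strings generated this way tend to be pronounceable even when they are not real words. This is only an illustration under stated assumptions: the corpus, function names, and parameters are hypothetical, and this is not the Bentley-Mallows generation methods evaluated in the paper.

```python
import random
from collections import defaultdict

# Illustrative corpus only; the paper's generators and training data differ.
CORPUS = ["pattern", "recognition", "familiar", "legible", "challenge",
          "string", "degrade", "image", "human", "reader"]

def bigram_counts(words):
    """Count character bigrams, using '^' and '$' as word-boundary markers."""
    counts = defaultdict(lambda: defaultdict(int))
    for w in words:
        padded = "^" + w + "$"
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return counts

def sample_string(counts, max_len=8):
    """Random walk over the bigram model; yields word-like character strings."""
    out, prev = [], "^"
    while len(out) < max_len:
        successors = counts.get(prev)
        if not successors:
            break
        chars, weights = zip(*successors.items())
        c = random.choices(chars, weights=weights)[0]
        if c == "$":  # reached an end-of-word marker
            break
        out.append(c)
        prev = c
    return "".join(out)

if __name__ == "__main__":
    model = bigram_counts(CORPUS)
    print([sample_string(model) for _ in range(5)])
```

Strings produced by such a model could then be rendered through an image degrader such as ScatterType, with the degradation level tuned against the string's measured familiarity, which is the balancing act the abstract proposes.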