Bubble trouble: off-line de-anonymization of bubble forms

Authors:
Joseph A. Calandrino;William Clarkson;Edward W. Felten
Affiliations:
Department of Computer Science, Princeton University;Department of Computer Science, Princeton University;Department of Computer Science, Princeton University
Venue:
SEC'11 Proceedings of the 20th USENIX conference on Security
Year:
2011

Abstract

Fill-in-the-bubble forms are widely used for surveys, election ballots, and standardized tests. In these and other scenarios, use of the forms comes with an implicit assumption that individuals' bubble markings themselves are not identifying. This work challenges this assumption, demonstrating that fill-in-the-bubble forms could convey a respondent's identity even in the absence of explicit identifying information. We develop methods to capture the unique features of a marked bubble and use machine learning to isolate characteristics indicative of its creator. Using surveys from more than ninety individuals, we apply these techniques and successfully reidentify individuals from markings alone with over 50% accuracy. This bubble-based analysis can have either positive or negative implications depending on the application. Potential applications range from detection of cheating on standardized tests to attacks on the secrecy of election ballots. To protect against negative consequences, we discuss mitigation techniques to remove a bubble's identifying characteristics. We suggest additional tests using longitudinal data and larger datasets to further explore the potential of our approach in real-world applications.
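
The general idea of re-identifying a respondent from bubble markings can be illustrated with a toy sketch. The abstract does not specify the features or the learning algorithm, so everything below is an assumption made for illustration: bubbles are tiny binary grids, the hypothetical `bubble_features` helper computes only fill fraction and centre of mass, and a nearest-centroid rule stands in for whatever classifier the authors actually used.

```python
from collections import defaultdict
from math import dist


def bubble_features(grid):
    """Map a binary bubble image (rows of 0/1) to (fill fraction, centroid x, centroid y).

    Hypothetical feature set chosen for illustration only.
    """
    h, w = len(grid), len(grid[0])
    marked = [(x, y) for y, row in enumerate(grid) for x, v in enumerate(row) if v]
    n = len(marked)
    if n == 0:
        return (0.0, 0.5, 0.5)  # empty bubble: neutral centroid
    cx = sum(x for x, _ in marked) / n / max(w - 1, 1)  # normalised to [0, 1]
    cy = sum(y for _, y in marked) / n / max(h - 1, 1)
    return (n / (h * w), cx, cy)


def train_centroids(labeled):
    """Average the feature vectors of each respondent's known bubbles."""
    groups = defaultdict(list)
    for who, grid in labeled:
        groups[who].append(bubble_features(grid))
    return {
        who: tuple(sum(col) / len(col) for col in zip(*feats))
        for who, feats in groups.items()
    }


def reidentify(centroids, grid):
    """Return the respondent whose average features are closest to this bubble."""
    f = bubble_features(grid)
    return min(centroids, key=lambda who: dist(centroids[who], f))


# Toy training data: "alice" tends to mark the left side, "bob" the right.
training = [
    ("alice", [[1, 1, 0], [1, 1, 0], [1, 0, 0]]),
    ("alice", [[1, 0, 0], [1, 1, 0], [1, 1, 0]]),
    ("bob", [[0, 1, 1], [0, 1, 1], [0, 0, 1]]),
    ("bob", [[0, 0, 1], [0, 1, 1], [0, 1, 1]]),
]
centroids = train_centroids(training)
guess = reidentify(centroids, [[1, 1, 0], [1, 0, 0], [1, 0, 0]])
```

Real scanned bubbles would need far richer features and much more data per respondent; the sketch only shows the shape of the pipeline (feature extraction, per-person model, closest-match lookup).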