Uncovering Spoken Phrases in Encrypted Voice over IP Conversations

Authors:
Charles V. Wright;Lucas Ballard;Scott E. Coull;Fabian Monrose;Gerald M. Masson
Affiliations:
MIT Lincoln Laboratory;Google Inc.;University of North Carolina, Chapel Hill;University of North Carolina, Chapel Hill;Johns Hopkins University
Venue:
ACM Transactions on Information and System Security (TISSEC)
Year:
2010

Abstract

Although Voice over IP (VoIP) is rapidly being adopted, its security implications are not yet fully understood. Since VoIP calls may traverse untrusted networks, packets should be encrypted to ensure confidentiality. However, we show that it is possible to identify the phrases spoken within encrypted VoIP calls when the audio is encoded using variable bit rate codecs. To do so, we train a hidden Markov model using only knowledge of the phonetic pronunciations of words, such as those provided by a dictionary, and search packet sequences for instances of specified phrases. Our approach does not require examples of the speaker’s voice, or even example recordings of the words that make up the target phrase. We evaluate our techniques on a standard speech recognition corpus containing over 2,000 phonetically rich phrases spoken by 630 distinct speakers from across the continental United States. Our results indicate that we can identify phrases within encrypted calls with an average accuracy of 50%, and with accuracy greater than 90% for some phrases. Clearly, such an attack calls into question the efficacy of current VoIP encryption standards. In addition, we examine the impact of various features of the underlying audio on our performance and discuss methods for mitigation.
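
As a rough illustration of the idea in the abstract, the sketch below scores windows of a packet-size sequence against a small left-to-right hidden Markov model and compares each score with a uniform background model. It is a toy under stated assumptions, not the authors' model: the packet sizes, the three-state topology, all probabilities, and the names SIZES, forward_log_likelihood, and best_window are hypothetical and chosen only for illustration. In practice the model parameters would be derived from phonetic pronunciations, as the abstract describes.

```python
import math

# Hypothetical packet sizes (bytes) that a variable bit rate codec might emit.
SIZES = [20, 30, 40, 50]

def lg(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) using the forward algorithm in log space."""
    n = len(start)
    alpha = [lg(start[s]) + lg(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + lg(trans[p][s]) for p in range(n)])
            + lg(emit[s][o])
            for s in range(n)
        ]
    return log_sum_exp(alpha)

def peaked(favored):
    """Emission distribution that puts 0.7 on one size and 0.1 on the rest."""
    return {size: (0.7 if size == favored else 0.1) for size in SIZES}

# Toy phrase model (assumed): three states visited left to right,
# each favoring one packet size.
START = [1.0, 0.0, 0.0]
TRANS = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
EMIT = [peaked(50), peaked(20), peaked(40)]

def best_window(packets, width):
    """Slide a fixed-width window and return (start, log-likelihood ratio)
    for the window that best matches the phrase model relative to a
    uniform background over SIZES."""
    background = width * math.log(1.0 / len(SIZES))
    best = (None, float("-inf"))
    for i in range(len(packets) - width + 1):
        window = packets[i:i + width]
        score = forward_log_likelihood(window, START, TRANS, EMIT) - background
        if score > best[1]:
            best = (i, score)
    return best

# Observed sizes of encrypted packets; payload contents are never inspected.
packets = [30, 30, 50, 50, 20, 20, 40, 40, 30, 30]
best_start, best_score = best_window(packets, width=6)
print(best_start, round(best_score, 2))
```

In this toy sequence the window starting at index 2 matches the model's preferred size pattern most closely, so it receives the highest log-likelihood ratio. A real attack would need models built per phrase and a decision threshold on the score; both are outside what this sketch attempts.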