Detecting near-duplicate SPITs in voice mailboxes using hashes

Authors:
Ge Zhang;Simone Fischer-Hübner
Affiliations:
Karlstad University, Karlstad, Sweden;Karlstad University, Karlstad, Sweden
Venue:
ISC'11 Proceedings of the 14th international conference on Information security
Year:
2011

Abstract

Spam over Internet Telephony (SPIT) is a threat to the use of Voice over IP (VoIP) systems. In one kind of SPIT, the spammer (SPITer) makes unsolicited bulk calls to victims' voice mailboxes and leaves a prepared audio message in each. We detect this threat within a collaborative detection framework by comparing unknown VoIP flows with known SPIT samples, since the same audio message generates VoIP flows with the same flow patterns (e.g., the sequence of packet sizes). In practice, however, these patterns are not exactly identical: (1) a VoIP flow may be unexpectedly altered by network impairments (e.g., delay jitter and packet loss); and (2) a sophisticated SPITer may dynamically generate each flow, for example by employing a Text-To-Speech (TTS) synthesis engine to generate the speech audio instead of using a pre-recorded message. Thus, we measure the similarity among flows using locality-sensitive hash algorithms. A small distance between the hash digest of a flow x and that of a known SPIT suggests that flow x probably belongs to the same bulk as the known SPIT. Finally, we experimentally study the detection performance of the hash algorithms.
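
The idea of comparing flows by the distance between locality-sensitive digests can be illustrated with a minimal sketch. The abstract does not name the hash algorithms that were evaluated, so the SimHash-style construction below is a stand-in chosen for illustration only; the function names (`simhash`, `hamming`), the n-gram length, and the digest width are all assumptions, and the packet sizes are made-up values rather than measured data.

```python
import hashlib


def simhash(packet_sizes, n=3, bits=64):
    """SimHash-style digest of a packet-size sequence (illustrative stand-in,
    not necessarily one of the hash algorithms studied in the paper)."""
    counts = [0] * bits
    # Slide a window of n consecutive packet sizes over the flow.
    for i in range(len(packet_sizes) - n + 1):
        gram = ",".join(str(s) for s in packet_sizes[i:i + n]).encode()
        # Map each n-gram to a pseudo-random bit pattern of the chosen width.
        h = int.from_bytes(hashlib.sha256(gram).digest()[:bits // 8], "big")
        # Each n-gram votes +1 or -1 on every bit position.
        for b in range(bits):
            counts[b] += 1 if (h >> b) & 1 else -1
    # A digest bit is set when the positive votes outnumber the negative ones.
    digest = 0
    for b in range(bits):
        if counts[b] > 0:
            digest |= 1 << b
    return digest


def hamming(a, b):
    """Number of bit positions in which two digests differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Hypothetical packet-size sequences (bytes), invented for this example.
    known_spit = [60, 62, 48, 70, 71, 55, 60, 62, 48, 66, 59, 73] * 10
    # The same flow with one packet lost in transit.
    received = known_spit[:40] + known_spit[41:]
    print("distance to known SPIT:", hamming(simhash(known_spit), simhash(received)))
```

Because a single lost packet changes only the few n-grams that overlap it, most votes are unchanged and the two digests tend to stay close, whereas an unrelated flow shares few n-grams and its digest differs in roughly half the bits on average. A detector built this way would flag a flow when the distance falls below a threshold, which would have to be tuned experimentally.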