SpamWatcher: a streaming social network analytic on the IBM wire-speed processor

Authors:
Qiong Zou; Buğra Gedik; Kun Wang
Affiliations:
IBM Corporation, China Research Lab, Beijing, China; IBM Corporation, Watson Research Center, New York, NY, USA; IBM Corporation, China Research Lab, Beijing, China
Venue:
Proceedings of the 5th ACM international conference on Distributed event-based system
Year:
2011

Abstract

The proliferation of mobile devices, coupled with continuous connectivity, has resulted in a world where massive amounts of data are produced daily as a result of online interactions between people. Service providers such as mobile carriers and social web applications often capture these interactions as relationships in a social network graph. Social network analysis is becoming a common technique for extracting business intelligence from such graphs in order to improve customer experience and provide better service. Some applications in this domain must process massive data flows with high throughput and low latency in order to deliver timely results. SpamWatcher is a streaming social network analysis application that fits this description: it filters short messages in mobile communications in real time, with the goal of preventing spam. The ever-increasing number of mobile users and message rates make real-time spam detection a challenging problem with respect to performance and scalability. In this paper, we present a solution for the SpamWatcher application using the IBM wire-speed processor, a system-on-a-chip with specialized co-processors and integrated network I/O. This solution goes beyond the state of the art in two ways. First, (i) it uses a novel implementation technique that takes advantage of the pattern matching accelerator to minimize the latency of spam detection. Second, (ii) it employs hardware primitives to reduce the overhead caused by thread synchronization, in order to achieve good scalability with respect to the number of cores used. Furthermore, the solution is implemented on System S, a commercial-grade stream processing middleware. We evaluate our approach using real-world data sets and experimentally demonstrate the substantial performance improvements it achieves compared to previously published results.