SpamWatcher: a streaming social network analytic on the IBM wire-speed processor

  • Authors:
  • Qiong Zou; Buğra Gedik; Kun Wang

  • Affiliations:
  • IBM Corporation, China Research Lab, Beijing, China; IBM Corporation, Watson Research Center, New York, NY, USA; IBM Corporation, China Research Lab, Beijing, China

  • Venue:
  • Proceedings of the 5th ACM International Conference on Distributed Event-Based Systems
  • Year:
  • 2011

Abstract

The proliferation of mobile devices, coupled with continuous connectivity, has resulted in a world where massive amounts of data are produced daily through online interactions between people. These interactions are often captured as relationships in a social network graph by service providers such as mobile carriers or social web applications. Social network analysis is becoming a common technique for extracting business intelligence from social network graphs in order to improve customer experience and provide better service. Some applications in this domain require processing massive data flows with high throughput and low latency in order to deliver timely results. SpamWatcher is a streaming social network analysis application that fits this description. It is used for real-time filtering of short messages in mobile communications, with the goal of preventing spam. The ever-increasing number of mobile users and message rates make real-time spam detection a challenging problem with respect to performance and scalability. In this paper, we present a solution for the SpamWatcher application using the IBM wire-speed processor, a system-on-a-chip with specialized co-processors and integrated network I/O. This solution goes beyond the state of the art by (i) using a novel implementation technique that takes advantage of the pattern matching accelerator to minimize the latency of spam detection, and (ii) employing hardware primitives to reduce the overhead caused by thread synchronization, in order to achieve good scalability with respect to the number of cores used. Furthermore, the solution is implemented on System S, a commercial-grade stream processing middleware. We evaluate our approach using real-world data sets and experimentally demonstrate the substantial performance improvements it achieves compared to previously published results.