MOVE: A Large Scale Keyword-Based Content Filtering and Dissemination System

  • Authors:
  • Weixiong Rao;Lei Chen;Pan Hui;Sasu Tarkoma

  • Affiliations:
  • -;-;-;-

  • Venue:
  • ICDCS '12 Proceedings of the 2012 IEEE 32nd International Conference on Distributed Computing Systems
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

The Web 2.0 era is characterized by the emergence of a very large amount of live content. A real time and fine grained content filtering approach can precisely keep users up-to-date the information that they are interested. The key of the approach is to offer a scalable match algorithm. One might treat the content match as a special kind of content search, and resort to the classic algorithm [5]. However, due to blind flooding, [5] cannot be simply adapted for scalable content match. To increase the throughput of scalable match, we propose an adaptive approach to allocate (i.e, replicate and partition) filters. The allocation is based on our observation on real datasets: most users prefer to use short queries, consisting of around 2-3 terms per query, and web content typically contains tens and even thousands of terms per article. Thus, by reducing the number of processed documents, we can reduce the latency of matching large articles with filters, and have chance to achieve higher throughput. We implement our approach on an open source project, Apache Cassandra. The experiment with real datasets shows that our approach can achieve around folds of better throughput than two counterpart state-of-the-arts solutions.