AFilter: adaptable XML filtering with prefix-caching suffix-clustering

  • Authors:
  • K. Selçuk Candan;Wang-Pin Hsiung;Songting Chen;Junichi Tatemura;Divyakant Agrawal

  • Affiliations:
  • NEC Laboratories America, Cupertino, CA;NEC Laboratories America, Cupertino, CA;NEC Laboratories America, Cupertino, CA;NEC Laboratories America, Cupertino, CA;NEC Laboratories America, Cupertino, CA

  • Venue:
  • VLDB '06 Proceedings of the 32nd international conference on Very large data bases
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

XML message filtering problem involves searching for instances of a given, potentially large, set of patterns in a continuous stream of XML messages. Since the messages arrive continuously, it is essential that the filtering rate matches the data arrival rate. Therefore, the given set of filter patterns needs to be indexed appropriately to enable real-time processing of the streaming XML data. In this paper, we propose AFilter, an adaptable, and thus scalable, path expression filtering approach. AFilter has a base memory requirement linear in filter expression and data size. Furthermore, when additional memory is available, AFilter can exploit prefix commonalities in the set of filter expressions using a loosely-coupled prefix caching mechanism as opposed to tightly-coupled active state representation of alternative approaches. Unlike existing systems, AFilter can also exploit suffix-commonalities across filter expressions, while simultaneously leveraging the prefix-commonalities through the cache. Finally, AFilter uses a triggering mechanism to prevent excessive consumption of resources by delaying processing until a trigger condition is observed. Experiment results show that AFilter provides significantly better scalability and runtime performance when compared to state of the art filtering systems.