Fast XML document filtering by sequencing twig patterns

  • Authors:
  • Joonho Kwon; Praveen Rao; Bongki Moon; Sukho Lee

  • Affiliations:
  • Advanced Institutes of Convergence Technology, Gyeonggi-do, Korea; University of Missouri-Kansas City, Kansas City, MO; University of Arizona, AZ; Seoul National University, Seoul, Korea

  • Venue:
  • ACM Transactions on Internet Technology (TOIT)
  • Year:
  • 2009

Abstract

XML-enabled publish-subscribe (pub-sub) systems have emerged as an increasingly important tool for e-commerce and Internet applications. In a typical pub-sub system, subscribed users specify their interests in profiles expressed in the XPath language. Each new piece of content is then matched against the user profiles so that it is delivered only to interested subscribers. As the number of subscribed users and their profiles can grow very large, the scalability of the service is critical to the success of pub-sub systems. In this article, we propose a novel scalable filtering system called FiST that transforms user profiles, each a twig pattern expressed in XPath, into sequences using Prüfer's method. Consequently, instead of breaking a twig pattern into multiple linear paths and matching them separately, FiST performs holistic matching of twig patterns with each incoming document in a bottom-up fashion. FiST organizes the sequences into a dynamic hash-based index for efficient filtering, and exploits the commonality among user profiles to enable shared processing during the filtering phase. We demonstrate that the holistic matching approach reduces filtering cost and memory consumption, thereby improving the scalability of FiST.
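
To make the sequencing step concrete, the following is a minimal sketch of how a small twig pattern could be turned into a Prüfer-style sequence. It is not the FiST implementation: it assumes the tree is numbered in post-order and that leaves are removed in increasing number order until only the root remains, recording the parent of each removed node. The names `Node` and `prufer_sequences` are hypothetical, and the sketch covers only sequence construction, not the index or the matching phase.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A node of a twig pattern or document tree (hypothetical helper type)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def prufer_sequences(root: Node) -> Tuple[List[str], List[int]]:
    """Return (labeled sequence, numbered sequence) for a post-order-numbered tree.

    Assumption: with post-order numbering, the smallest-numbered leaf at
    step i is node i, so the sequence is the parent of node 1, 2, ..., n-1.
    """
    labels = {}   # post-order number -> node label
    parent = {}   # post-order number -> parent's post-order number
    counter = 0

    def visit(node: Node) -> int:
        nonlocal counter
        child_numbers = [visit(child) for child in node.children]
        counter += 1                      # number this node after its children
        labels[counter] = node.label
        for number in child_numbers:
            parent[number] = counter
        return counter

    n = visit(root)
    numbered = [parent[i] for i in range(1, n)]
    labeled = [labels[p] for p in numbered]
    return labeled, numbered


# Twig pattern a[b]/c/d as a tree: a has children b and c; c has child d.
twig = Node("a", [Node("b"), Node("c", [Node("d")])])
labeled, numbered = prufer_sequences(twig)
print(labeled, numbered)
```

For this twig the post-order numbers are b=1, d=2, c=3, a=4, so each entry of the output names the parent of nodes 1 through 3 in turn. Because the sequence encodes the whole tree structure, a branching pattern can be compared against a document as a single unit rather than as separate root-to-leaf paths.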