Scalable Keyword Search on Large Data Streams

  • Authors:
  • Lu Qin;Jeffrey Xu Yu;Lijun Chang;Yufei Tao

  • Venue:
  • ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
  • Year:
  • 2009

Abstract

It is widely recognized that integrating information retrieval (IR) and database (DB) techniques provides users with a broad range of high-quality services. A new and challenging issue in this direction is IR-style m-keyword query processing in an RDBMS framework over an open-ended relational data stream. Supporting m-keyword queries over a relational data stream lets users monitor implicitly interrelated events in a timely manner. In brief, the problem is to find, for an m-keyword query {k1, k2, ..., km} over a relational data stream on a database schema GS, all connected trees whose size, measured in number of nodes, is at most a user-given threshold. The difficulty of the problem lies in the number of costly joins that must be processed over time, which depends on parameters such as the number of keywords (m), the maximum size of connected trees (Tmax), and the complexity of the database schema when viewed as a schema graph (GS). In this paper, we propose a new demand-driven approach to processing such queries over a high-speed data stream. We show that this approach significantly reduces the number of intermediate results produced when processing joins over a data stream, and thereby achieves high efficiency.
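To make the problem statement concrete, the following is a minimal brute-force sketch over a small static snapshot of tuples. It is not the demand-driven approach proposed in the paper, and it ignores the streaming aspect entirely. The function names, the tuple identifiers, and the sample data are hypothetical. The sketch also assumes that a result must contain every query keyword, that keywords match whole words case-insensitively, and that the tuple graph is acyclic, so every connected set of tuples forms a tree.

```python
from itertools import combinations


def is_connected(nodes, adj):
    """Return True if `nodes` induce a connected subgraph of `adj`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def keyword_trees(tuples, edges, keywords, t_max):
    """Enumerate connected tuple sets of at most t_max nodes that
    together contain every keyword (assumed acyclic tuple graph)."""
    # Build an undirected adjacency map from the foreign-key edges.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    keywords = {k.lower() for k in keywords}
    results = []
    for size in range(1, t_max + 1):
        for nodes in combinations(sorted(tuples), size):
            words = set()
            for n in nodes:
                words.update(tuples[n].lower().split())
            if keywords <= words and is_connected(nodes, adj):
                results.append(nodes)
    return results


# Hypothetical snapshot: one author, two papers, two "writes" link tuples.
sample_tuples = {
    "a1": "Alice Smith",
    "p1": "stream join processing",
    "p2": "keyword search",
    "w1": "",
    "w2": "",
}
sample_edges = [("a1", "w1"), ("w1", "p1"), ("a1", "w2"), ("w2", "p2")]

print(keyword_trees(sample_tuples, sample_edges, ["alice", "keyword"], 3))
# prints [('a1', 'p2', 'w2')]
```

The enumeration tests every subset of up to Tmax tuples, so its cost grows rapidly with Tmax and with the number of tuples. This illustrates why the number of joins and intermediate results is the central concern in the abstract.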