Scalable keyword search on large data streams

Authors:
Lu Qin;Jeffrey Xu Yu;Lijun Chang
Affiliations:
The Chinese University of Hong Kong, Hong Kong, China;The Chinese University of Hong Kong, Hong Kong, China;The Chinese University of Hong Kong, Hong Kong, China
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2011

Abstract

It is widely recognized that the integration of information retrieval (IR) and database (DB) techniques provides users with a broad range of high-quality services. Along this direction, IR-style m-keyword query processing over a relational database in an RDBMS framework has been well studied. Such processing finds all hidden interconnected tuple structures, for example, connected trees of tuples that contain the keywords and are linked by sequences of primary/foreign key relationships. A new and challenging issue is how to monitor events that are implicitly interrelated over an open-ended relational data stream for a user-given m-keyword query. Such a relational data stream is a sequence of tuple insertion/deletion operations. The difficulty of the problem lies in the number of costly joins that must be processed over time as tuples are inserted and/or deleted. This cost is mainly affected by three parameters: the number of keywords, the maximum size of the interconnected tuple structures, and the complexity of the database schema when it is viewed as a schema graph. In this paper, we propose two new techniques. First, we propose a novel algorithm to efficiently determine all the joins that need to be processed to answer an m-keyword query. Second, we propose a new demand-driven approach to process such a query over a high-speed relational data stream. We show that high efficiency can be achieved by significantly reducing the number of intermediate results produced when processing joins over a relational data stream. The proposed techniques allow us to achieve high scalability in terms of both query plan generation and query plan execution. We conducted extensive experimental studies using synthetic data and real data to simulate a relational data stream. Our approach significantly outperforms existing algorithms.
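
To make the problem setting concrete, the following is a minimal Python sketch of a naive monitor for an m-keyword query over a stream of tuple insertions and deletions. It is not the algorithm proposed in the paper: it recomputes every answer from scratch after each operation, which is exactly the kind of repeated join work the abstract says must be avoided. The class name `NaiveKeywordMonitor`, the `apply` interface, and the toy tuples are all hypothetical, and "answer" is simplified here to a connected group of at most `max_size` tuples that covers all keywords and contains no smaller answer.

```python
from itertools import combinations


class NaiveKeywordMonitor:
    """Toy baseline (hypothetical): recompute all answers after every operation."""

    def __init__(self, keywords, max_size):
        self.keywords = set(keywords)
        self.max_size = max_size  # maximum number of tuples in one answer
        self.text = {}            # tuple id -> set of words in the tuple
        self.refs = {}            # tuple id -> ids it references via foreign keys

    def apply(self, op, tid, words=(), fk=()):
        """Apply one stream operation and return the current set of answers."""
        if op == "insert":
            self.text[tid] = set(words)
            self.refs[tid] = set(fk)
        elif op == "delete":
            self.text.pop(tid, None)
            self.refs.pop(tid, None)
        else:
            raise ValueError(f"unknown operation: {op}")
        return self.answers()

    def _linked(self, a, b):
        # Two tuples are linked if either one references the other.
        return b in self.refs[a] or a in self.refs[b]

    def _connected(self, group):
        # Depth-first search restricted to the tuples in `group`.
        seen, frontier = {group[0]}, [group[0]]
        while frontier:
            cur = frontier.pop()
            for other in group:
                if other not in seen and self._linked(cur, other):
                    seen.add(other)
                    frontier.append(other)
        return len(seen) == len(group)

    def answers(self):
        found = set()
        ids = sorted(self.text)
        # Sizes are visited in ascending order so smaller answers are found first.
        for size in range(1, self.max_size + 1):
            for group in combinations(ids, size):
                covered = set().union(*(self.text[t] for t in group))
                if not self.keywords <= covered:
                    continue
                if not self._connected(group):
                    continue
                # Simplified minimality: skip groups that contain a smaller answer.
                if any(set(prev) < set(group) for prev in found):
                    continue
                found.add(group)
        return found


monitor = NaiveKeywordMonitor({"stream", "join"}, max_size=3)
r1 = monitor.apply("insert", "p1", words=["stream", "processing"])
r2 = monitor.apply("insert", "p2", words=["join", "ordering"])
r3 = monitor.apply("insert", "c1", fk=["p1", "p2"])  # links p1 and p2
r4 = monitor.apply("delete", "c1")                   # the answer disappears
```

In this toy run, no answer exists until the linking tuple `c1` arrives, and the answer is withdrawn when `c1` is deleted. The enumeration cost grows with the number of stored tuples and with `max_size`, which gives a rough feel for why the abstract identifies the maximum structure size as one of the parameters that drive processing cost.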