Information discovery across multiple streams

Authors:
Vagelis Hristidis;Oscar Valdivia;Michail Vlachos;Philip S. Yu
Affiliations:
School of Computing and Information Sciences, Florida International University, United States;School of Computing and Information Sciences, Florida International University, United States;IBM T.J. Watson Research Center, United States;Dept. of Comp. Science, University of Illinois, Chicago, United States
Venue:
Information Sciences: an International Journal
Year:
2009


Abstract

In this paper we address the problem of continuous keyword queries over multiple textual streams and explore techniques for extracting useful information from them. To the best of our knowledge, this is the first approach that performs keyword search across multiple textual streams. The scenario we consider is intuitive: suppose a research or financial analyst is searching for information on a topic, continuously polling data from multiple (and possibly heterogeneous) text streams, such as RSS feeds and blogs. The topic of interest can be described by several keywords. Current filtering approaches would identify only single text streams containing some of the keywords. However, it would be more flexible and powerful to search across multiple streams, which may collectively answer the analyst's question. We present such a model, which takes into consideration the continuous flow of text in streams and uses efficient pipelined algorithms so that results are output as soon as they are available. The proposed model is evaluated analytically and experimentally, using the Enron dataset and a variety of blog datasets.
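
To make the cross-stream idea concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes documents arrive in time order, that keywords are given in lowercase, and that a result is either one document containing every keyword or a pair of documents from different streams that jointly contain them within a sliding time window. The names `Doc`, `continuous_search`, and `window` are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import FrozenSet, Iterable, Iterator, NamedTuple, Tuple


class Doc(NamedTuple):
    time: float   # arrival time of the document
    stream: str   # identifier of the stream it came from (e.g. a feed name)
    text: str     # raw document text


def continuous_search(
    docs: Iterable[Doc], keywords: FrozenSet[str], window: float
) -> Iterator[Tuple[Doc, ...]]:
    """Yield results as soon as they become available (pipelined).

    Assumption: a result is a single document with all keywords, or a pair
    of documents from different streams that together contain all keywords
    and arrived within `window` time units of each other.
    """
    # Documents seen recently that contain some, but not all, keywords.
    recent = deque()
    for doc in docs:
        # Naive whitespace tokenization; punctuation is not handled.
        hits = keywords & frozenset(doc.text.lower().split())
        if not hits:
            continue
        # Expire documents that have fallen out of the sliding window.
        while recent and doc.time - recent[0][0].time > window:
            recent.popleft()
        if hits == keywords:
            yield (doc,)
            continue
        for other, other_hits in recent:
            if other.stream != doc.stream and hits | other_hits == keywords:
                yield (other, doc)
        recent.append((doc, hits))
```

Because `continuous_search` is a generator, each result is emitted when the document that completes it arrives, rather than after the streams have been consumed. Extending the sketch to groups of more than two documents, or to ranking results, is left out for brevity.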