Discovering trending phrases on information streams

Authors:
Krishna Y. Kamath; James Caverlee
Affiliations:
Texas A&M University, College Station, TX, USA (both authors)
Venue:
Proceedings of the 20th ACM International Conference on Information and Knowledge Management
Year:
2011

Abstract

We study the problem of efficiently discovering trending phrases in high-volume text streams, such as sequences of Twitter messages, email messages, news articles, or other time-stamped text documents. Most existing approaches return the top-k trending phrases. However, this approach guarantees neither that every phrase in the top k is actually trending nor that every trending phrase is returned. In addition, k is difficult to set and does not adapt to stream dynamics. We therefore propose an approach that identifies all trending phrases in a stream and adapts to changing stream properties.
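A toy example can make the two top-k failure modes named above concrete. This is a minimal sketch, not the authors' method. The trend score (a smoothed ratio of a phrase's count in the current window to its count in the previous window), the threshold of 2.0, the window counts, and the function names `trend_score`, `top_k`, and `all_trending` are all hypothetical.

```python
from collections import Counter

# ASSUMPTION: a toy trend score, not the paper's definition. It is the ratio of a
# phrase's count in the current window to its count in the previous window,
# with add-one smoothing so phrases unseen in the previous window score finitely.
# Counter returns 0 for missing keys, so absent phrases need no special case.
def trend_score(phrase: str, current: Counter, previous: Counter) -> float:
    return (current[phrase] + 1) / (previous[phrase] + 1)


def top_k(current: Counter, previous: Counter, k: int) -> list[str]:
    """Return the k highest-scoring phrases, whether or not they are trending."""
    ranked = sorted(current, key=lambda p: trend_score(p, current, previous), reverse=True)
    return ranked[:k]


def all_trending(current: Counter, previous: Counter, threshold: float) -> list[str]:
    """Return every phrase whose score reaches the hypothetical threshold, with no cap k."""
    return sorted(p for p in current if trend_score(p, current, previous) >= threshold)


# Hypothetical counts for a baseline window.
previous = Counter({"a": 10, "b": 10, "c": 10, "d": 10, "e": 10})

# Only "a" and "b" trend here, so top-3 pads its answer with non-trending "c".
quiet = Counter({"a": 50, "b": 40, "c": 11, "d": 9, "e": 10})
# All five phrases trend here, so top-3 silently drops "d" and "e".
busy = Counter({"a": 50, "b": 40, "c": 35, "d": 30, "e": 25})

print(top_k(quiet, previous, 3), all_trending(quiet, previous, 2.0))
# -> ['a', 'b', 'c'] ['a', 'b']
print(top_k(busy, previous, 3), all_trending(busy, previous, 2.0))
# -> ['a', 'b', 'c'] ['a', 'b', 'c', 'd', 'e']
```

A fixed threshold is itself a tuning parameter, so this sketch only illustrates why a fixed k both over- and under-reports. It does not show how the proposed approach adapts to changing stream properties.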