Methods for mining frequent items in data streams: an overview

Authors:
Hongyan Liu; Yuan Lin; Jiawei Han
Affiliations:
Tsinghua University, School of Economics and Management, 100084, Beijing, China; University of Washington, Information School, 98195-2840, Seattle, WA, USA; University of Illinois at Urbana-Champaign, Department of Computer Science, 61801, Urbana, IL, USA
Venue:
Knowledge and Information Systems
Year:
2011

Abstract

In many real-world applications, information such as web click data, stock ticker data, sensor network data, phone call records, and traffic monitoring data appears in the form of data streams. Online monitoring of data streams has emerged as an important research undertaking. Estimating the frequency of items in these streams is an important aggregation and summarization technique for both stream mining and data management systems, with a broad range of applications. This paper reviews state-of-the-art methods for identifying frequent items in data streams. It describes the different models of the frequent-item mining task. For general models such as cash register and turnstile, we classify existing algorithms into sampling-based, counting-based, and hashing-based categories. The processing techniques and data synopsis structure of each algorithm are described and compared using evaluation measures. As extensions of the general data stream model, four more specific models are then introduced: the time-sensitive model, the distributed model, the hierarchical and multi-dimensional model, and the skewed data model. The characteristics and limitations of the algorithms for each model are presented, and open issues awaiting further study are discussed.
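
To make the counting-based category concrete, the following is a minimal sketch of one well-known algorithm of that kind, the Misra-Gries summary, for the cash register model (insertions only). The abstract does not name specific algorithms, so the choice of Misra-Gries, the function name `misra_gries`, the parameter `k`, and the sample stream are all assumptions made for illustration. With at most k - 1 counters, any item occurring more than n/k times in a stream of length n is expected to remain in the summary, and each stored count underestimates the true count by at most n/k.

```python
def misra_gries(stream, k):
    """Counting-based frequent-item summary (illustrative sketch).

    Keeps at most k - 1 counters. Any item whose true frequency
    exceeds n / k (n = stream length) is still present at the end.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already monitored: increment its counter.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start monitoring the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop
            # those that reach zero. The new item is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical sample stream; "a" occurs 5 times out of 9.
stream = ["a", "b", "a", "c", "a", "d", "a", "b", "a"]
candidates = misra_gries(stream, k=3)
print(candidates)
```

The returned dictionary holds candidates only. Because the stored counts are lower bounds, a second pass over the data (when one is possible) would be needed to confirm exact frequencies.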