Identifying the most frequent elements in a data stream is a well-known and difficult problem. Identifying the most frequent elements for each individual, especially in very large populations, is even harder. When the number of individuals is very large, fast algorithms with a small memory footprint are essential. In many situations the analysis must be performed and kept up to date in near real time. Fortunately, approximate answers are usually adequate for this problem. This paper presents a new algorithm that addresses the problem by merging the commonly used counter-based and sketch-based techniques for top-k identification. The algorithm provides the top-k list of elements, their frequencies, and an error estimate for each frequency value. It also provides strong guarantees on the error estimate, on the order of elements, and on the inclusion of elements in the list depending on their real frequency. Additionally, the algorithm provides stochastic bounds on the error and expected error estimates. Telecommunications customer behavior and voice call data are used to present concrete results obtained with this algorithm and to illustrate improvements over previously existing algorithms.
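To make the idea of merging counter-based and sketch-based techniques concrete, the following is a minimal illustrative sketch in Python. It is not the paper's algorithm: the class name `FilteredTopK`, its parameters, the use of CRC32 as the hash, and the replacement rule are all assumptions chosen for illustration. A small table of exact counters (the counter-based part) is guarded by an array of hashed cells (the sketch-based part) that upper-bounds how often any unmonitored element may have been seen. Each reported element carries an estimated count and an error such that count - error <= true frequency <= count.

```python
import zlib


class FilteredTopK:
    """Illustrative hybrid of a counter table and a hashed filter (hypothetical names)."""

    def __init__(self, num_counters, num_cells):
        self.num_counters = num_counters
        # cells[i] upper-bounds the occurrences of any unmonitored element hashing to i.
        self.cells = [0] * num_cells
        # Monitored elements: element -> [estimated_count, error].
        self.counters = {}

    def _cell(self, element):
        # Deterministic hash (assumption: CRC32 of repr is good enough for a demo).
        return zlib.crc32(repr(element).encode("utf-8")) % len(self.cells)

    def add(self, element):
        entry = self.counters.get(element)
        if entry is not None:
            entry[0] += 1  # monitored element: exact increment
            return
        i = self._cell(element)
        bound = self.cells[i]  # possible occurrences missed so far
        if len(self.counters) < self.num_counters:
            self.counters[element] = [bound + 1, bound]
            return
        # Linear scan for the smallest counter; a real implementation would
        # keep the counters ordered instead.
        victim = min(self.counters, key=lambda e: self.counters[e][0])
        min_count = self.counters[victim][0]
        if bound + 1 >= min_count:
            # The newcomer may be at least as frequent as the victim: swap them
            # and remember the victim's count in its filter cell.
            del self.counters[victim]
            j = self._cell(victim)
            self.cells[j] = max(self.cells[j], min_count)
            self.counters[element] = [bound + 1, bound]
        else:
            self.cells[i] = bound + 1  # filtered out: only the cell grows

    def top(self, k):
        """Return up to k tuples (element, estimated_count, error), largest first."""
        ranked = sorted(self.counters.items(), key=lambda kv: kv[1][0], reverse=True)
        return [(e, count, err) for e, (count, err) in ranked[:k]]


# Usage: two heavy elements followed by a tail of elements seen once each.
stream = ["a"] * 50 + ["b"] * 30 + [f"x{n}" for n in range(20)]
summary = FilteredTopK(num_counters=4, num_cells=16)
for item in stream:
    summary.add(item)
print(summary.top(2))  # prints [('a', 50, 0), ('b', 30, 0)]
```

For the per-individual setting described above, one such summary could be kept per individual (for example in a dictionary keyed by customer identifier), which is why a small, fixed memory footprint per instance matters.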