We propose an integrated approach for solving two problems: finding the k most popular elements and finding the frequent elements in a data stream. Our technique is efficient and exact if the alphabet under consideration is small. In the more practical large-alphabet case, our solution is space efficient and reports both top-k and frequent elements with tight guarantees on errors. For general data distributions, our top-k algorithm can return a set of k′ elements, where k′ ≈ k, which are guaranteed to be the top-k′ elements, and we use minimal space for calculating frequent elements. For realistic Zipfian data, our space requirement for the frequent elements problem decreases dramatically with the parameter of the distribution, and for top-k queries we ensure that only the top-k elements, in the correct order, are reported. Our experiments show significant space reductions with no loss in accuracy.
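The abstract does not spell out the data structure, so the following is only a minimal sketch of one way a single bounded set of counters could serve both queries. It assumes a counter-replacement scheme in which an unmonitored element takes over the smallest counter and inherits its count as a possible overestimate. The names `StreamSummary`, `offer`, `top_k`, and `frequent` are hypothetical and chosen for illustration, and the linear scan for the minimum counter is used for brevity rather than efficiency.

```python
class StreamSummary:
    """Illustrative bounded-counter summary (hypothetical names, not the paper's code)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity  # maximum number of monitored elements
        self.counts = {}          # element -> estimated count (never an underestimate)
        self.errors = {}          # element -> maximum possible overestimation
        self.n = 0                # number of stream elements seen so far

    def offer(self, item):
        """Process one stream element."""
        self.n += 1
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # Free counter available: the count is exact for this element.
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Assumed replacement rule: evict the element with the smallest
            # count and let the newcomer inherit that count as its error.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def top_k(self, k):
        """Return up to k (element, estimated count) pairs, largest first."""
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
        return ranked[:k]

    def frequent(self, phi):
        """Map each candidate above phi * n to True if it is guaranteed frequent."""
        threshold = phi * self.n
        return {
            item: (count - self.errors[item]) > threshold
            for item, count in self.counts.items()
            if count > threshold
        }


# Small alphabet (4 distinct elements, 4 counters): every count is exact.
exact = StreamSummary(capacity=4)
for symbol in ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"]:
    exact.offer(symbol)

# Larger alphabet than counters: "c" evicts "b" and carries an error of 1.
lossy = StreamSummary(capacity=2)
for symbol in ["a", "a", "b", "c"]:
    lossy.offer(symbol)
```

In this sketch, `exact.top_k(2)` yields `[("a", 5), ("b", 3)]`, while `lossy.frequent(0.4)` reports `"a"` as guaranteed and `"c"` only as a candidate, since its estimated count may include an inherited overestimate.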