What's hot and what's not: tracking most frequent items dynamically
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Most database management systems maintain statistics on the underlying relation. One important statistic is the set of “hot items” in the relation: those that appear many times (the most frequent items, or those whose frequency exceeds some threshold). For example, end-biased histograms keep the hot items as part of the histogram and are used in selectivity estimation. Hot items are used as simple outliers in data mining and in anomaly detection in many applications.

We present new methods for dynamically determining the hot items at any time in a relation that is undergoing deletions as well as insertions. Our methods maintain small-space data structures that monitor the transactions on the relation and, when required, quickly output all hot items without rescanning the relation in the database. With user-specified probability, all hot items are correctly reported. Our methods rely on ideas from “group testing.” They are simple to implement and have provable quality, space, and time guarantees. Previously known algorithms for this problem that make similar quality and performance guarantees cannot handle deletions, and those that handle deletions cannot make similar guarantees without rescanning the database. Our experiments with real and synthetic data show that our algorithms accurately track the hot items dynamically, independent of the rate of insertions and deletions.
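The group-testing idea can be illustrated with a minimal Python sketch. This is not the authors' published algorithm. The class name `HotItemTracker`, the parameters `k`, `bits`, `depth`, and `seed`, the use of 2k groups per row, and the `(a*x + b) mod P` hash family are all hypothetical choices made for illustration. Each group keeps one total counter plus one counter per bit of the item ID. Because updates only add to or subtract from these counters, a deletion is handled the same way as an insertion, with the opposite sign, and no rescan is needed. When a query asks for items with frequency above n/(k+1), each heavy group is decoded bit by bit. For every bit, the test asks which side of the split (bit set or bit clear) is itself above the threshold.

```python
import random


class HotItemTracker:
    """Group-testing sketch for hot items under inserts and deletes (illustrative only).

    Hypothetical assumptions, not taken from the source: item IDs are ints in
    [0, 2**bits); each of `depth` rows splits items into 2k groups with a
    pairwise-independent hash h(x) = ((a*x + b) mod P) mod (2k).
    """

    P = (1 << 61) - 1  # Mersenne prime used by the hash family

    def __init__(self, k, bits=32, depth=4, seed=0):
        rng = random.Random(seed)
        self.k, self.bits, self.depth = k, bits, depth
        self.width = 2 * k  # groups per row
        self.hashes = [(rng.randrange(1, self.P), rng.randrange(self.P))
                       for _ in range(depth)]
        # counts[d][g][0]: total count in group g of row d;
        # counts[d][g][j + 1]: count of items in that group whose bit j is 1.
        self.counts = [[[0] * (bits + 1) for _ in range(self.width)]
                       for _ in range(depth)]
        self.n = 0  # current number of items (insertions minus deletions)

    def _group(self, d, x):
        a, b = self.hashes[d]
        return ((a * x + b) % self.P) % self.width

    def update(self, x, delta):
        """Apply delta = +1 for an insertion or -1 for a deletion of item x."""
        if not 0 <= x < (1 << self.bits):
            raise ValueError("item ID out of range")
        self.n += delta
        for d in range(self.depth):
            c = self.counts[d][self._group(d, x)]
            c[0] += delta
            for j in range(self.bits):
                if (x >> j) & 1:
                    c[j + 1] += delta

    def hot_items(self):
        """Return candidates whose frequency appears to exceed n / (k + 1)."""
        threshold = self.n / (self.k + 1)
        found = set()
        for d in range(self.depth):
            for g, c in enumerate(self.counts[d]):
                if c[0] <= threshold:
                    continue  # too light to contain a hot item
                x = 0
                for j in range(self.bits):
                    ones, zeros = c[j + 1], c[0] - c[j + 1]
                    if (ones > threshold) == (zeros > threshold):
                        break  # ambiguous test: cannot isolate a single hot item
                    if ones > threshold:
                        x |= 1 << j
                else:
                    # All bit tests succeeded; keep x only if it hashes back here.
                    if self._group(d, x) == g:
                        found.add(x)
        return found


tracker = HotItemTracker(k=1, bits=8)
for item in [5, 5, 5, 9, 9, 3]:
    tracker.update(item, +1)
for item in [9, 9]:
    tracker.update(item, -1)
print(tracker.hot_items())  # {5}
```

With k = 1 the threshold is n/2, and the bit tests become a deterministic majority check. A strict-majority item dominates every bit test in its group, and every other group is lighter than n/2 and is skipped. For larger k, a hot item can share a group with enough light items to make a test ambiguous. The sketch therefore repeats the grouping over several rows, loosely mirroring the user-specified success probability. It can also, with small probability, report a decoded value that is not actually hot.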