Most database management systems maintain statistics on the underlying relation. An important statistic is the set of "hot items" in the relation: those that appear many times (most frequently, or more than some threshold). For example, end-biased histograms keep the hot items as part of the histogram and are used in selectivity estimation. Hot items are used as simple outliers in data mining, and in anomaly detection in networking applications.

We present a new algorithm for dynamically determining the hot items at any time in a relation that is undergoing deletions as well as insertions. Our algorithm maintains a small-space data structure that monitors the transactions on the relation and, when required, quickly outputs all hot items without rescanning the relation in the database. With a user-specified probability, it reports all hot items. Our algorithm relies on the idea of "group testing", is simple to implement, and has provable quality, space, and time guarantees. Previously known algorithms for this problem that make similar quality and performance guarantees cannot handle deletions, and those that handle deletions cannot make similar guarantees without rescanning the database. Our experiments with real and synthetic data show that our algorithm is remarkably accurate in dynamically tracking the hot items, independent of the rate of insertions and deletions.
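To make the group-testing idea concrete, here is a minimal Python sketch of one way to track hot items under both insertions and deletions. It is an illustration under stated assumptions, not the authors' exact algorithm or parameters. It assumes items are integers in [0, 2**bits), that "hot" means a count above n / (k + 1) where n is the current number of live items, that deletions only remove previously inserted items, and that a random affine hash modulo a prime splits items into 2k groups, repeated `reps` times. The class name `GroupTestingHotItems` and its methods `update` and `hot_items` are hypothetical. Each group keeps a total counter plus one counter per bit of the item identifier. When a group contains a single hot item, the heavier side of each bit test reveals that item's bits.

```python
import random


class GroupTestingHotItems:
    """Minimal sketch of group-testing hot-item tracking with inserts and deletes.

    Illustrative assumptions (not taken from the text above): items are integers
    in [0, 2**bits); "hot" means count > n / (k + 1); each of `reps` random hash
    functions splits the items into 2*k groups; deletions only remove items that
    were previously inserted.
    """

    PRIME = (1 << 61) - 1  # Mersenne prime used by the assumed hash family

    def __init__(self, k, bits=32, reps=4, seed=0):
        rng = random.Random(seed)
        self.k = k
        self.bits = bits
        self.num_groups = 2 * k
        self.n = 0  # live item count: insertions minus deletions
        # One (a, b) pair per repetition: h(x) = ((a*x + b) mod p) mod num_groups.
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(reps)]
        # counters[r][g][0] = total count of group g; counters[r][g][j+1] = count
        # of items in group g whose bit j is 1.
        self.counters = [[[0] * (bits + 1) for _ in range(self.num_groups)]
                         for _ in range(reps)]

    def _group(self, r, x):
        a, b = self.hashes[r]
        return ((a * x + b) % self.PRIME) % self.num_groups

    def update(self, x, delta):
        """Apply delta = +1 for an insertion or -1 for a deletion of item x."""
        if not 0 <= x < (1 << self.bits):
            raise ValueError("item outside the assumed universe")
        self.n += delta
        for r in range(len(self.hashes)):
            c = self.counters[r][self._group(r, x)]
            c[0] += delta
            for j in range(self.bits):
                if (x >> j) & 1:
                    c[j + 1] += delta

    def hot_items(self):
        """Return candidate items whose count exceeds n / (k + 1)."""
        threshold = self.n / (self.k + 1)
        found = set()
        for r in range(len(self.hashes)):
            for g, c in enumerate(self.counters[r]):
                if c[0] <= threshold:
                    continue  # group total too small to contain a hot item
                x, inconclusive = 0, False
                for j in range(self.bits):
                    ones, zeros = c[j + 1], c[0] - c[j + 1]
                    if ones > threshold and zeros > threshold:
                        inconclusive = True  # e.g. two hot items share this group
                        break
                    if ones > threshold:
                        x |= 1 << j  # heavy side of the bit test decides bit j
                # Keep x only if it hashes back to this group, which filters
                # many reconstructions produced by light items alone.
                if not inconclusive and self._group(r, x) == g:
                    found.add(x)
        return found
```

Every counter is updated linearly, so a deletion exactly cancels the matching insertion. This is why the sketch can answer a query at any time without rescanning the underlying data. The output is a candidate set: repeating with several independent hash functions makes it likely that each hot item is isolated in some group, and the hash check discards most spurious reconstructions.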