Dynamically maintaining frequent items over a data stream

Authors:
Cheqing Jin;Weining Qian;Chaofeng Sha;Jeffrey X. Yu;Aoying Zhou
Affiliations:
Fudan University, P.R.C;Fudan University, P.R.C;Fudan University, P.R.C;The Chinese University of Hong Kong;Fudan University, P.R.C
Venue:
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Year:
2003

Abstract

It is challenging to maintain frequent items over a data stream with a small, bounded memory in a dynamic environment where both insertion and deletion of items are allowed. In this paper, we propose a novel algorithm, called hCount, which handles both insertion and deletion of items using much less memory than the best reported algorithm. Our algorithm is also superior in terms of precision, recall and processing time. In addition, our approach does not require prior knowledge of the size of the item range of a data stream, and can handle range extension dynamically. With a small modification, hCount can be improved to hCount*, which achieves significantly better performance.
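
The abstract does not describe the internals of hCount, so the following is only a generic illustration of the problem setting, not the authors' algorithm. It is a minimal sketch of a hash-based counter table (in the style of a count-min sketch) that accepts both insertions and deletions in bounded memory. The class name `CounterSketch`, the parameters `rows` and `width`, and the choice of hash function are all assumptions made for this example.

```python
import hashlib


class CounterSketch:
    """Generic count-min-style counter table (hypothetical; NOT the paper's hCount)."""

    def __init__(self, rows=4, width=256):
        self.rows = rows        # number of independent hash rows (assumed parameter)
        self.width = width      # counters per row (assumed parameter)
        self.table = [[0] * width for _ in range(rows)]
        self.total = 0          # net number of items currently in the stream

    def _index(self, row, item):
        # Derive one hash per row by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            repr(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, delta):
        # Add delta to one counter in every row.
        for row in range(self.rows):
            self.table[row][self._index(row, item)] += delta
        self.total += delta

    def insert(self, item):
        self.update(item, 1)

    def delete(self, item):
        # Assumes the item was inserted earlier, so net counts stay non-negative.
        self.update(item, -1)

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum is the tightest estimate.
        return min(self.table[row][self._index(row, item)] for row in range(self.rows))

    def is_frequent(self, item, threshold):
        # An item is reported as frequent if its estimate reaches threshold * total.
        return self.estimate(item) >= threshold * self.total


sketch = CounterSketch()
for _ in range(5):
    sketch.insert("a")
for _ in range(2):
    sketch.insert("b")
for _ in range(2):
    sketch.delete("a")
print(sketch.total, sketch.estimate("a"), sketch.is_frequent("a", 0.5))
```

Memory use depends only on `rows * width`, not on the number of distinct items, which is the property the problem statement asks for. As long as deletions never exceed insertions for any item, the estimate never falls below the true net count.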