False positive or false negative: mining frequent itemsets from high speed transactional data streams

Authors:
Jeffery Xu Yu;Zhihong Chong;Hongjun Lu;Aoying Zhou
Affiliations:
The Chinese University of Hong Kong, Hong Kong, China;Fudan University, Shanghai, China;The Hong Kong University of Science and Technology, Hong Kong, China;Fudan University, Shanghai, China
Venue:
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Year:
2004


Abstract

The problem of finding frequent items has recently been studied over high speed data streams. However, mining frequent itemsets from transactional data streams has not yet been well addressed in terms of its bounds on memory consumption. The main difficulty is the exponential explosion of itemsets: given a domain of I unique items, the number of possible itemsets can be as large as 2^I - 1. As the length of a data stream approaches a very large number N, the chance that an itemset becomes frequent grows, and such itemsets become difficult to track with limited memory. However, the real killer of effective frequent itemset mining is that most existing algorithms are false-positive oriented. That is, they control memory consumption in the counting process by an error parameter ε, and allow items with support below the specified minimum support s but above s-ε to be counted as frequent. Such false-positive items increase the number of false-positive frequent itemsets exponentially, which may make the problem computationally intractable with bounded memory consumption. In this paper, we developed algorithms that can effectively mine frequent item(set)s from high speed transactional data streams with bounded memory consumption. Our algorithms are false-negative oriented, that is, certain frequent itemsets may not appear in the results, but the number of false-negative itemsets can be controlled by a predefined parameter so that a desired recall rate of frequent itemsets can be guaranteed. The algorithms are based on the Chernoff bound. Our extensive experimental studies show that the proposed algorithms have high accuracy, require less memory, and consume less CPU time. They significantly outperform the existing false-positive algorithms.
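
To make the false-negative idea concrete, here is a minimal Python sketch for single items only (not itemsets). It is an illustration, not the paper's algorithm. The abstract gives neither the error formula nor the buffer size, so the expression in `running_error`, the buffer size `n0`, and the parameter name `delta` (the allowed probability of missing a frequent item) are all assumptions based on a standard Chernoff-bound argument and should be checked against the full paper.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Set


def running_error(n: int, s: float, delta: float) -> float:
    """Error margin after n observations.

    ASSUMPTION: this Chernoff-style form is not given in the abstract.
    The margin shrinks toward 0 as n grows.
    """
    return math.sqrt(2.0 * s * math.log(2.0 / delta) / n)


def false_negative_frequent_items(
    stream: Iterable[Hashable], s: float, delta: float
) -> Set[Hashable]:
    """Return items whose tracked support is at least s.

    Tracked counts never exceed true counts, so no infrequent item is
    reported (no false positives); a frequent item may be missed if it
    was pruned earlier (a false negative).
    """
    # ASSUMPTION: buffer size chosen for illustration, not from the abstract.
    n0 = math.ceil((2.0 + 2.0 * math.log(2.0 / delta)) / s)
    counts: Counter = Counter()
    n = 0
    for item in stream:
        n += 1
        counts[item] += 1
        if len(counts) > n0:
            # Prune items whose observed support is below s - eps_n.
            threshold = (s - running_error(n, s, delta)) * n
            for key in [k for k, c in counts.items() if c < threshold]:
                del counts[key]
    # Report only items that meet the full minimum support s.
    return {k for k, c in counts.items() if c >= s * n}
```

In contrast to a false-positive scheme, which reports everything with support above s-ε, this sketch reports only items whose tracked count reaches s, and uses the shrinking margin solely to decide which entries are safe to discard.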