Systematic data selection to mine concept-drifting data streams

Authors:
Wei Fan
Affiliations:
IBM T.J. Watson Research, Hawthorne, NY
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004


Abstract

One major problem of existing methods for mining data streams is that they make ad hoc choices to combine the most recent data with some amount of old data when searching for the new hypothesis. The assumption is that the additional old data always helps produce a more accurate hypothesis than using the most recent data alone. We first criticize this notion and point out that using old data blindly is no better than "gambling"; in other words, it increases accuracy only if we are "lucky." We discuss and analyze the situations in which old data will help and what kind of old data will help. The practical problem in choosing the right examples from old data is the formidable cost of comparing different possibilities and models. This problem goes away if we have an algorithm that can compare all sensible choices efficiently, at little extra cost. Based on this observation, we propose a simple, efficient, and accurate cross-validation decision tree ensemble method.
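
The idea of comparing data choices by cross-validation, rather than mixing in old data blindly, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not list the candidate choices or describe the ensemble, so the three candidates below, the one-feature decision stump standing in for a decision tree ensemble, and every name (`train_stump`, `cv_accuracy`, `select_data`) are assumptions made for illustration only.

```python
def train_stump(examples):
    """Fit a depth-1 decision tree on (x, y) pairs with y in {0, 1}."""
    xs = sorted({x for x, _ in examples})
    # Candidate thresholds: one below all points, then midpoints between values.
    thresholds = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for t in thresholds:
        for above in (0, 1):  # label predicted when x > t
            errors = sum((above if x > t else 1 - above) != y
                         for x, y in examples)
            if best is None or errors < best[0]:
                best = (errors, t, above)
    _, t, above = best
    return lambda x: above if x > t else 1 - above


def cv_accuracy(recent, pick_old, k=5):
    """k-fold cross-validation on the recent data only.

    pick_old(train) returns the old examples to add to the training fold.
    """
    correct = 0
    for i in range(k):
        held_out = recent[i::k]
        train = [e for j, e in enumerate(recent) if j % k != i]
        model = train_stump(train + pick_old(train))
        correct += sum(model(x) == y for x, y in held_out)
    return correct / len(recent)


def select_data(recent, old, k=5):
    """Score each hypothetical data choice and return the best one."""
    def consistent_old(train):
        # Keep only old examples that a recent-data model labels the same way.
        model = train_stump(train)
        return [(x, y) for x, y in old if model(x) == y]

    candidates = {
        "recent_only": lambda train: [],
        "recent_plus_all_old": lambda train: list(old),
        "recent_plus_consistent_old": consistent_old,
    }
    scores = {name: cv_accuracy(recent, pick, k)
              for name, pick in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy drifting stream: the old concept is x > 5, the recent concept is x > 2.
old = [(x, int(x > 5)) for x in range(10)] * 5
recent = [(x, int(x > 2)) for x in range(10)]
best_choice, scores = select_data(recent, old)
```

On this toy stream the old examples outnumber the recent ones, so adding all of them pulls the learned threshold back toward the outdated concept and lowers the cross-validated accuracy on recent data. The sketch evaluates every candidate on held-out recent examples because those are the only ones known to reflect the current concept; a real system would need a learner cheap enough to make this comparison affordable, which is the efficiency concern the abstract raises.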