A similarity-based approach for data stream classification

Authors:
Dayrelis Mena-Torres;Jesús S. Aguilar-Ruiz
Affiliations:
University of Pinar del Rio "Hermanos Saiz Montes de Oca", Road Marti, No. 272, Pinar del Rio, Cuba;University "Pablo de Olavide", Road Utrera, km 1, 41013 Sevilla, Spain
Venue:
Expert Systems with Applications: An International Journal
Year:
2014

Abstract

Incremental learning techniques have been used extensively to address the data stream classification problem. The central challenge is balancing accuracy and efficiency: the algorithm should classify well while responding in reasonable time. This work introduces a new technique, the Similarity-based Data Stream Classifier (SimC), whose novel insertion/removal policy adapts quickly to the data tendency and maintains a small, representative set of examples and estimators that guarantees good classification rates. The methodology can also detect novel classes/labels during the running phase and remove useless ones that add no value to the classification process. Statistical tests were used to evaluate model performance from two points of view: efficacy (classification rate) and efficiency (online response time). Five well-known techniques were compared on sixteen data streams using the Friedman test. To find out which schemes differed significantly, the Nemenyi, Holm and Shaffer tests were also applied. The results show that SimC is very competitive in terms of (absolute and streaming) accuracy and classification/updating time in comparison to several of the most popular methods in the literature.
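
To make the evaluation step concrete, the Friedman statistic named in the abstract can be computed from a table of per-stream scores. The following is a minimal sketch under stated assumptions, not the authors' code: the function names `average_ranks` and `friedman_statistic` and the accuracy values are hypothetical, higher scores are assumed to be better, and the statistic is the standard chi-square form without a tie correction.

```python
from typing import List


def average_ranks(scores: List[List[float]]) -> List[float]:
    """Mean rank of each method over all streams.

    scores[i][j] is the score of method j on stream i (hypothetical layout).
    Rank 1 goes to the highest score; tied methods share the average rank.
    """
    n, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        # Method indices ordered from best (highest score) to worst.
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            # Extend the group while the next method has the same score.
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            # Average of the 1-based ranks pos+1 .. end+1.
            shared = (pos + end) / 2 + 1
            for j in order[pos:end + 1]:
                totals[j] += shared
            pos = end + 1
    return [t / n for t in totals]


def friedman_statistic(scores: List[List[float]]) -> float:
    """Friedman chi-square statistic (no tie correction applied)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    sum_sq = sum(r * r for r in ranks)
    return 12.0 * n / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4.0)


# Made-up accuracies: 4 streams (rows) x 3 methods (columns).
example = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.95, 0.70, 0.75],
    [0.90, 0.85, 0.80],
]
print(average_ranks(example))
print(friedman_statistic(example))
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom, where k is the number of methods. If the null hypothesis of equal performance is rejected, post hoc procedures such as the Nemenyi, Holm and Shaffer tests operate on the same average ranks to identify which pairs of methods differ.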