Discretization from data streams: applications to histograms and data mining

Authors:
João Gama; Carlos Pinto
Affiliations:
Univ. do Porto, Porto, Portugal; Univ. do Algarve, Faro, Portugal
Venue:
Proceedings of the 2006 ACM symposium on Applied computing
Year:
2006

Abstract

In this paper we propose a new method for incremental discretization. The basic idea is to perform the task in two layers. The first layer receives the sequence of input data and keeps summary statistics on it, using many more intervals than the final discretization requires. Based on the statistics stored by the first layer, the second layer creates the final discretization. The proposed architecture processes streaming examples in a single scan, in constant time and space, even for infinite sequences of examples. We experimentally demonstrate that incremental discretization maintains the performance of learning algorithms relative to batch discretization. The proposed method is better suited than batch discretization to incremental learning and to problems where data flows continuously, as in most recent data mining applications.
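
The two-layer idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes a value range known in advance, fixed equal-width intervals in the first layer, and an equal-frequency summary in the second layer. The names `TwoLayerDiscretizer`, `update`, `cut_points`, and `assign_bin` are hypothetical.

```python
from bisect import bisect_right


class TwoLayerDiscretizer:
    """Illustrative sketch only: fine counts (layer 1), few final bins (layer 2)."""

    def __init__(self, low, high, n_fine=200):
        # Assumption: the value range [low, high] is known in advance.
        self.low = low
        self.width = (high - low) / n_fine
        # Layer-1 statistics: one counter per fine interval (fixed memory).
        self.counts = [0] * n_fine
        self.total = 0

    def update(self, x):
        """Layer 1: constant-time update for a single streaming value."""
        i = int((x - self.low) / self.width)
        i = min(max(i, 0), len(self.counts) - 1)  # clamp out-of-range values
        self.counts[i] += 1
        self.total += 1

    def cut_points(self, k):
        """Layer 2: up to k-1 roughly equal-frequency cuts from layer-1 counts."""
        if self.total == 0:
            return []
        cuts, seen = [], 0
        for i, c in enumerate(self.counts):
            seen += c
            # Place a cut at the right edge of fine interval i once the
            # accumulated count reaches the next multiple of total / k.
            if len(cuts) < k - 1 and seen >= (len(cuts) + 1) * self.total / k:
                cuts.append(self.low + (i + 1) * self.width)
        return cuts


def assign_bin(x, cuts):
    """Map a value to the index of its final interval."""
    return bisect_right(cuts, x)
```

In this sketch each value is seen once, memory stays fixed at `n_fine` counters, and the second layer can be rerun on demand with a different `k` without rescanning the stream.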