SECRET: a scalable linear regression tree algorithm

Authors: Alin Dobra; Johannes Gehrke
Affiliations: Cornell University, Ithaca, NY (both authors)
Venue: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Year: 2002

Abstract

Developing regression models for large datasets that are both accurate and easy to interpret is an important data mining problem. Regression trees with linear models in the leaves satisfy both requirements, but thus far no truly scalable algorithm for constructing such trees is known. This paper proposes SECRET, a novel regression tree construction algorithm that produces high-quality trees and scales to very large datasets. At every node, SECRET uses the EM algorithm for Gaussian mixtures to find two clusters in the data. It then locally transforms the regression problem into a classification problem by labeling each data point according to which of the two clusters it is closer to. Goodness-of-split measures such as the Gini gain can then be used to choose the split variable and the split point, much as in classification tree construction. SECRET can be made scalable by employing scalable versions of the EM and classification tree construction algorithms. An experimental evaluation on real and artificial data shows that SECRET achieves accuracy comparable to other linear regression tree algorithms while requiring orders of magnitude less computation time on large datasets.
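The split-selection step described above (two-cluster EM, relabeling, then a Gini-based split search) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' implementation. All function names are ours, and several choices are assumptions made for brevity: EM runs in the joint space of the predictors and the target, the Gaussians have diagonal covariances, the two means are initialized from the lower and upper halves of the data sorted by the target, and thresholds are searched exhaustively. The scalable EM and classification-tree machinery the paper relies on is not modeled.

```python
import math


def _log_density(p, weight, mean, var):
    """log(weight * N(p | mean, diag(var))) for one mixture component."""
    ll = math.log(weight)
    for x, m, v in zip(p, mean, var):
        ll -= 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return ll


def _responsibilities(points, weights, means, variances):
    """E-step: posterior probability of each of the two components per point."""
    resp = []
    for p in points:
        logs = [_log_density(p, weights[k], means[k], variances[k]) for k in range(2)]
        top = max(logs)  # shift by the max before exp() to avoid underflow
        w = [math.exp(lg - top) for lg in logs]
        s = sum(w)
        resp.append([wi / s for wi in w])
    return resp


def em_two_clusters(points, iters=50, eps=1e-6):
    """Fit a 2-component, diagonal-covariance Gaussian mixture by EM (hypothetical helper).

    points: at least two equal-length tuples whose LAST coordinate is the target y.
    Returns a 0/1 cluster label per point.
    """
    n, d = len(points), len(points[0])
    # Assumed init: means of the lower and upper halves of the data sorted by y.
    order = sorted(range(n), key=lambda i: points[i][-1])
    halves = [order[: n // 2], order[n // 2:]]
    means = [[sum(points[i][j] for i in h) / len(h) for j in range(d)] for h in halves]
    mu = [sum(p[j] for p in points) / n for j in range(d)]
    var0 = [max(sum((p[j] - mu[j]) ** 2 for p in points) / n, eps) for j in range(d)]
    variances = [var0[:], var0[:]]
    weights = [0.5, 0.5]
    for _ in range(iters):
        resp = _responsibilities(points, weights, means, variances)
        # M-step: re-estimate mixing weights, means and per-dimension variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + eps
            weights[k] = nk / n
            means[k] = [sum(r[k] * p[j] for r, p in zip(resp, points)) / nk
                        for j in range(d)]
            variances[k] = [max(sum(r[k] * (p[j] - means[k][j]) ** 2
                                    for r, p in zip(resp, points)) / nk, eps)
                            for j in range(d)]
    # Label each point with the component it most probably belongs to.
    resp = _responsibilities(points, weights, means, variances)
    return [0 if r[0] >= r[1] else 1 for r in resp]


def gini(counts):
    """Gini index of a class-count vector."""
    total = sum(counts)
    return 0.0 if total == 0 else 1.0 - sum((c / total) ** 2 for c in counts)


def best_gini_split(xs, labels):
    """Exhaustively find (feature, threshold, gain) maximizing the Gini gain of x[f] <= t."""
    n, d = len(xs), len(xs[0])
    parent = [labels.count(0), labels.count(1)]
    best = (None, None, 0.0)
    for f in range(d):
        order = sorted(range(n), key=lambda i: xs[i][f])
        left = [0, 0]
        for pos in range(n - 1):
            left[labels[order[pos]]] += 1
            lo, hi = xs[order[pos]][f], xs[order[pos + 1]][f]
            if lo == hi:
                continue  # no threshold separates equal values
            right = [parent[0] - left[0], parent[1] - left[1]]
            nl = pos + 1
            gain = gini(parent) - nl / n * gini(left) - (n - nl) / n * gini(right)
            if gain > best[2]:
                best = (f, (lo + hi) / 2, gain)
    return best


def secret_style_split(xs, ys):
    """Cluster in the joint (x, y) space, then choose the split as a classification problem."""
    labels = em_two_clusters([tuple(x) + (y,) for x, y in zip(xs, ys)])
    return best_gini_split(xs, labels)


if __name__ == "__main__":
    # Two regimes: y rises from 1 for x0 in [0, 1) and falls from 90 for x0 in [10, 11).
    # x1 cycles through 0..4 independently of the regime, so it carries no signal.
    xs = [(i / 20, (i * 7) % 5) for i in range(20)] + [(10 + i / 20, (i * 3) % 5) for i in range(20)]
    ys = [1.0 + x0 for x0, _ in xs[:20]] + [100.0 - x0 for x0, _ in xs[20:]]
    # expect feature 0, a threshold between 0.95 and 10.0, and gain 0.5
    print(secret_style_split(xs, ys))
```

Because the split is chosen on the original predictors, not on the cluster labels, each node still tests a simple threshold on a single variable. Fitting the linear regression model in each leaf is not shown.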