High-Performance Commercial Data Mining: A Multistrategy Machine Learning Application

Authors:
William H. Hsu;Michael Welge;Tom Redman;David Clutter
Affiliations:
Department of Computing and Information Sciences, Kansas State University, Manhattan, KS 66506, and Automated Learning Group, National Center for Supercomputing Applications (NCSA), Champaign, IL ...;Automated Learning Group, National Center for Supercomputing Applications (NCSA), Champaign, IL 61820. welge@ncsa.uiuc.edu;Automated Learning Group, National Center for Supercomputing Applications (NCSA), Champaign, IL 61820. redman@ncsa.uiuc.edu;Automated Learning Group, National Center for Supercomputing Applications (NCSA), Champaign, IL 61820. clutter@ncsa.uiuc.edu
Venue:
Data Mining and Knowledge Discovery
Year:
2002

Abstract

We present an application of inductive concept learning and interactive visualization techniques to a large-scale commercial data mining project. This paper focuses on design and configuration of high-level optimization systems (wrappers) for relevance determination and constructive induction, and on integrating these wrappers with elicited knowledge on attribute relevance and synthesis. In particular, we discuss decision support issues for the application (cost prediction for automobile insurance markets in several states) and report experiments using D2K, a Java-based visual programming system for data mining and information visualization, and several commercial and research tools. We describe exploratory clustering, descriptive statistics, and supervised decision tree learning in this application, focusing on a parallel genetic algorithm (GA) system, Jenesis, which is used to implement relevance determination (attribute subset selection). Deployed on several high-performance network-of-workstation systems (Beowulf clusters), Jenesis achieves a linear speedup, due to a high degree of task parallelism. Its test set accuracy is significantly higher than that of decision tree inducers alone and is comparable to that of the best extant search-space based wrappers.
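
The wrapper idea named in the abstract (a genetic algorithm searching over attribute subsets, scored by the accuracy of an inducer trained on each subset) can be illustrated with a minimal sketch. This is not the authors' system: the function names (`make_data`, `knn_accuracy`, `ga_select`), the synthetic data, and all parameter values are assumptions made for illustration, and a 1-nearest-neighbor classifier stands in for the decision tree inducer to keep the example dependency-free.

```python
import random


def make_data(n, rng):
    """Synthetic rows (assumption): the label depends only on attributes 0 and 1;
    attributes 2..5 are pure noise that a good subset should exclude."""
    rows = []
    for _ in range(n):
        x = [rng.random() for _ in range(6)]
        rows.append((x, int(x[0] + x[1] > 1.0)))
    return rows


def knn_accuracy(mask, train, valid):
    """Wrapper fitness: validation accuracy of 1-nearest-neighbor restricted
    to the attributes whose mask bit is 1. An empty subset scores 0."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for x, y in valid:
        nearest = min(train, key=lambda r: sum((r[0][i] - x[i]) ** 2 for i in idx))
        correct += nearest[1] == y
    return correct / len(valid)


def ga_select(train, valid, n_attrs, rng, pop_size=12, generations=8, p_mut=0.1):
    """Simple generational GA over bit masks with truncation selection,
    one-point crossover, bit-flip mutation, and one elite individual."""
    def fitness(mask):
        return knn_accuracy(mask, train, valid)

    pop = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]
        children = [scored[0][:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_attrs)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # mutation
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    rng = random.Random(0)
    train, valid = make_data(80, rng), make_data(40, rng)
    mask, acc = ga_select(train, valid, n_attrs=6, rng=rng)
    print("selected attribute mask:", mask, "validation accuracy:", acc)
```

Each fitness evaluation in a generation is independent of the others, which is the kind of task parallelism the abstract credits for the speedup on cluster hardware; this sketch evaluates them sequentially.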