A Heuristic Method for Selecting Support Features from Large Datasets

Authors:
Hong Seo Ryoo;In-Yong Jang
Affiliations:
Division of Information Management Engineering, Korea University, 1, 5-Ka, Anam-Dong, Seongbuk-Ku, Seoul, 136-713, Korea;Division of Information Management Engineering, Korea University, 1, 5-Ka, Anam-Dong, Seongbuk-Ku, Seoul, 136-713, Korea
Venue:
AAIM '07 Proceedings of the 3rd International Conference on Algorithmic Aspects in Information and Management
Year:
2007

Abstract

Set covering (SC) is particularly well suited to feature selection in machine learning because it selects support features for the data under analysis based on both the individual and the collective roles of the candidate features. However, SC-based feature selection requires complete pairwise comparison of the members of the different classes in a dataset, which makes the otherwise meritorious SC principle impractical for selecting support features from large datasets. Introducing the notion of implicit SC-based feature selection, this paper presents a feature selection procedure that is equivalent to the standard SC-based feature selection procedure in supervised learning but whose memory requirement is multiple orders of magnitude smaller. With experiments on six large machine learning datasets, we demonstrate the usefulness of the proposed implicit SC-based feature selection scheme in large-scale supervised data analysis.
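
To make the memory issue concrete, the following is a minimal sketch of the standard (explicit) SC-based feature selection that the abstract describes, not the implicit procedure proposed in the paper. It assumes binary feature vectors and two classes, and it solves the covering problem with a plain greedy heuristic; the function name `sc_feature_selection` and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import product


def sc_feature_selection(pos, neg):
    """Greedy set covering over all (positive, negative) pairs of binary rows.

    Illustrative assumption: rows are equal-length tuples of 0/1 values.
    Returns the selected feature indices and the number of pairwise
    constraints that had to be materialized.
    """
    n_features = len(pos[0])
    # One covering constraint per cross-class pair: the set of features
    # on which the two rows differ. This list has len(pos) * len(neg)
    # entries, which is the memory bottleneck for large datasets.
    pairs = [
        {j for j in range(n_features) if p[j] != q[j]}
        for p, q in product(pos, neg)
    ]
    if any(not differing for differing in pairs):
        raise ValueError("a positive row and a negative row are identical")

    uncovered = set(range(len(pairs)))
    selected = []
    while uncovered:
        # Pick the feature that separates the most still-uncovered pairs.
        best = max(
            range(n_features),
            key=lambda j: sum(1 for k in uncovered if j in pairs[k]),
        )
        selected.append(best)
        uncovered = {k for k in uncovered if best not in pairs[k]}
    return selected, len(pairs)


# Toy data (hypothetical): feature 0 alone separates the two classes.
pos = [(1, 0, 1), (1, 1, 0)]
neg = [(0, 0, 1), (0, 1, 0)]
features, n_constraints = sc_feature_selection(pos, neg)
print(features, n_constraints)  # prints: [0] 4
```

With m positive and n negative observations the explicit formulation stores m × n constraints, so two classes of 100,000 rows each would already require 10^10 pairwise comparisons. This quadratic growth is the cost that an implicit formulation is meant to avoid.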