With the ever-increasing number of documents on the web, in digital libraries, news sources, and elsewhere, the need for a text classifier that can handle massive amounts of data is becoming more critical, and the task more difficult. The major problem in text classification is the high dimensionality of the feature space. The Support Vector Machine (SVM) classifier has been shown to perform consistently better than other text classification algorithms; however, training an SVM model takes longer than training other classifiers. We explore the use of the Ambiguity Measure (AM) feature selection method, which uses only the most unambiguous keywords to predict the category of a document. Our analysis shows that AM reduces training time by more than 50% compared to using no feature selection, while keeping the accuracy of the text classifier equivalent to, or better than, that obtained with the whole feature set. We empirically show the effectiveness of our approach, which outperforms seven different feature selection methods on two standard benchmark datasets.
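The abstract only names the Ambiguity Measure; the following is a minimal sketch of the idea, assuming AM(t) is the maximum, over categories c, of tf(t, c) / tf(t), and that terms scoring above a tunable threshold are kept. The threshold value and function names here are illustrative, not the paper's.

```python
from collections import defaultdict

def ambiguity_measure(docs):
    """Compute the Ambiguity Measure for each term.

    docs: list of (tokens, category) pairs.
    Returns {term: score in (0, 1]}, where AM(t) = max_c tf(t, c) / tf(t).
    A term occurring in only one category scores 1.0 (unambiguous);
    a term spread evenly across k categories scores 1/k.
    """
    tf_term_cat = defaultdict(lambda: defaultdict(int))  # tf(t, c)
    tf_term = defaultdict(int)                           # tf(t)
    for tokens, cat in docs:
        for t in tokens:
            tf_term_cat[t][cat] += 1
            tf_term[t] += 1
    return {t: max(cats.values()) / tf_term[t] for t, cats in tf_term_cat.items()}

def select_features(docs, threshold=0.9):
    """Keep only terms whose AM meets the (hypothetical) threshold."""
    am = ambiguity_measure(docs)
    return {t for t, score in am.items() if score >= threshold}
```

A classifier such as SVM would then be trained only on the surviving terms, which is what shrinks the feature space and the training time.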