An enhanced ACO algorithm to select features for text categorization and its parallelization

Authors:
M. Janaki Meena;K. R. Chandran;A. Karthik;A. Vijay Samuel
Affiliations:
Department of CSE, PSG College of Technology, Coimbatore, Tamil Nadu 641004, India;Department of IT, PSG College of Technology, Coimbatore, Tamil Nadu 641004, India;Department of CSE, PSG College of Technology, Coimbatore, Tamil Nadu 641004, India;Department of CSE, PSG College of Technology, Coimbatore, Tamil Nadu 641004, India
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

References:

The ant colony optimization meta-heuristic. New Ideas in Optimization
MAX-MIN Ant System. Future Generation Computer Systems
A Comparative Study on Feature Selection in Text Categorization. ICML '97: Proceedings of the Fourteenth International Conference on Machine Learning
A Study of Some Properties of Ant-Q. PPSN IV: Proceedings of the 4th International Conference on Parallel Problem Solving from Nature
Toward Integrating Feature Selection Algorithms for Classification and Clustering. IEEE Transactions on Knowledge and Data Engineering
Scoring and Selecting Terms for Text Categorization. IEEE Intelligent Systems
Evolving Feature Selection. IEEE Intelligent Systems
MapReduce: simplified data processing on large clusters. OSDI '04: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, Volume 6
Text Clustering with Feature Selection by Using Statistical Data. IEEE Transactions on Knowledge and Data Engineering
Text feature selection using ant colony optimization. Expert Systems with Applications: An International Journal
Feature selection for text classification with Naïve Bayes. Expert Systems with Applications: An International Journal
Review: A review of ant algorithms. Expert Systems with Applications: An International Journal
Using Ant Colony Optimization algorithm for solving project management problems. Expert Systems with Applications: An International Journal
AntNet: distributed stigmergetic control for communications networks. Journal of Artificial Intelligence Research
Hadoop: The Definitive Guide
Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation


Abstract

Feature selection is an indispensable preprocessing step for the effective analysis of high-dimensional data. It removes irrelevant features, improves predictive accuracy, and makes the models built by feature-sensitive classifiers easier to understand. Finding an optimal feature subset becomes intractable in a large domain, and many such feature selection problems have been shown to be NP-hard. Optimization algorithms are frequently designed for NP-hard problems to find near-optimal solutions in practical time. This paper formulates text feature selection as a combinatorial problem and proposes an Ant Colony Optimization (ACO) algorithm to find a near-optimal solution to it. The algorithm differs from the earlier one by Aghdam et al. in that it includes a heuristic function based on statistics and a local search. It aims to determine a solution that includes 'n' distinct features for each category. Optimization algorithms based on wrapper models show better results, but the processes they involve are time intensive. The availability of parallel architectures in the form of clusters of machines connected through fast Ethernet has increased interest in the parallelization of algorithms. The proposed ACO algorithm was parallelized and demonstrated on a cluster of up to six machines. Documents from the 20 Newsgroups benchmark dataset were used for experimentation. Features selected by the proposed algorithm were evaluated using a Naive Bayes classifier and compared with those from standard feature selection techniques. The performance of the classifier improved with the features selected by the enhanced ACO with local search. The classifier's error decreased over iterations, and the number of positive features increased with the number of iterations.
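
The abstract names three ingredients (ACO, a statistics-based heuristic, and a local search) without giving their formulas, so the following is only a minimal sketch of how such pieces might fit together, not the authors' algorithm. Everything in it is an assumption made for illustration: the function names, the parameters `alpha`, `beta` and `rho`, the pheromone update rule, the single-swap local search, and the toy additive `fitness` that stands in for a wrapper evaluation such as classifier accuracy. It also selects one subset of `n_select` features rather than 'n' features per category.

```python
import random


def local_search(subset, score, n_features, fitness):
    """Hypothetical local search: accept single-feature swaps while they improve the score."""
    improved = True
    while improved:
        improved = False
        for out_f in sorted(subset):
            for in_f in range(n_features):
                if in_f in subset:
                    continue
                candidate = (subset - {out_f}) | {in_f}
                cand_score = fitness(candidate)
                if cand_score > score:
                    subset, score = candidate, cand_score
                    improved = True
                    break
            if improved:
                break
    return subset, score


def aco_select_features(heuristic, fitness, n_select, n_ants=8, n_iters=10,
                        alpha=1.0, beta=1.0, rho=0.1, seed=0):
    """Toy ACO feature selection (illustrative assumptions throughout).

    heuristic: positive per-feature scores, a stand-in for a statistical measure.
    fitness:   callable mapping a frozenset of feature indices to a non-negative
               float (higher is better), a stand-in for a wrapper evaluation.
    """
    rng = random.Random(seed)
    n_features = len(heuristic)
    pheromone = [1.0] * n_features
    best_subset, best_score = None, float("-inf")

    for _ in range(n_iters):
        iter_subset, iter_score = None, float("-inf")
        for _ in range(n_ants):
            # Each ant builds a subset by sampling features without replacement,
            # weighted by pheromone and heuristic desirability.
            candidates = list(range(n_features))
            chosen = []
            while len(chosen) < n_select:
                weights = [(pheromone[f] ** alpha) * (heuristic[f] ** beta)
                           for f in candidates]
                pick = rng.choices(candidates, weights=weights, k=1)[0]
                candidates.remove(pick)
                chosen.append(pick)
            subset = frozenset(chosen)
            score = fitness(subset)
            if score > iter_score:
                iter_subset, iter_score = subset, score

        # Refine the best ant of this iteration with the local search.
        iter_subset, iter_score = local_search(iter_subset, iter_score,
                                               n_features, fitness)

        # Evaporate all trails, then reinforce features in the refined subset.
        pheromone = [(1.0 - rho) * p for p in pheromone]
        for f in iter_subset:
            pheromone[f] += max(iter_score, 0.0)

        if iter_score > best_score:
            best_subset, best_score = iter_subset, iter_score

    return best_subset, best_score


# Toy usage: made-up relevance scores; fitness is simply their sum.
relevance = [0.1, 0.9, 0.2, 0.8, 0.05, 0.7]


def toy_fitness(subset):
    return sum(relevance[f] for f in sorted(subset))


best_features, best_score = aco_select_features(relevance, toy_fitness, n_select=3)
print(sorted(best_features), best_score)
```

Because the toy fitness is additive, the swap-based local search alone is enough to reach the best subset here. With a real wrapper evaluation the fitness calls dominate the running time, which is the cost the abstract cites as the motivation for parallelizing the algorithm.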