A parallel ACO algorithm to select terms to categorise longer documents

Authors:
M. Janaki Meena;K. R. Chandran;A. Karthik;A. Vijay Samuel
Affiliations:
Department of CSE, PSG College of Technology, Coimbatore - 641004, Tamilnadu, India.;Department of IT, PSG College of Technology, Coimbatore - 641004, Tamilnadu, India.;Department of CSE, PSG College of Technology, Coimbatore - 641004, Tamilnadu, India.;Department of CSE, PSG College of Technology, Coimbatore - 641004, Tamilnadu, India
Venue:
International Journal of Computational Science and Engineering
Year:
2011

Abstract

Text categorisation (TC) is the task of assigning predefined categories to text. The primary step in TC is to transform documents into a representation suitable for machine learning algorithms; bag of words is the most popular such representation. Most machine learning algorithms are sensitive to the features fed into them and are misled by the high dimensionality of text. Feature selection (FS) is an important preprocessing step that removes redundant and irrelevant terms from the training corpus. This paper proposes an ant colony optimization (ACO) algorithm to select features for categorising longer documents whose categories are closely related. The heuristic value of each word is computed from the statistical dependency of the term on a category and its compactness value. The compactness of a term indicates its spread in a document. Experiments were conducted with documents from the 20 Newsgroups and Reuters-21578 benchmarks. The selected features were fed into a naïve Bayes classifier and its performance was analysed. The performance of the classifier was observed to improve with the features selected by the proposed method. The processes involved in the algorithm are time-intensive and demand parallelism. Hence, the ACO algorithm was parallelised using the MapReduce programming model. The parallel algorithm was implemented and tested on a cluster of six machines formed using Hadoop.
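
The abstract does not give the formula for the heuristic value, so the following is only an illustrative sketch of how a term's statistical dependency on a category and its spread in a document might be computed and combined. The chi-square statistic as the dependency measure, the first-to-last-occurrence span as the compactness measure, the multiplicative combination, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence


def chi_square(n11: int, n10: int, n01: int, n00: int) -> float:
    """Chi-square statistic for a 2x2 term/category contingency table.

    n11: documents in the category that contain the term
    n10: documents outside the category that contain the term
    n01: documents in the category that lack the term
    n00: documents outside the category that lack the term
    (Assumed dependency measure; the paper may use a different one.)
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of dependency
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def spread(tokens: Sequence[str], term: str) -> float:
    """Fraction of the document spanned between the first and last
    occurrence of `term` (0.0 if it occurs fewer than two times).
    (Assumed stand-in for the paper's compactness value.)
    """
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    if len(positions) < 2:
        return 0.0
    return (positions[-1] - positions[0]) / (len(tokens) - 1)


def heuristic(dependency: float, mean_spread: float) -> float:
    """Hypothetical combination: scale the dependency by the term's
    average spread over the training documents."""
    return dependency * (1.0 + mean_spread)
```

In an ACO feature selector, a value of this kind would typically bias an ant's probability of picking a term alongside the pheromone level; whether a wider or a narrower spread should be rewarded depends on the paper's definition of compactness, which the abstract does not state.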