Text categorisation (TC) is the task of assigning predefined categories to text. The primary step in TC is to transform documents into a representation suitable for machine learning algorithms; bag of words is the most popular such representation. Most machine learning algorithms are sensitive to the features fed into them and are misled by the high dimensionality of text. Feature selection (FS) is an important preprocessing step that removes redundant and irrelevant terms from the training corpus. This paper proposes an ant colony optimization (ACO) algorithm to select features for categorising longer documents whose categories are closely related. The heuristic value of each word is computed from the statistical dependency of the term on a category and from its compactness value, where the compactness of a term indicates its spread within a document. Experiments were conducted with documents from the 20 Newsgroups and Reuters-21578 benchmarks. The selected features were fed into a naïve Bayes classifier and its performance was analysed. The performance of the classifier was observed to improve with the features selected by the proposed method. The processes involved in the algorithm are time-intensive and call for parallelism, so the ACO algorithm was parallelised using the MapReduce programming model. The parallel algorithm was implemented and tested on a Hadoop cluster of six machines.
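To make the selection loop concrete, the following is a minimal sketch of a generic ACO feature-subset search, not the paper's actual algorithm. It assumes the per-term heuristic values are already computed and strictly positive (the abstract does not give the formula that combines statistical dependency and compactness), and it assumes a caller-supplied, non-negative `fitness` function in place of the naïve Bayes evaluation. The function name `select_features`, its parameters, and the toy vocabulary are all hypothetical.

```python
import random


def select_features(heuristic, fitness, n_select, n_ants=10, n_iters=20,
                    alpha=1.0, beta=1.0, rho=0.1, seed=0):
    """Generic ACO subset search (illustrative sketch, not the paper's method).

    heuristic: dict mapping term -> strictly positive desirability (assumed given)
    fitness:   callable scoring a list of terms, assumed to return a value >= 0
    """
    rng = random.Random(seed)
    terms = list(heuristic)
    pheromone = {t: 1.0 for t in terms}  # uniform initial pheromone
    best_subset, best_score = [], float("-inf")

    for _ in range(n_iters):
        for _ in range(n_ants):
            candidates = list(terms)
            subset = []
            # Each ant builds a subset term by term, choosing with probability
            # proportional to pheromone^alpha * heuristic^beta.
            while candidates and len(subset) < n_select:
                weights = [pheromone[t] ** alpha * heuristic[t] ** beta
                           for t in candidates]
                pick = rng.choices(candidates, weights=weights, k=1)[0]
                subset.append(pick)
                candidates.remove(pick)
            score = fitness(subset)
            if score > best_score:
                best_subset, best_score = subset, score

        # Evaporate pheromone everywhere, then reinforce the best subset so far.
        for t in terms:
            pheromone[t] *= (1.0 - rho)
        for t in best_subset:
            pheromone[t] += best_score

    return sorted(best_subset), best_score


# Toy data: made-up heuristic values and a made-up fitness function.
toy_heuristic = {"ball": 0.9, "game": 0.8, "the": 0.1, "and": 0.1}
relevant = {"ball", "game"}


def toy_fitness(subset):
    # Fraction of the (hypothetical) relevant terms that the subset covers.
    return len(relevant & set(subset)) / len(relevant)


subset, score = select_features(toy_heuristic, toy_fitness, n_select=2)
```

In a real setting the fitness call dominates the running time, since every ant in every iteration needs a classifier evaluation. The ants within one iteration are independent of each other, which is one plausible place for the map step when the search is distributed with MapReduce.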