Feature annotation for text categorization
Proceedings of the CUBE International Information Technology Conference
Hi-index | 0.00 |
Due to the increasing number of documents in digital form, the automated text categorization (TC) has become more and more promising in the last ten years. A TC system can automatically assign a document with the most suitable category, but the reason for such an assignment is usually unknown by users. To make the TC system be interpretable, it is necessary to select a group of keywords, or termed a keyword combination, to describe each text category. In this paper, we propose a novel algorithm, keyword combination extraction based on ant colony optimization (KCEACO), to search the optimal keyword combination of a target category. By extending the traditional feature selection techniques, an evaluation function is designed for evaluating a keyword combination. This function takes into account the relationships among different keywords. Experimental results show that KCEACO can efficiently find the optimal keyword combination from a large number of candidate combinations.