An Evaluation of Statistical Approaches to Text Categorization
Information Retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
Hi-index | 0.00 |
In this paper we describe a new on-line document categorization strategy that can be integrated within Web applications. A salient aspect is the use of neural learning in both representation and classification tasks. Within text documents conceived as images, the regions of interest (RoI) containing information meaningful for categorization are identified with the support of a supervised neural network. Text within RoI is represented according to a simple solution that consider the first K words in the text and code them properly. A Kohonen Self-Organizing Map (SOM) is applied to cluster documents that are subsequently labelled by applying a simple majority voting mechanism. Solutions adopted were evaluated by conducting experiments within the context of on-line price comparison services. Results obtained demontrate that the overall classification strategy is able to categorize documents satisfectorily taking into account the high variability of Web pages.