An important step in building the document database of a full-text retrieval system is to classify each document under one or more classes according to the topical domains it discusses. This step is commonly referred to as classification. Automatic classification attempts to replace human classifiers by automating this process with computers. It has two major components: (1) the classification scheme, which defines the available classes under which a document can be classified and the relationships among them; and (2) the classification algorithm, which defines the rules and procedures for assigning to a document one or more of the classes defined in the classification scheme.

In this paper, we present an automatic classification approach called ACTION. The design goal of ACTION is to strike an appropriate balance between specificity and exhaustivity, two important metrics for assessing an automatic classification approach. The key idea of ACTION is a scheme for measuring the significance of each keyword in a given document. The scheme takes into account not only the occurrence frequency of a keyword but also the logical relationships among the available classes.
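The abstract does not give ACTION's actual significance formula, so the following is only a minimal Python sketch of the general idea under assumptions not taken from the paper. It assumes a tree-shaped classification scheme and treats keyword significance as raw occurrence frequency. Evidence for a subclass is propagated to its parent, scaled by a hypothetical damping factor `decay`, and a class is kept if its score clears a hypothetical `threshold`. Lowering the threshold admits more classes (exhaustivity), while dropping a parent whenever one of its children is selected favors the most specific class (specificity). The class names, keywords, and helper functions are illustrative only.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ClassNode:
    """One class in a hypothetical tree-shaped classification scheme."""
    name: str
    keywords: set[str]
    children: list[ClassNode] = field(default_factory=list)


def keyword_frequencies(text: str) -> Counter:
    # Occurrence frequency of each lower-cased alphabetic token in the document.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def score_classes(node: ClassNode, freq: Counter, decay: float, scores: dict) -> float:
    # A class's own evidence: total frequency of its keywords in the document.
    own = sum(freq[k] for k in node.keywords)
    # Assumption: evidence for a subclass also supports its parent, damped by
    # `decay`, so the parent/child relationship between classes affects scoring.
    inherited = sum(score_classes(c, freq, decay, scores) for c in node.children)
    total = own + decay * inherited
    scores[node.name] = total
    return total


def prune_to_most_specific(node: ClassNode, selected: set) -> bool:
    # Returns True if this node or any of its descendants is selected.
    child_hit = False
    for c in node.children:
        if prune_to_most_specific(c, selected):
            child_hit = True
    if child_hit:
        # A more specific subclass was chosen, so drop the broader parent.
        selected.discard(node.name)
    return child_hit or node.name in selected


def classify(root: ClassNode, text: str, decay: float = 0.5, threshold: float = 2.0) -> list:
    freq = keyword_frequencies(text)
    scores: dict = {}
    score_classes(root, freq, decay, scores)
    # Keep every class whose score clears the threshold (exhaustivity) ...
    selected = {name for name, s in scores.items() if s >= threshold}
    # ... then keep only the most specific ones (specificity).
    prune_to_most_specific(root, selected)
    return sorted(selected)


scheme = ClassNode("computing", {"computer"}, [
    ClassNode("databases", {"database", "query", "index"}),
    ClassNode("networks", {"network", "protocol", "router"}),
])
doc = "The database query planner uses an index; each query touches the database."
print(classify(scheme, doc))  # ['databases']
```

Here "databases" scores 5 from its own keywords, and "computing" scores 0.5 × 5 = 2.5 purely through its child. Both clear the threshold, but the broader parent is then pruned in favor of the more specific subclass.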