This paper proposes a new approach for classifying text documents into two disjoint classes. The approach extracts patterns in the form of two logical expressions defined over features of the documents (their indexing terms). Together, the two expressions describe the class of positive examples and the class of negative examples. The expressions are inferred with a data mining approach based on mathematical logic, called One Clause At a Time (OCAT). A logic-based approach to text document classification is critical when one must be able to justify why a particular document has been assigned to one class rather than the other. This situation arises, for instance, when declassifying documents that were previously considered important to national security and are therefore currently kept secret. Computational experiments evaluated the effectiveness of the OCAT-based approach and compared it with the well-known vector space model (VSM). The experiments also sought the indexing terms best suited to making these classification decisions. Results on a sample of 2897 text documents from the TIPSTER collection indicate that the OCAT-based approach has many advantages over the VSM approach for this type of text document classification problem. Finally, a guided strategy for the OCAT-based approach is presented for deciding which document should be considered next when building the training example sets.
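The One Clause At a Time idea can be illustrated with a minimal sketch. It assumes each document is reduced to a binary vector, where entry i is 1 if indexing term i occurs. The sketch builds a conjunctive normal form (CNF) expression one clause at a time. Every clause must accept all positive examples, and each new clause rejects some of the negative examples that are still unrejected. This repeats until no negatives remain.

The clause-construction rule here is a simple greedy heuristic chosen for illustration, not the paper's own procedure, which this abstract does not describe. All function names (`ocat_greedy`, `classify`, `pick_next`) are hypothetical. Two more pieces are illustrative assumptions: the decision rule in `classify`, and the "pick an undecided document" heuristic standing in for the guided strategy. The second expression is obtained by swapping the roles of the two classes.

```python
from typing import List, Optional, Tuple

# A literal is (term index, polarity): (i, True) means "term i present",
# (i, False) means "term i absent". A clause is an OR of literals; a CNF is an AND of clauses.
Literal = Tuple[int, bool]
Clause = List[Literal]
Example = List[int]  # assumed binary vector: 1 if indexing term i occurs in the document


def literal_true(lit: Literal, x: Example) -> bool:
    i, polarity = lit
    return (x[i] == 1) == polarity


def clause_true(clause: Clause, x: Example) -> bool:
    return any(literal_true(lit, x) for lit in clause)


def cnf_true(cnf: List[Clause], x: Example) -> bool:
    return all(clause_true(c, x) for c in cnf)


def ocat_greedy(pos: List[Example], neg: List[Example]) -> List[Clause]:
    """Hypothetical greedy OCAT-style learner: returns a CNF that accepts every
    positive and rejects every negative example (if no example is in both sets)."""
    cnf: List[Clause] = []
    remaining_neg = [list(x) for x in neg]
    while remaining_neg:
        # Only literals that are false on a "seed" negative are allowed, so the
        # finished clause is guaranteed to reject at least that seed.
        seed = remaining_neg[0]
        candidates = [(i, seed[i] == 0) for i in range(len(seed))]
        clause: Clause = []
        uncovered = [list(p) for p in pos]
        while uncovered:
            useful = [lit for lit in candidates if any(literal_true(lit, p) for p in uncovered)]
            if not useful:
                raise ValueError("a positive example duplicates a negative example")

            def score(lit: Literal) -> float:
                # Favor literals true on many uncovered positives and few unrejected negatives.
                hits_pos = sum(literal_true(lit, p) for p in uncovered)
                hits_neg = sum(literal_true(lit, n) for n in remaining_neg)
                return hits_pos / (1 + hits_neg)

            best = max(useful, key=score)
            clause.append(best)
            candidates.remove(best)
            uncovered = [p for p in uncovered if not literal_true(best, p)]
        cnf.append(clause)
        # Negatives on which the new clause is false are now rejected by the CNF.
        remaining_neg = [n for n in remaining_neg if clause_true(clause, n)]
    return cnf


def classify(pos_cnf: List[Clause], neg_cnf: List[Clause], x: Example) -> Optional[str]:
    """Hypothetical decision rule combining the two expressions."""
    in_pos, in_neg = cnf_true(pos_cnf, x), cnf_true(neg_cnf, x)
    if in_pos and not in_neg:
        return "positive"
    if in_neg and not in_pos:
        return "negative"
    return None  # both or neither accept the document: undecided


def pick_next(pos_cnf: List[Clause], neg_cnf: List[Clause],
              unlabeled: List[Example]) -> Optional[Example]:
    """Hypothetical guided-learning heuristic: return the first undecided document.
    Whatever its true label, one of the two expressions must change to fit it."""
    return next((x for x in unlabeled if classify(pos_cnf, neg_cnf, x) is None), None)
```

For example, `pos_cnf = ocat_greedy(pos, neg)` and `neg_cnf = ocat_greedy(neg, pos)` give the two expressions. A new document is then passed to `classify`. Because every clause is a readable disjunction of term-presence tests, the reason a document was assigned to a class can be read directly off the expression. This is the kind of justification the abstract emphasizes for declassification decisions.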