Associative text categorization exploiting negated words

Authors:
Elena Baralis; Paolo Garza
Affiliations:
Politecnico di Torino, Corso Duca degli Abruzzi, Torino, Italy; Politecnico di Torino, Corso Duca degli Abruzzi, Torino, Italy
Venue:
Proceedings of the 2006 ACM symposium on Applied computing
Year:
2006

References (15):
Expert network: effective and efficient learning from human decisions in text categorization and retrieval. SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval.
Beyond market baskets: generalizing association rules to correlations. SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data.
Context-sensitive learning methods for text categorization. ACM Transactions on Information Systems (TOIS).
Mining frequent patterns without candidate generation. SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data.
Growing decision trees on support-less association rules. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining.
Theory of dependence values. ACM Transactions on Database Systems (TODS).
Machine learning in automated text categorization. ACM Computing Surveys (CSUR).
Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval. ECML '98 Proceedings of the 10th European Conference on Machine Learning.
Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML '98 Proceedings of the 10th European Conference on Machine Learning.
Mining for Strong Negative Associations in a Large Database of Customer Transactions. ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering.
CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules. ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining.
Fast Algorithms for Mining Association Rules in Large Databases. VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases.
A Lazy Approach to Pruning Classification Rules. ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining.
Text Document Categorization by Term Association. ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining.
An associative classifier based on positive and negative rules. Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery.

Cited by (3):
Learning rules with negation for text categorization. Proceedings of the 2007 ACM symposium on Applied computing.
Classification inductive rule learning with negated features. ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications: Part I.
GAMoN: Discovering M-of-N{¬,∨} hypotheses for text classification by a lattice-based Genetic Algorithm. Artificial Intelligence.

Abstract

Associative classification has recently been applied to text document categorization. However, unlike in the classification of structured data, the quality of the generated classifier is rather low. This effect is mainly due to the poor precision of the generated rules. To increase the precision of associative classifiers, we propose the use of classification rules including negated words, i.e., words that the considered document should not contain. Rules are in the form "If a document includes words A and B, but not word Z, then it belongs to class C1". Mining classification rules with negated words quickly becomes intractable as the support threshold decreases. We tackle this problem by means of an opportunistic approach, where negated words are generated only to specialize rules that may wrongly classify training documents. Hence precision is increased without losing recall. Experiments on the Reuters corpus show that our classifier based on negated words achieves good precision and recall, while yielding the easily interpretable model typical of associative classifiers.
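The rule form and the opportunistic specialization described above can be illustrated with a minimal Python sketch. It assumes documents are plain sets of words and that a rule is specialized by adding a single negated word; the `Rule` class, the `coverage` and `specialize` functions, and the gain heuristic used to choose the negated word are hypothetical names and choices made for illustration, not the authors' algorithm. Rule mining and support thresholds are omitted entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule type: if a document contains every word in `pos`
    and no word in `neg`, predict `label`."""
    pos: frozenset
    neg: frozenset
    label: str

    def matches(self, doc: set) -> bool:
        # All positive words present, no negated word present.
        return self.pos <= doc and not (self.neg & doc)


def coverage(rule, training):
    """Split the training documents matched by `rule` into correct and wrong hits."""
    right, wrong = [], []
    for words, label in training:
        if rule.matches(words):
            (right if label == rule.label else wrong).append(words)
    return right, wrong


def specialize(rule, training):
    """Hypothetical opportunistic step: add one negated word only when the
    rule matches training documents of another class. The gain heuristic
    (wrong hits removed minus right hits lost) is an assumption, not the
    paper's selection criterion."""
    right, wrong = coverage(rule, training)
    if not wrong:
        return rule  # already precise on the training set: leave it alone

    def gain(word):
        return sum(word in d for d in wrong) - sum(word in d for d in right)

    # Only words seen in wrongly covered documents can exclude them.
    candidates = set().union(*wrong) - rule.pos - rule.neg
    best = max(sorted(candidates), key=gain, default=None)
    if best is None or gain(best) <= 0:
        return rule
    return Rule(rule.pos, rule.neg | {best}, rule.label)


# Toy bag-of-words training set (made-up documents for illustration).
training = [
    ({"oil", "price", "crude"}, "crude"),
    ({"oil", "price", "opec"}, "crude"),
    ({"oil", "price", "olive"}, "food"),
]
rule = Rule(frozenset({"oil", "price"}), frozenset(), "crude")
refined = specialize(rule, training)
print(sorted(refined.neg))  # ['olive']
```

Because `specialize` returns a rule unchanged when it has no wrong hits, only rules that may misclassify training documents are touched, which keeps the cost of generating negated words proportional to the rules that actually need them.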