Second Order Features for Maximising Text Classification Performance

Authors:
Bhavani Raskutti; Herman L. Ferrá; Adam Kowalczyk
Venue:
ECML '01 Proceedings of the 12th European Conference on Machine Learning
Year:
2001

References (13):
Evaluating text categorization. HLT '91 Proceedings of the workshop on Speech and Natural Language.
An evaluation of phrasal and clustered representations on a text categorization task. SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval.
Techniques for automatically correcting words in text. ACM Computing Surveys (CSUR).
Overview of the second text retrieval conference (TREC-2). TREC-2 Proceedings of the second Text REtrieval Conference.
Highlights: language- and domain-independent automatic indexing terms for abstracting. Journal of the American Society for Information Science.
The nature of statistical learning theory (book).
Inductive learning algorithms and representations for text categorization. CIKM '98 Proceedings of the seventh international conference on Information and knowledge management.
An introduction to support vector machines and other kernel-based learning methods (book).
Maximizing Text-Mining Performance. IEEE Intelligent Systems.
Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML '98 Proceedings of the 10th European Conference on Machine Learning.
A Comparative Study on Feature Selection in Text Categorization. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
Feature Engineering for Text Classification. ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning.
Feature selection and feature extraction for text categorization. HLT '91 Proceedings of the workshop on Speech and Natural Language.

Cited by (3):
Learner's Self-Assessment: A Case Study of SVM for Information Retrieval. AI '01 Proceedings of the 14th Australian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence.
AutoPCS: A Phrase-Based Text Categorization System for Similar Texts. APWeb/WAIM '09 Proceedings of the Joint International Conferences on Advances in Data and Web Management.
Modeling perspective using adaptor grammars. EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing.

Abstract

The paper demonstrates that adding automatically selected word-pairs substantially increases the accuracy of text classification, contrary to most previously reported research. The word-pairs are selected using a technique based on the frequencies of n-grams (sequences of characters), which takes into account both the frequencies of the word-pairs and the context in which they occur. The improvements are reported for two classifiers, support vector machines (SVM) and k-nearest neighbours (kNN), on two text corpora. On the first, a collection of articles from PC Week magazine, adding word-pairs increases micro-averaged breakeven accuracy by more than 6 percentage points over a baseline accuracy (without pairs) of around 40%. On the second, the standard Reuters benchmark, the SVM classifier augmented with word-pairs outperforms all previously reported results.
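The abstract names two concrete ingredients: augmenting a bag-of-words representation with selected word-pairs, and scoring with micro-averaged breakeven. The sketch below illustrates both under stated assumptions; it is not the authors' method. The paper selects pairs with a character n-gram frequency technique that also weighs context, whereas the hypothetical `select_pairs` here simply keeps adjacent pairs that occur in at least `min_df` documents. The function names (`word_pairs`, `select_pairs`, `featurise`, `micro_breakeven`) and the single-global-threshold definition of micro-averaged breakeven are assumptions for illustration.

```python
from collections import Counter


def word_pairs(tokens):
    """Return adjacent word-pairs, joined with '_' so they can share a
    feature space with single words."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def select_pairs(docs, min_df=2):
    """Keep word-pairs that occur in at least `min_df` documents.

    Hypothetical stand-in for the paper's selection step, which scores
    pairs via character n-gram frequencies and their context; plain
    document frequency is used here only to illustrate the pipeline.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(word_pairs(tokens)))  # count each pair once per document
    return {pair for pair, n in df.items() if n >= min_df}


def featurise(tokens, selected_pairs):
    """Bag-of-words counts augmented with the selected word-pairs."""
    feats = Counter(tokens)
    feats.update(p for p in word_pairs(tokens) if p in selected_pairs)
    return feats


def micro_breakeven(scores, labels):
    """Micro-averaged precision/recall breakeven point (assumed definition).

    scores[i][c] is the classifier score of document i for category c;
    labels[i][c] is 1 if document i belongs to category c, else 0.
    All (document, category) decisions are pooled under one threshold.
    Predicting exactly K positives, where K is the number of true
    positives, makes precision and recall equal (both are TP / K), so the
    breakeven value is the precision of the K highest pooled scores.
    Ties at the cut-off are resolved by input order (the sort is stable).
    """
    pooled = [
        (s, y)
        for row_s, row_y in zip(scores, labels)
        for s, y in zip(row_s, row_y)
    ]
    k = sum(y for _, y in pooled)
    if k == 0:
        return 0.0  # no positive labels: breakeven undefined, report 0 here
    pooled.sort(key=lambda t: t[0], reverse=True)
    return sum(y for _, y in pooled[:k]) / k


if __name__ == "__main__":
    docs = [["support", "vector", "machine"],
            ["vector", "machine", "learning"],
            ["text", "mining"]]
    pairs = select_pairs(docs)
    print(pairs)  # {'vector_machine'}
    print(featurise(docs[0], pairs)["vector_machine"])  # 1
    scores = [[0.9, 0.2], [0.4, 0.8], [0.3, 0.1]]
    labels = [[1, 0], [0, 1], [1, 0]]
    print(round(micro_breakeven(scores, labels), 3))  # 0.667
```

Pooling all (document, category) decisions under a single threshold is what makes this a micro-average; a macro-average would instead compute a breakeven per category and average those values. The `featurise` output is a sparse count vector that could be fed to either an SVM or a kNN classifier; any term weighting or normalisation used in the paper is not shown.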