The paper demonstrates that adding automatically selected word-pairs substantially increases the accuracy of text classification, contrary to most previously reported research. The word-pairs are selected automatically using a technique based on frequencies of n-grams (sequences of characters), which takes into account both the frequencies of word-pairs and the contexts in which they occur. The improvements are reported for two classifiers, support vector machines (SVM) and k-nearest neighbours (kNN), and for two text corpora. For the first corpus, a collection of articles from PC Week magazine, adding word-pairs increases micro-averaged breakeven accuracy by more than 6 percentage points from a baseline (without pairs) of around 40%. For the second, the standard Reuters benchmark, an SVM classifier augmented with pairs outperforms all previously reported results.
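The general idea of augmenting a bag-of-words representation with selected word-pairs can be illustrated with a minimal sketch. It does not reproduce the paper's selection technique, which is based on character n-gram frequencies and context. As a stand-in, and purely as an assumption, it keeps adjacent word-pairs that occur in at least `min_doc_freq` documents. The function names, the threshold, and the toy documents are all hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def adjacent_pairs(tokens):
    # Join each pair of neighbouring words into a single feature name.
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def select_pairs(docs, min_doc_freq=2):
    # Stand-in selection rule (NOT the paper's n-gram-based method):
    # keep pairs that appear in at least `min_doc_freq` documents.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(adjacent_pairs(tokenize(doc))))
    return {pair for pair, n in doc_freq.items() if n >= min_doc_freq}


def featurize(doc, selected):
    # Single-word counts, augmented with counts of the selected pairs only.
    tokens = tokenize(doc)
    feats = Counter(tokens)
    feats.update(p for p in adjacent_pairs(tokens) if p in selected)
    return feats


docs = [
    "support vector machines classify text",
    "support vector machines need features",
    "nearest neighbours classify text too",
]
selected = select_pairs(docs)
features = [featurize(doc, selected) for doc in docs]
print(sorted(selected))
```

The resulting feature counts could then be passed to any classifier, such as an SVM or kNN; the sketch stops at feature construction because the abstract gives no further detail on the classifier setup.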