Algorithms for text classification generally involve two stages. The first aims to identify textual elements (words and/or phrases) that may be relevant to the classification process. This stage often requires an analysis of the text that is language-specific and possibly domain-specific, and it may also be computationally costly. In this paper we examine a number of alternative keyword-generation methods and phrase-construction strategies that identify key words and phrases using simple, language-independent statistical properties. Our results show that these methods can produce good classification accuracy, with the best results obtained using a phrase-based approach.
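As a rough illustration of what a purely statistical, language-independent first stage might look like, the Python sketch below selects keywords by document frequency and by how strongly a word concentrates in one class, then builds phrases from runs of adjacent keywords. The abstract does not specify which statistics or phrase strategies the paper evaluates, so the function names (`select_keywords`, `extract_phrases`), the thresholds (`min_df`, `max_df_ratio`, `min_class_share`), and the punctuation-based segmentation are all assumptions made for this example, not the authors' method.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on non-word characters. No stemmer or stop-word
    # list is used, so nothing here depends on a particular language.
    return re.findall(r"\w+", text.lower())


def select_keywords(docs, min_df=2, max_df_ratio=0.75, min_class_share=0.8):
    """Pick keywords from labelled documents using simple counts only.

    docs is a list of (text, label) pairs. All thresholds are hypothetical
    defaults chosen for illustration.
    """
    df = Counter()                   # documents containing each word
    class_df = defaultdict(Counter)  # per-class document counts per word
    for text, label in docs:
        for word in set(tokenize(text)):
            df[word] += 1
            class_df[word][label] += 1

    n_docs = len(docs)
    keywords = set()
    for word, count in df.items():
        # Discard words that are too rare or too widespread to discriminate.
        if count < min_df or count / n_docs > max_df_ratio:
            continue
        # Keep words whose occurrences concentrate in a single class.
        top_class_count = max(class_df[word].values())
        if top_class_count / count >= min_class_share:
            keywords.add(word)
    return keywords


def extract_phrases(text, keywords):
    """Return maximal runs of adjacent keywords within punctuation-delimited segments."""
    phrases = []
    for segment in re.split(r"[.,;:!?]+", text.lower()):
        run = []
        for token in re.findall(r"\w+", segment):
            if token in keywords:
                run.append(token)
            elif run:
                # A non-keyword ends the current run of keywords.
                phrases.append(" ".join(run))
                run = []
        if run:
            phrases.append(" ".join(run))
    return phrases
```

Given a small labelled corpus, `select_keywords(docs)` returns a set of words, and `extract_phrases(text, keywords)` turns a document into phrase features that a second-stage classifier could consume. Treating a run of adjacent keywords as a phrase is only one possible construction strategy; others could, for instance, allow non-keywords inside a phrase.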