Statistical Identification of Key Phrases for Text Classification

Authors:
Frans Coenen; Paul Leng; Robert Sanderson; Yanbo J. Wang
Affiliations:
Department of Computer Science, The University of Liverpool, Ashton Building, Ashton Street, Liverpool L69 3BX, United Kingdom (all authors)
Venue:
MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2007

Abstract

Algorithms for text classification generally involve two stages, the first of which aims to identify textual elements (words and/or phrases) that may be relevant to the classification process. This stage often involves an analysis of the text that is both language-specific and possibly domain-specific, and may also be computationally costly. In this paper we examine a number of alternative keyword-generation methods and phrase-construction strategies that identify key words and phrases by simple, language-independent statistical properties. We present results that demonstrate that these methods can produce good classification accuracy, with the best results being obtained using a phrase-based approach.
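
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the scoring rule (share of a word's document occurrences that fall in a single class), the thresholds `min_support` and `min_ratio`, and the function names `select_key_words` and `build_phrases` are all assumptions made for illustration. The only point carried over from the abstract is that both steps rely on simple counts rather than language-specific analysis.

```python
import re
from collections import Counter, defaultdict


def select_key_words(docs, min_support=2, min_ratio=0.75):
    """Pick words whose document occurrences concentrate in one class.

    docs is a list of (text, label) pairs. Both thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    doc_freq = defaultdict(Counter)  # word -> {label: number of documents}
    for text, label in docs:
        # Count each word once per document.
        for word in set(re.findall(r"\w+", text.lower())):
            doc_freq[word][label] += 1

    key_words = {}
    for word, by_class in doc_freq.items():
        total = sum(by_class.values())
        label, top = by_class.most_common(1)[0]
        if total >= min_support and top / total >= min_ratio:
            key_words[word] = label
    return key_words


def build_phrases(text, key_words):
    """Return maximal runs of consecutive key words as phrases.

    Any other word or punctuation mark ends the current run
    (an assumed, simplified phrase-construction rule).
    """
    phrases, run = [], []
    for token in re.findall(r"\w+|[^\w\s]", text.lower()):
        if token in key_words:
            run.append(token)
        else:
            if run:
                phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    return phrases


docs = [
    ("Interest rates rise", "finance"),
    ("The bank cuts interest rates", "finance"),
    ("The team wins the match", "sport"),
    ("The team loses the match", "sport"),
]
key_words = select_key_words(docs)
phrases = build_phrases("Interest rates hurt the team, match postponed", key_words)
```

In this toy corpus the word "the" appears in both classes and falls below the assumed ratio threshold, so it is excluded without any stop-word list. The resulting phrases could then serve as features for whichever classifier is used in the second stage.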