Statistical Identification of Key Phrases for Text Classification

  • Authors:
  • Frans Coenen;Paul Leng;Robert Sanderson;Yanbo J. Wang

  • Affiliations:
  • Department of Computer Science, The University of Liverpool, Ashton Building, Ashton Street, Liverpool L69 3BX, United Kingdom;Department of Computer Science, The University of Liverpool, Ashton Building, Ashton Street, Liverpool L69 3BX, United Kingdom;Department of Computer Science, The University of Liverpool, Ashton Building, Ashton Street, Liverpool L69 3BX, United Kingdom;Department of Computer Science, The University of Liverpool, Ashton Building, Ashton Street, Liverpool L69 3BX, United Kingdom

  • Venue:
  • MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Algorithms for text classification generally involve two stages, the first of which aims to identify textual elements (words and/or phrases) that may be relevant to the classification process. This stage often involves an analysis of the text that is both language-specific and possibly domain-specific, and may also be computationally costly. In this paper we examine a number of alternative keyword-generation methods and phrase-construction strategies that identify key words and phrases by simple, language-independent statistical properties. We present results that demonstrate that these methods can produce good classification accuracy, with the best results being obtained using a phrase-based approach.