Document classification system based on HMM word map

  • Authors: Tsimboukakis Nikolaos; Tambouratzis George

  • Affiliations: Institute for Language and Speech Processing, Athens, Greece (both authors)

  • Venue: CSTST '08 Proceedings of the 5th international conference on Soft computing as transdisciplinary science and technology
  • Year: 2008

Abstract

This article presents a system for document organization based on Hidden Markov Models (HMMs). The system classifies the documents of a collection according to their content. It has a two-level hybrid connectionist architecture comprising (i) a word map created automatically by an HMM, which serves as the feature extraction module, and (ii) a supervised MLP-based classifier, which produces the final classification result. A series of experiments on Modern Greek text-only documents is presented; these experiments illustrate the effectiveness of the proposed system.
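
The two-level architecture can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the HMM is trained or how the word map is turned into features, so the sketch assumes the word map is already available as a word-to-state lookup and that a document is represented by a normalized histogram over those states. The vocabulary, the class labels, the names `WORD_TO_STATE`, `document_features`, `mlp_forward` and `classify`, and the hand-picked MLP weights are all hypothetical.

```python
import math
from collections import Counter

# ASSUMPTION: a word map assigning each word to one of N_STATES HMM states.
# In the described system this map is created automatically by an HMM;
# here it is hard-coded purely for illustration.
WORD_TO_STATE = {
    "match": 0, "goal": 0, "team": 0,
    "market": 1, "bank": 1, "price": 1,
    "the": 2, "a": 2, "of": 2,
}
N_STATES = 3
LABELS = ["sports", "economy"]  # hypothetical document classes


def document_features(tokens):
    """Level 1: normalized histogram of word-map states for one document."""
    counts = Counter(WORD_TO_STATE[t] for t in tokens if t in WORD_TO_STATE)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * N_STATES
    return [counts.get(s, 0) / total for s in range(N_STATES)]


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Level 2: forward pass of a one-hidden-layer MLP (tanh, then softmax)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


# ASSUMPTION: hand-picked weights standing in for a trained classifier.
W_HIDDEN = [[2.0, -2.0, 0.0], [-2.0, 2.0, 0.0]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [[1.0, 0.0], [0.0, 1.0]]
B_OUT = [0.0, 0.0]


def classify(tokens):
    """Run both levels and return the most probable label."""
    probs = mlp_forward(document_features(tokens), W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
    return LABELS[probs.index(max(probs))]


if __name__ == "__main__":
    print(classify(["the", "team", "goal"]))
    print(classify(["bank", "price", "of", "the"]))
```

In a real system the lookup table would be replaced by the state assignments learned by the HMM, and the weights would be obtained by supervised training on labeled documents.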