Clustering Polish texts with latent semantic analysis

  • Authors:
  • Marcin Kuta; Jacek Kitowski

  • Affiliations:
  • Institute of Computer Science, AGH University of Science and Technology, Kraków, Poland (both authors)

  • Venue:
  • ICAISC'10 Proceedings of the 10th International Conference on Artificial Intelligence and Soft Computing: Part II
  • Year:
  • 2010

Abstract

Document clustering is an important technique in Natural Language Processing (NLP). The paper presents the performance of partitional and agglomerative algorithms applied to clustering a large number of Polish newspaper articles. We investigate different document representations. The paper focuses on the applicability of Latent Semantic Analysis (LSA) to such clustering for Polish.
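To illustrate how LSA can feed a partitional clustering step, here is a minimal sketch. It is not the authors' pipeline. It assumes tf-idf weighting of a term-by-document matrix, a truncated SVD that keeps `k_latent` dimensions, unit-length document vectors (so Euclidean distance tracks cosine similarity), and plain k-means as the partitional algorithm. The toy corpus of already-lemmatized Polish words and every function name (`term_document_matrix`, `lsa_document_vectors`, `kmeans`, `cluster_documents`) are hypothetical.

```python
import numpy as np

# Hypothetical toy corpus: pre-lemmatized tokens, sports (0-2) vs politics (3-5).
DOCS = [
    "mecz piłka gol",
    "mecz drużyna gol",
    "piłka drużyna trener",
    "sejm ustawa rząd",
    "rząd ustawa minister",
    "sejm minister głosowanie",
]


def term_document_matrix(docs):
    """Build a tf-idf weighted term-by-document matrix (rows = terms)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            tf[index[w], j] += 1.0
    df = np.count_nonzero(tf, axis=1)  # number of documents containing each term
    idf = np.log(len(docs) / df)
    return tf * idf[:, None], vocab


def lsa_document_vectors(a, k_latent):
    """Project documents into a k_latent-dimensional space via truncated SVD."""
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    docs_k = (np.diag(s[:k_latent]) @ vt[:k_latent, :]).T  # one row per document
    norms = np.linalg.norm(docs_k, axis=1, keepdims=True)
    # Normalize to unit length so Euclidean distance reflects cosine similarity.
    return docs_k / np.where(norms == 0, 1.0, norms)


def kmeans(x, n_clusters, iters=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        # Recompute each center as the mean of its members (keep it if empty).
        new = np.array([
            x[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def cluster_documents(docs, k_latent, n_clusters):
    """LSA representation followed by k-means; returns one label per document."""
    a, _ = term_document_matrix(docs)
    return kmeans(lsa_document_vectors(a, k_latent), n_clusters)


if __name__ == "__main__":
    print(cluster_documents(DOCS, k_latent=2, n_clusters=2))
```

The SVD step is where LSA departs from a raw bag-of-words representation: documents that share few literal terms can still end up close in the latent space. An agglomerative algorithm could replace `kmeans` on the same LSA vectors. Note also that for a highly inflected language such as Polish, some form of lemmatization before building the matrix is assumed here rather than shown.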