Clustering Polish texts with latent semantic analysis

Authors:
Marcin Kuta;Jacek Kitowski
Affiliations:
Institute of Computer Science, AGH University of Science and Technology, Kraków, Poland;Institute of Computer Science, AGH University of Science and Technology, Kraków, Poland
Venue:
ICAISC'10 Proceedings of the 10th international conference on Artificial intelligence and soft computing: Part II
Year:
2010

Abstract

Document clustering is an important technique in Natural Language Processing (NLP). This paper presents the performance of partitional and agglomerative algorithms applied to clustering a large number of Polish newspaper articles. We investigate different representations of the documents. The focus of the paper is on the applicability of Latent Semantic Analysis to such clustering for Polish.