Stemming and lemmatization were compared in the clustering of Finnish text documents. Since Finnish is a highly inflectional and agglutinative language, we hypothesized that lemmatization, which involves splitting compound words, would be a more appropriate normalization approach than straightforward stemming. The relevance of the documents was evaluated on a four-point relevance assessment scale, which was collapsed into a binary scale in two ways: by counting all relevant documents as relevant, and by counting only the highly relevant documents as relevant (the stringent scale). Experiments with four hierarchical clustering methods supported the hypothesis. Under the stringent relevance scale, lemmatization allowed the single and complete linkage methods to recover the highly relevant documents in particular better than stemming did. With the average linkage and Ward's methods, lemmatization produced higher precision than stemming. We conclude that lemmatization is a better word normalization method than stemming when Finnish text documents are clustered for information retrieval.
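The two ways of collapsing the four-point relevance scale into a binary one, and the precision measured against each, can be illustrated with a minimal sketch. The numeric coding of the scale (0 to 3), the document identifiers, the sample judgments, and the helper names `collapse` and `precision` are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import Dict, Iterable

# Assumed coding of the four-point scale (not given in the abstract):
# 0 = irrelevant, 1 = marginally, 2 = fairly, 3 = highly relevant.
HIGHLY_RELEVANT = 3


def collapse(grades: Dict[str, int], threshold: int) -> Dict[str, bool]:
    """Map graded judgments to binary: relevant iff grade >= threshold."""
    return {doc: grade >= threshold for doc, grade in grades.items()}


def precision(cluster: Iterable[str], binary: Dict[str, bool]) -> float:
    """Fraction of the documents in a cluster that are judged relevant."""
    docs = list(cluster)
    if not docs:
        return 0.0
    return sum(binary.get(doc, False) for doc in docs) / len(docs)


# Hypothetical graded judgments for one query and one retrieved cluster.
grades = {"d1": 3, "d2": 1, "d3": 0, "d4": 2}
cluster = ["d1", "d2", "d3", "d4"]

# All relevant documents count as relevant (grade of at least 1).
liberal = collapse(grades, threshold=1)
# Only highly relevant documents count as relevant (the stringent scale).
stringent = collapse(grades, threshold=HIGHLY_RELEVANT)

liberal_precision = precision(cluster, liberal)      # 3 of 4 documents
stringent_precision = precision(cluster, stringent)  # 1 of 4 documents
```

The same cluster scores differently under the two collapses, which is why a normalization method can look better under the stringent scale than under the more permissive one.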