Information retrieval: data structures and algorithms
Information retrieval: data structures and algorithms
Unsupervised learning of the morphology of a natural language
Computational Linguistics
ACM Transactions on Asian Language Information Processing (TALIP)
A Bayesian model for morpheme and paradigm identification
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Knowledge-free induction of inflectional morphologies
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Unsupervised segmentation of words using prior distributions of morph length and frequency
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
YASS: Yet another suffix stripper
ACM Transactions on Information Systems (TOIS)
Multilingual noise-robust supervised morphological analysis using the WordFrame model
SIGMorPhon '04 Proceedings of the 7th Meeting of the ACL Special Interest Group in Computational Phonology: Current Themes in Computational Phonology and Morphology
Morphology induction from term clusters
CONLL '05 Proceedings of the Ninth Conference on Computational Natural Language Learning
Large-coverage root lexicon extraction for Hindi
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Assas-Band, an affix-exception-list based Urdu stemmer
ALR7 Proceedings of the 7th Workshop on Asian Language Resources
Analysis and evaluation of stemming algorithms: a case study with Assamese
Proceedings of the International Conference on Advances in Computing, Communications and Informatics
An improved stemming approach using HMM for a highly inflectional language
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Hi-index | 0.00 |
Stemmers are used to convert inflected words into their root or stem. Stem does not necessarily correspond to linguistic root of a word. Stemming improve performance by reducing morphologically variants into same words. This paper presents an approach is to develop unsupervised Hindi stemmer. This paper focus on the development of an unsupervised stemmer for Hindi and evaluation of approach using manually segmented words. We evaluate our approach on 1000-1000 words randomly extracted words (only) from Hindi WordNet1 data base. The training data has been constructed by extracting 106403 words extracted from EMILLE2 corpus. The observed accuracy was found to be 89.9% after applying some heuristic measures. The F-score was 94.96%. As the algorithm does not require any language specific information, it can be applied to other Indian languages as well. We also evaluate the effect of stemmer in terms of reducing size of index for Hindi information retrieval task. The results have been compared with light weight stemmer [10] and UMass stemmer [17]. Test run shows that our stemmer outperforms both the stemmer.