Mongolian is written in two alphabets, Cyrillic and Mongolian. In this paper, we focus solely on Mongolian written in the Cyrillic alphabet, in which a content word can be inflected by concatenating one or more suffixes. Identifying the original form of content words is crucial for natural language processing and information retrieval. We propose a lemmatization method for Mongolian. Its advantage is that it does not rely on noun dictionaries, which enables us to lemmatize out-of-dictionary words. We also apply our method to indexing for information retrieval. Experiments on newspaper articles and technical abstracts show the effectiveness of our method. Our research is the first significant exploration of the effectiveness of lemmatization for information retrieval in Mongolian.