A lemmatization method for Mongolian and its application to indexing for information retrieval

Authors:
Badam-Osor Khaltar; Atsushi Fujii
Affiliations:
Graduate School of Library, Information and Media Studies, University of Tsukuba, 1-2 Kasuga, Tsukuba 305-8550, Japan (both authors)
Venue:
Information Processing and Management: an International Journal
Year:
2009

Abstract

Mongolian is written in two different alphabets, Cyrillic and Mongolian. In this paper, we focus solely on Mongolian written in the Cyrillic alphabet, in which a content word can be inflected by concatenating one or more suffixes to it. Identifying the original form of content words is crucial for natural language processing and information retrieval. We propose a lemmatization method for Mongolian. The advantage of our method is that it does not rely on noun dictionaries, which enables it to lemmatize out-of-dictionary words. We also apply our method to indexing for information retrieval. Experiments on newspaper articles and technical abstracts show the effectiveness of our method. Our research is the first significant exploration of the effectiveness of lemmatization for information retrieval in Mongolian.