We present StaLe, a dictionary- and corpus-independent statistical lemmatizer that addresses the out-of-vocabulary (OOV) problem of dictionary-based lemmatization by generating candidate lemmas for any inflected word form. StaLe can be applied with little effort to languages lacking linguistic resources. We evaluate StaLe both in stand-alone lemmatization tasks and as a component of an IR system, using several datasets and query types in four high-resource languages. StaLe is competitive, reaching 88–108% of the gold-standard performance of a commercial lemmatizer in IR experiments. Besides being competitive, it is compact, efficient, and fast to apply to new languages.
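To make the idea of generating candidate lemmas for unseen word forms concrete, the following is a minimal sketch only. It assumes a small hand-written table of suffix rewrite rules with made-up confidence values; the rule table, the function name `candidate_lemmas`, and its parameters are hypothetical illustrations and are not taken from StaLe, whose actual rules and scoring are not described in this abstract.

```python
# Hypothetical suffix rewrite rules: (inflected suffix, lemma suffix, confidence).
# The rules and confidence values are invented for illustration only.
RULES = [
    ("ies", "y", 0.9),
    ("es", "", 0.6),
    ("s", "", 0.8),
    ("ed", "", 0.7),
    ("ed", "e", 0.5),
    ("ing", "", 0.7),
    ("ing", "e", 0.5),
]


def candidate_lemmas(word, rules=RULES, min_stem=2, top_n=3):
    """Return up to top_n candidate lemmas for word, best-scoring first.

    No dictionary lookup is involved, so any input string (including an
    out-of-vocabulary form) yields at least one candidate.
    """
    # Keep the input itself as a low-confidence fallback candidate.
    scores = {word: 0.1}
    for suffix, replacement, conf in rules:
        stem_len = len(word) - len(suffix)
        # Apply a rule only if the suffix matches and enough of a stem remains.
        if word.endswith(suffix) and stem_len >= min_stem:
            lemma = word[:stem_len] + replacement
            scores[lemma] = max(scores.get(lemma, 0.0), conf)
    # Sort by descending confidence, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [lemma for lemma, _ in ranked[:top_n]]
```

In an IR setting, one plausible use of such a generator is to index or match on several top-ranked candidates rather than a single lemma, since the correct base form may not be ranked first (for example, the hypothetical rules above rank "bak" ahead of "bake" for the input "baked").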