This paper reports on the underlying IR problems encountered when dealing with the complex morphology and compound constructions found in the Hungarian language. It describes evaluations of two general stemming strategies for this language and demonstrates that a light stemming approach can be quite effective. Based on searches done on the CLEF test collection, we find that a more aggressive suffix-stripping approach may produce better mean average precision (MAP). The differences with respect to an IR scheme without stemming, or one based on only a light stemmer, are statistically significant. When comparing probabilistic, vector-space and language models, we find that the Okapi model results in the best retrieval effectiveness. The resulting MAP is about 35% better than that of the classical tf-idf approach, particularly for very short requests. Finally, we demonstrate that applying an automatic decompounding procedure to both queries and documents significantly improves IR performance (+10%) compared to word-based indexing strategies.
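Since every comparison in the abstract is stated in terms of MAP, a minimal sketch of how that measure is commonly computed may help. The function names, document identifiers and toy rankings below are illustrative assumptions, not taken from the paper, and the sketch assumes each ranked list contains no duplicate document identifiers.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision for one query (non-interpolated).

    Precision is sampled at the rank of each relevant document retrieved,
    and the sum is divided by the total number of relevant documents, so
    relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all judged queries."""
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: two queries with made-up rankings and judgments.
toy_runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6"]}
toy_qrels = {"q1": {"d1", "d3"}, "q2": {"d6"}}
toy_map = mean_average_precision(toy_runs, toy_qrels)
```

A relative gain such as the "+10%" quoted above would then be (MAP_new - MAP_baseline) / MAP_baseline for two runs scored against the same relevance judgments.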