Stemming is widely used in information retrieval systems to address the vocabulary mismatch problem that arises from morphological variation. The major shortcoming of commonly used stemmers is that they accept the morphological variants of the query words without considering their thematic coherence with the given query, which leads to poor performance. Moreover, for many queries, such approaches yield retrieval performance that is poorer than with no stemming at all, which degrades robustness. The main goal of this article is to present fully automatic, corpus-based stemming algorithms that address these issues. A set of experiments on six TREC collections and three other non-English collections containing news and web documents shows that the proposed query-based stemming algorithms consistently and significantly outperform four strong state-of-the-art stemmers based on entirely different principles. Our experiments also confirm that the robustness of the proposed query-based stemming algorithms is remarkably better than that of the existing strong baselines.
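To make the notion of thematic coherence concrete, the following is a minimal sketch of query-based variant selection, not the algorithms proposed in the article. It assumes a toy in-memory corpus and a deliberately simple criterion: a morphological variant of a query term is kept only if it shares at least one document with some other query term. The function names, the `min_shared` parameter, and the example documents are all hypothetical.

```python
from collections import defaultdict


def build_doc_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def coherent_variants(term, candidates, query_terms, index, min_shared=1):
    """Keep candidate variants of `term` that co-occur with other query terms.

    Assumed criterion (illustrative only): a candidate is kept if it shares
    at least `min_shared` documents with at least one other query term.
    """
    others = [q for q in query_terms if q != term]
    kept = []
    for cand in candidates:
        cand_docs = index.get(cand, set())
        if any(len(cand_docs & index.get(o, set())) >= min_shared for o in others):
            kept.append(cand)
    return kept


# Toy corpus: "banks" appears only in a river context, unrelated to "interest".
docs = [
    "the bank raised interest rates on loans",
    "banking regulators reviewed interest rate policy",
    "the river banks flooded after heavy rain",
    "banker bonuses and interest payments",
]
index = build_doc_index(docs)
query = ["bank", "interest"]
candidates = ["banks", "banking", "banker"]  # variants a conventional stemmer would conflate
selected = coherent_variants("bank", candidates, query, index)
print(selected)
```

In this toy setting a conventional stemmer would expand "bank" with all three variants, whereas the sketch drops "banks" because it never co-occurs with "interest". A realistic system would replace the shared-document count with a proper co-occurrence statistic estimated from the target collection.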