In this brief communication, we evaluate two stopword lists for the English language (one comprising 571 words and another with 9) and compare them with a search approach that accounts for all word forms, i.e., uses no stopword list. We show that, with the original Okapi formulation or certain models derived from the Divergence from Randomness (DFR) paradigm, using a short stopword list or no list at all may result in significantly lower performance. For other DFR models and a revised Okapi implementation, performance differences between approaches using short or long stopword lists or no list at all are usually not statistically significant. Similar conclusions can be drawn for other natural languages such as French, Hindi, or Persian. © 2010 Wiley Periodicals, Inc.