When stopword lists make the difference

Authors:
Ljiljana Dolamic; Jacques Savoy
Affiliations:
Computer Science Department, University of Neuchâtel, 2009 Neuchâtel, Switzerland; Computer Science Department, University of Neuchâtel, 2009 Neuchâtel, Switzerland
Venue:
Journal of the American Society for Information Science and Technology
Year:
2010

Abstract

In this brief communication, we evaluate two stopword lists for the English language (one comprising 571 words and another with 9) and compare them with a search approach that applies no stopword list and thus indexes all word forms. We show that with the original Okapi formulation or certain models derived from the Divergence from Randomness (DFR) paradigm, using a short stopword list or none at all may result in significantly lower performance. For other DFR models and a revised Okapi implementation, performance differences among approaches using a short list, a long list, or no list at all are usually not statistically significant. Similar conclusions can be drawn for other natural languages such as French, Hindi, or Persian. © 2010 Wiley Periodicals, Inc.
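
To make the three indexing conditions concrete, the following minimal Python sketch shows how the same text yields different index terms under no stopword list, a short list, and a long list. The word sets `SHORT_LIST` and `LONG_LIST`, the function `index_terms`, and the sample text are illustrative assumptions only; they are not the 9-word and 571-word lists evaluated in the paper, and the tokenizer is a simple stand-in, not the authors' indexing pipeline.

```python
import re

# Hypothetical stand-in lists; the paper's actual 9-word and 571-word
# lists are not reproduced here.
NO_LIST = frozenset()
SHORT_LIST = frozenset({"a", "an", "and", "of", "the"})
LONG_LIST = SHORT_LIST | {"all", "are", "as", "at", "be", "by",
                          "can", "in", "on", "or", "we", "with"}


def index_terms(text, stopwords):
    """Lowercase the text, extract alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    sample = "We can use all of the stopword lists"
    for name, stopwords in [("none", NO_LIST),
                            ("short", SHORT_LIST),
                            ("long", LONG_LIST)]:
        # Each condition produces a different set of terms to index.
        print(name, index_terms(sample, stopwords))
```

The longer the list, the fewer terms reach the retrieval model. The abstract's finding is that how much this matters depends on the weighting model applied afterwards.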