A stop list, or negative dictionary, is a device used in automatic indexing to filter out words that would make poor index terms. Traditionally, stop lists are supposed to have included only the most frequently occurring words. In practice, however, stop lists have tended to include infrequently occurring words, and have not included many frequently occurring words. Infrequently occurring words seem to have been included because stop list compilers have not, for whatever reason, consulted empirical studies of word frequencies. Frequently occurring words seem to have been left out for the same reason, and also because many of them might still be important as index terms.

This paper reports an exercise in generating a stop list for general text based on the Brown corpus of 1,014,000 words drawn from a broad range of literature in English. We start with a list of tokens occurring more than 300 times in the Brown corpus. From this list of 278 words, 32 are culled on the grounds that they are too important as potential index terms. Twenty-six words are then added to the list in the belief that they may occur very frequently in certain kinds of literature. Finally, 149 words are added to the list because the finite state machine based filter in which this list is intended to be used is able to filter them at almost no cost. The final product is a list of 421 stop words that should be maximally efficient and effective in filtering the most frequently occurring and semantically neutral words in general literature in English.
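
The procedure described above (threshold by frequency, cull important terms, add extra words) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the names `tokenize`, `build_stop_list`, and `filter_stop_words` are hypothetical, the tokenizer is a simplistic stand-in, the toy text replaces the Brown corpus, and a plain set lookup is used in place of the finite state machine based filter mentioned in the abstract.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract alphabetic tokens (assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_stop_list(tokens, threshold, culled=(), added=()):
    """Build a stop list from token frequencies.

    1. Keep every word occurring more than `threshold` times.
    2. Remove the `culled` words judged too useful as index terms.
    3. Add the `added` words chosen by hand.
    """
    counts = Counter(tokens)
    frequent = {word for word, count in counts.items() if count > threshold}
    return (frequent - set(culled)) | set(added)


def filter_stop_words(tokens, stop_list):
    """Drop stop words using set membership (a stand-in for an FSM-based filter)."""
    return [token for token in tokens if token not in stop_list]


# Toy corpus standing in for the Brown corpus (assumption for illustration only).
corpus = tokenize("the time and the place and the time")
stop_list = build_stop_list(corpus, threshold=1, culled={"time"}, added={"of"})
index_terms = filter_stop_words(tokenize("the time of the place"), stop_list)
```

With the paper's figures, the same three steps account for the final size of the list: 278 - 32 + 26 + 149 = 421 words.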