The default retrieval model in Lucene, an open-source search engine, is the well-known vector-space model with tf-idf weighting. This paper proposes and evaluates additional techniques that can be adapted to this search model to meet the particular needs of domain-specific information retrieval (IR). We suggest specificity measures derived from either information theory or corpus-based linguistics. As an additional feature, we suggest accounting for the number of search terms that a query and a retrieved document have in common. To integrate these methods, we design and implement four extensions to the classical tf-idf model, then evaluate the new IR models on four different domain-specific collections and compare them with the results of a probabilistic retrieval model. The results indicate that the adapted vector-space models clearly outperform the tf-idf baseline and that the performance levels obtained even surpass those of the Okapi model.
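To make the idea of rewarding shared search terms concrete, the following is a minimal sketch, not the paper's actual models. It assumes a square-root tf damping, a smoothed idf of log(N / (1 + df)) + 1, and an overlap factor equal to the fraction of distinct query terms found in the document. The function names `idf` and `score` and the three toy documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def idf(term, docs):
    """Smoothed inverse document frequency (an assumed variant): log(N / (1 + df)) + 1."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / (1 + df)) + 1.0


def score(query, doc, docs, use_overlap=True):
    """Sum of sqrt(tf) * idf over shared terms, optionally scaled by query-term overlap."""
    tf = Counter(doc)
    query_terms = set(query)
    shared = [t for t in query_terms if t in tf]
    base = sum(math.sqrt(tf[t]) * idf(t, docs) for t in shared)
    if not use_overlap:
        return base
    # Overlap factor: fraction of distinct query terms that occur in the document.
    return base * len(shared) / len(query_terms)


# Toy collection (hypothetical data, already tokenized).
docs = [
    "aspirin reduces fever".split(),
    "aspirin dosage for adults".split(),
    "fever in children".split(),
]
query = "aspirin fever".split()

# Rank document indices by descending score.
ranked = sorted(range(len(docs)), key=lambda i: score(query, docs[i], docs), reverse=True)
```

With the overlap factor enabled, a document matching both query terms is favored over one that matches a single term, beyond what the summed tf-idf weights alone would give. Whether this helps on a given domain-specific collection is an empirical question of the kind the evaluation addresses.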