Exploring criteria for successful query expansion in the genomic domain

Authors:
Nicola Stokes;Yi Li;Lawrence Cavedon;Justin Zobel
Affiliations:
NICTA Victoria Research Lab, Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, Australia;NICTA Victoria Research Lab, Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, Australia;NICTA Victoria Research Lab, Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, Australia;NICTA Victoria Research Lab, Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, Australia
Venue:
Information Retrieval
Year:
2009

Abstract

Query expansion is commonly used in information retrieval (IR) to overcome vocabulary mismatch, such as synonymy between the original query terms and the terms in a relevant document. In general, query expansion experiments exhibit mixed results. Overall TREC Genomics Track results are also mixed; however, results from the top-performing systems provide strong evidence supporting the need for expansion. In this paper, we examine the conditions necessary for optimal query expansion performance with respect to two system design issues: the IR framework and the knowledge source used for expansion. We present a query expansion framework that improves Okapi baseline passage MAP performance by 185%. Using this framework, we compare and contrast the effectiveness of a variety of biomedical knowledge sources used by TREC 2006 Genomics Track participants for expansion. Based on the outcome of these experiments, we discuss the success factors required for effective query expansion with respect to various sources of expansion terms, such as corpus-based co-occurrence statistics, pseudo-relevance feedback methods, and domain-specific and domain-independent ontologies and databases. Our results show that the choice of document ranking algorithm is the most important factor affecting retrieval performance on this dataset. In addition, when an appropriate ranking algorithm is used, we find that query expansion with domain-specific knowledge sources provides an equally substantive gain in performance over a baseline system.
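
As a rough illustration of the vocabulary mismatch problem described in the abstract, the following Python sketch expands a query from a small synonym table and ranks documents with an Okapi BM25-style function. It is not the authors' framework: the synonym table, the example documents, the parameter values (k1, b), the IDF variant, and the MAP figures passed to relative_gain are all hypothetical and chosen only to show the mechanics.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for a domain-specific knowledge source.
SYNONYMS = {"p53": ["tp53"], "tumor": ["neoplasm"]}


def expand_query(terms, synonyms=SYNONYMS):
    """Return the original terms followed by any synonyms not already present."""
    expanded = list(terms)
    for term in terms:
        for synonym in synonyms.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document with a common BM25 variant (assumed, not the paper's)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # term absent from this document
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def relative_gain(baseline, improved):
    """Percentage improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline * 100


docs = [
    "tp53 mutation drives neoplasm growth",  # relevant, but uses synonyms only
    "weather report for melbourne",
]
query = ["p53", "tumor"]
original = bm25_scores(query, docs)
expanded = bm25_scores(expand_query(query), docs)
```

With these toy inputs, the unexpanded query matches neither document, while the expanded query gives the first document a positive score. On the arithmetic of the reported gain: a 185% improvement means the improved MAP is 2.85 times the baseline, so relative_gain(0.10, 0.285) is approximately 185 for those hypothetical values.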