Using concept-based indexing to improve language modeling approach to genomic IR

Authors:
Xiaohua Zhou;Xiaodan Zhang;Xiaohua Hu
Affiliations:
College of Information Science & Technology, Drexel University, Philadelphia, PA;College of Information Science & Technology, Drexel University, Philadelphia, PA;College of Information Science & Technology, Drexel University, Philadelphia, PA
Venue:
ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
Year:
2006

Abstract

Genomic IR, characterized by highly specific information needs, severe synonymy and polysemy problems, long term names, and a rapidly growing literature, poses a challenge to the IR community. In this paper, we focus on addressing the synonymy and polysemy issue within the language modeling framework. Unlike translation models and traditional query expansion techniques, we incorporate concept-based indexing into a basic language model for genomic IR. In particular, we adopt UMLS concepts as indexing and searching terms. A UMLS concept stands for a unique meaning in the biomedical domain; a set of synonymous terms shares the same concept ID. Therefore, the new approach makes document ranking effective while maintaining the simplicity of language models. A comparative experiment on the TREC 2004 Genomics Track data shows that incorporating concept-based indexing into a basic language model yields significant improvements. The MAP (mean average precision) rises significantly from 29.17% (the baseline system) to 36.94%. The performance of the new approach is also significantly superior to the mean (21.72%) of the official runs that participated in the TREC 2004 Genomics Track and is comparable to that of the best run (40.75%). Most official runs, including the best run, make extensive use of query expansion and pseudo-relevance feedback techniques, while our approach does nothing beyond incorporating concept-based indexing. This supports the view that semantic smoothing, i.e. the incorporation of synonym and sense information into the language models, is a more standard approach to achieving the effects that traditional query expansion and pseudo-relevance feedback techniques target.
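
The idea of indexing by concept rather than by word can be illustrated with a minimal sketch. It is not the authors' implementation. The dictionary `CONCEPTS`, its concept IDs (made up here, not real UMLS identifiers), the greedy longest-match lookup in `to_concepts`, and the Dirichlet smoothing with parameter `mu` in `score` are all assumptions chosen for illustration; the abstract does not specify how concepts are extracted or how the language model is smoothed.

```python
import math
from collections import Counter

# Hypothetical toy dictionary: surface term (as a word tuple) -> concept ID.
# The IDs are invented for this sketch; they are not real UMLS identifiers.
CONCEPTS = {
    ("tumor",): "C001",
    ("neoplasm",): "C001",
    ("breast", "cancer"): "C002",
    ("mammary", "carcinoma"): "C002",
    ("gene",): "C003",
}
MAX_LEN = max(len(term) for term in CONCEPTS)


def to_concepts(text):
    """Map text to concept IDs by greedy longest match.

    Words that belong to no dictionary term are dropped.
    """
    words = text.lower().split()
    out, i = [], 0
    while i < len(words):
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            cid = CONCEPTS.get(tuple(words[i:i + n]))
            if cid:
                out.append(cid)
                i += n
                break
        else:
            i += 1  # no term starts at this word
    return out


def score(query, doc, collection, mu=10.0):
    """Log query likelihood of `query` under a Dirichlet-smoothed document model."""
    doc_counts = Counter(doc)
    coll_size = sum(collection.values())
    total = 0.0
    for c in query:
        if collection[c] == 0:
            continue  # concept never seen in the collection: skip it
        p_coll = collection[c] / coll_size
        total += math.log((doc_counts[c] + mu * p_coll) / (len(doc) + mu))
    return total


docs = {"d1": "mammary carcinoma gene", "d2": "tumor gene"}
index = {name: to_concepts(text) for name, text in docs.items()}
collection = Counter(c for concepts in index.values() for c in concepts)

query = to_concepts("breast cancer gene")
ranking = sorted(index, key=lambda name: score(query, index[name], collection),
                 reverse=True)
```

In this toy collection the query "breast cancer gene" and the document "mammary carcinoma gene" map to the same concept sequence, so `d1` ranks above `d2` even though it shares only the word "gene" with the query. A word-based model would need query expansion or a translation model to recover that match.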