A corpus analysis approach for automatic query expansion and its extension to multiple databases

Authors:
Susan Gauch;Jianying Wang;Satya Mahesh Rachakonda
Affiliations:
University of Kansas;University of Kansas;University of Kansas
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
1999

Citing 9
Cited 25

Word association norms, mutual information, and lexicography

Computational Linguistics
Search improvement via automatic query reformulation

ACM Transactions on Information Systems (TOIS) - Special issue on research and development in information retrieval
Relevance feedback revisited

SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Experiments in automatic statistical thesaurus construction

SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Use of syntactic context to produce term association lists for text retrieval

SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Concept based query expansion

SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
Query expansion using lexical-semantic relations

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Query expansion using local and global document analysis

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Implementation of the SMART Information Retrieval System

Implementation of the SMART Information Retrieval System

Incorporating quality metrics in centralized/distributed information retrieval on the World Wide Web

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Exploiting a controlled vocabulary to improve collection selection and retrieval effectiveness

Proceedings of the tenth international conference on Information and knowledge management
Automatic query refinement using lexical affinities with maximal information gain

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Exploiting Manual Indexing to Improve Collection Selection and Retrieval Effectiveness

Information Retrieval
A Methodology to Retrieve Text Documents from Multiple Databases

IEEE Transactions on Knowledge and Data Engineering
Assessing the term independence assumption in blind relevance feedback

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Enhancing semantic digital library query using a content and service inference model (CSIM)

Information Processing and Management: an International Journal
SDQE: towards automatic semantic query optimization in P2P systems

Information Processing and Management: an International Journal - Special issue: Formal methods for information retrieval
Keyphrase extraction-based query expansion in digital libraries

Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Updating ontologies in the legal domain

ICAIL '05 Proceedings of the 10th international conference on Artificial intelligence and law
P-TAG: large scale automatic generation of personalized annotation tags for the web

Proceedings of the 16th international conference on World Wide Web
SLOQUE: slot-based query expansion for complex questions

Proceedings of the ACM first workshop on CyberInfrastructure: information management in eScience
On the conceptual tag refinement

Proceedings of the 2008 ACM symposium on Applied computing
Learning semantic relatedness from term discrimination information

Expert Systems with Applications: An International Journal
Managing Word Mismatch Problems in Information Retrieval: A Topic-Based Query Expansion Approach

Journal of Management Information Systems
Correcting queries for XML

Information Systems
Correcting queries for XML

Information Systems
Bootstrapping distributional feature vector quality

Computational Linguistics
SDQE: towards automatic semantic query optimization in P2P systems

Information Processing and Management: an International Journal - Special issue: Formal methods for information retrieval
Using evidence based content trust model for spam detection

Expert Systems with Applications: An International Journal
Directional distributional similarity for lexical inference

Natural Language Engineering
Combining linguistic indexes to improve the performances of information retrieval systems: a machine learning based solution

Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
A Survey of Automatic Query Expansion in Information Retrieval

ACM Computing Surveys (CSUR)
A novel resource description based approach for clustering peers

WAIM'05 Proceedings of the 6th international conference on Advances in Web-Age Information Management
Using NLP techniques to identify legal ontology components: concepts and relations

Law and the Semantic Web

Abstract

Searching online text collections can be both rewarding and frustrating. Valuable information can be found, but typically many irrelevant documents are also retrieved and many relevant ones are missed. Terminology mismatches between the user's query and document contents are a main cause of retrieval failures. Expanding a user's query with related words can improve search performance, but finding and using related words is an open problem. This research uses corpus analysis techniques to automatically discover similar words directly from the contents of the databases, which are not tagged with part-of-speech labels. Using these similarities, user queries are automatically expanded, resulting in conceptual retrieval rather than requiring exact word matches between queries and documents. We achieve a 7.6% improvement for TREC 5 queries and up to a 28.5% improvement on the narrow-domain Cystic Fibrosis collection. This work has been extended to multidatabase collections in which each subdatabase has its own collection-specific similarity matrix. If the best matrix is selected, substantial search improvements are possible. Various techniques for selecting the appropriate matrix for a particular query are analyzed, and a 4.8% improvement in the results is validated.
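
The general idea of corpus-based query expansion described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it assumes a plain windowed co-occurrence count as the context representation and cosine similarity as the word-similarity measure, and the names `context_vectors`, `cosine`, `expand_query`, `window`, `top_k`, and `min_sim` are hypothetical choices made for illustration only.

```python
import math
from collections import Counter, defaultdict


def context_vectors(docs, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it.

    Assumption: whitespace tokenization and raw counts; no part-of-speech tags are used.
    """
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def expand_query(query, vectors, top_k=2, min_sim=0.3):
    """Append up to `top_k` similar words (similarity >= `min_sim`) per query term."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        if term not in vectors:
            continue  # unseen term: nothing to expand with
        scored = [
            (cosine(vectors[term], vec), other)
            for other, vec in vectors.items()
            if other != term
        ]
        # Highest similarity first; ties broken alphabetically for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for sim, other in scored[:top_k]:
            if sim >= min_sim and other not in expanded:
                expanded.append(other)
    return expanded


docs = [
    "the doctor treated the patient",
    "the physician treated the patient",
]
vectors = context_vectors(docs)
print(expand_query("doctor", vectors, top_k=1))
```

In this toy corpus "doctor" and "physician" occur in identical contexts, so the expanded query picks up "physician" even though the two words never co-occur. For the multidatabase setting, one such set of vectors would be built per subdatabase, and a selection step would choose which one to use for a given query.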