Objective: To develop a document indexing scheme that improves retrieval effectiveness for free-text medical documents.
Design: The phrase-based vector space model (VSM) uses multi-word phrases as indexing terms. Each phrase consists of a concept in the Unified Medical Language System (UMLS) and its corresponding component word stems. The similarity between concepts is defined by their relations in a hypernym hierarchy derived from UMLS. After defining the similarity between two phrases by their stem overlap and the similarity between the concepts they represent, we define the similarity between two documents as the cosine of the angle between their corresponding phrase vectors. This paper reports the development and validation of the phrase-based VSM.
Measurement: We compare the retrieval effectiveness of different vector space models using two standard test collections, OHSUMED and Medlars. OHSUMED contains 105 queries and 14,430 documents, and Medlars contains 30 queries and 1,033 documents. Each document in the test collections is judged by human experts to be either relevant or non-relevant to each query. Retrieval effectiveness is measured by precision and recall.
Results: The phrase-based VSM is significantly more effective than the current gold standard, the stem-based VSM. The improvement is significant in both exhaustive search and cluster-based document retrieval.
Conclusion: The phrase-based VSM is a better indexing scheme than the stem-based VSM. Medical document retrieval using the phrase-based VSM is significantly more effective than retrieval using the stem-based VSM.
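The phrase and document similarities described in the Design section can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the toy hypernym map, the 1.0 / 0.5 / 0.0 concept similarity values, the use of Jaccard overlap for stems, taking the maximum of the two scores, and the generalized cosine that lets distinct but related phrases contribute to the inner product. It should not be read as the authors' implementation.

```python
import math

# Hypothetical toy hypernym map (child -> parent); NOT taken from UMLS.
HYPERNYM = {"lung_cancer": "cancer", "breast_cancer": "cancer"}


def concept_sim(a, b):
    """Assumed rule: 1.0 if identical, 0.5 for a direct hypernym pair, else 0.0."""
    if a == b:
        return 1.0
    if HYPERNYM.get(a) == b or HYPERNYM.get(b) == a:
        return 0.5
    return 0.0


def stem_overlap(s, t):
    """Jaccard overlap of two stem sets (an assumed choice of overlap measure)."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def phrase_sim(p, q):
    """Assumed combination: the larger of stem overlap and concept similarity.

    A phrase is a (concept, frozenset_of_stems) pair.
    """
    return max(stem_overlap(p[1], q[1]), concept_sim(p[0], q[0]))


def _inner(x, y):
    """Inner product in which related phrases contribute via phrase_sim."""
    return sum(wx * wy * phrase_sim(p, q)
               for p, wx in x.items()
               for q, wy in y.items())


def doc_sim(x, y):
    """Generalized cosine between two phrase vectors (dicts: phrase -> weight)."""
    denom = math.sqrt(_inner(x, x) * _inner(y, y))
    return _inner(x, y) / denom if denom else 0.0


# Toy documents, each holding a single phrase with weight 1.
doc_a = {("lung_cancer", frozenset({"lung", "cancer"})): 1.0}
doc_b = {("cancer", frozenset({"cancer"})): 1.0}
doc_c = {("fracture", frozenset({"fractur"})): 1.0}

sim_ab = doc_sim(doc_a, doc_b)  # related phrases: partial credit
sim_ac = doc_sim(doc_a, doc_c)  # unrelated phrases: no credit
```

With these assumptions, a stem-based model would also see some overlap between doc_a and doc_b through the shared stem, but the concept term lets two phrases match even when they share no stems at all, which is the kind of effect the phrase-based VSM is designed to capture.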