Exploiting concept clusters for content-based information retrieval

Authors:
Bo-Yeong Kang;Dae-Won Kim;Sang-Jo Lee
Affiliations:
Department of Computer Engineering, Kyungpook National University, Sangyuk-dong, Puk-gu, Daegu 702-701, Republic of Korea;Department of Computer Science, KAIST, 373-1, Guseong-dong, Yuseong-gu, Daejeon 305-701, Republic of Korea;Department of Computer Engineering, Kyungpook National University, Sangyuk-dong, Puk-gu, Daegu 702-701, Republic of Korea
Venue:
Information Sciences—Informatics and Computer Science: An International Journal
Year:
2005

Abstract

Current approaches to index weighting for information retrieval from texts are based on statistical analysis of the texts' contents. A key shortcoming of these indexing schemes, which consider only the terms in a document, is that they cannot extract indexes that accurately represent the document's semantic content. To address this issue, we propose a new indexing formalism that considers not only the terms in a document but also its concepts. In the proposed method, concepts are extracted by exploiting clusters of semantically related terms, referred to as concept clusters. Through experiments on the TREC-2 collection of Wall Street Journal documents, we show that the proposed method outperforms an indexing method based on term frequency (TF), especially for the highest-ranked documents. Moreover, the index term dimension was 53.3% lower for the proposed method than for the TF-based method, which is expected to significantly reduce document search time in a real environment.
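
The contrast between a term-only index and a concept-level index can be illustrated with a toy sketch. This is not the authors' algorithm: the `CONCEPT_OF` table, the concept labels, and the function names are hypothetical, and a real system would derive term relatedness from a thesaurus or a similar resource instead of a hand-written mapping. The sketch only shows why merging related terms into one concept lowers the number of index dimensions.

```python
from collections import Counter

# Hypothetical relatedness table (an assumption for illustration only):
# each known term maps to the label of the concept cluster it belongs to.
CONCEPT_OF = {
    "stock": "finance", "share": "finance", "bond": "finance",
    "car": "vehicle", "truck": "vehicle",
}


def tf_index(tokens):
    """Plain term-frequency index: one dimension per distinct term."""
    return Counter(tokens)


def concept_index(tokens, concept_of=CONCEPT_OF):
    """Merge related terms into a single concept dimension.

    Terms with no known concept keep their own dimension.
    """
    return Counter(concept_of.get(t, t) for t in tokens)


def dimension_reduction(tokens):
    """Fraction by which the concept index is smaller than the TF index."""
    tf, ci = tf_index(tokens), concept_index(tokens)
    return 1 - len(ci) / len(tf)


tokens = "stock share bond car truck market".split()
print(tf_index(tokens))
print(concept_index(tokens))
print(dimension_reduction(tokens))
```

In this toy document, six distinct terms collapse into three index entries (two concepts plus one unclustered term). The 53.3% reduction reported in the abstract comes from the authors' experiments and is not reproduced by this sketch.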