Document indexing: a concept-based approach to term weight estimation

Authors:
Bo-Yeong Kang; Sang-Jo Lee
Affiliations:
Department of Computer Engineering, Kyungpook National University, 702-701 Sangyuk-dong, Pukgu, Daegu, Korea (both authors)
Venue:
Information Processing and Management: an International Journal
Year:
2005

Abstract

Traditional index-weighting approaches for text retrieval rely on term-frequency analysis of document contents. Because these schemes consider only the occurrences of terms in a document, they are limited in their ability to extract indexes that accurately represent the document's semantic content. To address this issue, we developed a new indexing formalism that considers not only the terms in a document but also its concepts. In this approach, concept clusters are defined and a concept vector space model is proposed to represent the degree of semantic importance of lexical items and concepts within a document. Through an experiment on the TREC collection of Wall Street Journal documents, we show that the proposed method outperforms an indexing method based on term frequency (TF), especially for the few highest-ranked documents. Moreover, the index term dimension was 80% lower for the proposed method than for the TF-based method, which is expected to significantly reduce document search time in a real environment.
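
The contrast between term-level and concept-level indexing can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the cluster dictionary, the function names, and the summed-frequency weighting are hypothetical, since the abstract does not specify how concept clusters are built or how semantic importance is computed. It only shows why grouping terms into concepts can shrink the index dimension relative to a TF index.

```python
from collections import Counter

# Hypothetical concept clusters (an assumption, not taken from the paper).
CONCEPT_CLUSTERS = {
    "vehicle": {"car", "truck", "automobile"},
    "finance": {"bank", "loan", "credit"},
}


def tf_index(tokens):
    """Baseline: weight each term by its raw frequency in the document."""
    return dict(Counter(tokens))


def concept_index(tokens, clusters=CONCEPT_CLUSTERS):
    """Toy concept-level index: sum the frequencies of the terms in each cluster.

    Terms that belong to no cluster are dropped, which is a simplification
    chosen for this sketch only.
    """
    term_to_concept = {t: c for c, terms in clusters.items() for t in terms}
    weights = Counter()
    for tok in tokens:
        concept = term_to_concept.get(tok)
        if concept is not None:
            weights[concept] += 1
    return dict(weights)


doc = "the bank approved a car loan and a truck loan".split()
tf = tf_index(doc)        # one dimension per distinct term
ci = concept_index(doc)   # one dimension per matched concept
print(len(tf), len(ci))
```

In this toy document the TF index has one entry per distinct word, while the concept index has one entry per matched cluster, so the index dimension falls. The size of any such reduction on real data depends entirely on how the clusters are defined.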