Applying VSM and LCS to develop an integrated text retrieval mechanism

Authors:
Cheng-Shiun Tasi;Yong-Ming Huang;Chien-Hung Liu;Yueh-Min Huang
Affiliations:
Department of Engineering Science, National Cheng Kung University, No. 1 Ta-Hsueh Road, Tainan 701, Taiwan, ROC;Department of Engineering Science, National Cheng Kung University, No. 1 Ta-Hsueh Road, Tainan 701, Taiwan, ROC;Department of Network Multimedia Design, Hsing Kuo University of Management, No. 600, Sec. 3, Taijiang Blvd., Tainan 709, Taiwan, ROC;Department of Engineering Science, National Cheng Kung University, No. 1 Ta-Hsueh Road, Tainan 701, Taiwan, ROC
Venue:
Expert Systems with Applications: An International Journal
Year:
2012


Abstract

Text retrieval has received considerable attention in computer science. The most widely adopted similarity technique in this field uses the vector space model (VSM) to evaluate the weight of terms and the Cosine, Jaccard, or Dice measure to compute the similarity between the query and the texts. However, these similarity techniques do not consider the sequential order of the information. In this paper, we propose an integrated text retrieval (ITR) mechanism that takes advantage of both VSM and the longest common subsequence (LCS) algorithm. The key idea of the ITR mechanism is to use LCS to re-evaluate the weight of terms, so that the sequence and weight relationships between the query and the texts can be considered simultaneously. The results of mathematical analysis show that the ITR mechanism can increase the similarity under the Jaccard and Dice similarity measures when a sequential relationship exists between the query and the texts.
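
The limitation described in the abstract can be illustrated with a minimal Python sketch. It is not the ITR mechanism itself: the abstract does not give the formula by which LCS re-evaluates term weights, so none is reproduced here. The sketch assumes raw term frequencies as VSM weights and the common vector forms of the Jaccard and Dice measures. All function names and the sample query and texts are hypothetical.

```python
from collections import Counter
from math import sqrt


def _dot(a, b):
    # Inner product over the terms shared by both weight vectors.
    return sum(a[t] * b[t] for t in a.keys() & b.keys())


def _sq(a):
    # Squared Euclidean norm of a weight vector.
    return sum(v * v for v in a.values())


def cosine(a, b):
    denom = sqrt(_sq(a)) * sqrt(_sq(b))
    return _dot(a, b) / denom if denom else 0.0


def jaccard(a, b):
    # Vector (extended) form of Jaccard; an assumed formulation.
    denom = _sq(a) + _sq(b) - _dot(a, b)
    return _dot(a, b) / denom if denom else 0.0


def dice(a, b):
    # Vector form of Dice; an assumed formulation.
    denom = _sq(a) + _sq(b)
    return 2 * _dot(a, b) / denom if denom else 0.0


def lcs_length(x, y):
    # Classic dynamic-programming LCS over two token sequences,
    # keeping only the previous row of the table.
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, 1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical data: two texts with the same terms in different order.
query = "text retrieval mechanism".split()
text_a = "integrated text retrieval mechanism".split()
text_b = "mechanism retrieval text integrated".split()

q, a, b = Counter(query), Counter(text_a), Counter(text_b)

# Term-weight measures cannot tell the two texts apart ...
same_scores = all(f(q, a) == f(q, b) for f in (cosine, jaccard, dice))

# ... while LCS is sensitive to term order.
lcs_a = lcs_length(query, text_a)
lcs_b = lcs_length(query, text_b)
```

Because `text_a` and `text_b` contain the same terms, every weight-based measure scores them identically against the query, whereas the LCS length differs between them. This order signal is what the ITR mechanism is said to fold back into the term weights.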