Text retrieval has received considerable attention in computer science. The most widely adopted similarity technique in the field uses vector space models (VSM) to weight terms and the Cosine, Jaccard, or Dice measure to compute the similarity between a query and the texts. However, these techniques ignore the order in which terms appear. In this paper, we propose an integrated text retrieval (ITR) mechanism that takes advantage of both VSM and the longest common subsequence (LCS) algorithm. The key idea of the ITR mechanism is to use LCS to re-evaluate the weights of terms, so that the sequence and weight relationships between the query and the texts are considered simultaneously. Mathematical analysis shows that the ITR mechanism can increase the Jaccard and Dice similarity values when a sequential relationship exists between the query and the texts.
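The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the ITR mechanism itself, whose re-weighting formula is not given here. It only shows why set-based and frequency-based measures cannot distinguish word order while LCS can. The function names, the whitespace tokenization, the raw term-frequency weights, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity over raw term-frequency vectors (assumed weighting)."""
    ta, tb = Counter(a), Counter(b)
    dot = sum(ta[t] * tb[t] for t in ta)
    norm = sqrt(sum(v * v for v in ta.values())) * sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard coefficient over the sets of distinct terms."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a, b):
    """Dice coefficient over the sets of distinct terms."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbour.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical example: two texts with the same terms in different order.
query = "longest common subsequence".split()
in_order = "the longest common subsequence algorithm".split()
shuffled = "subsequence the common algorithm longest".split()

for name, text in (("in_order", in_order), ("shuffled", shuffled)):
    print(name, cosine(query, text), jaccard(query, text),
          dice(query, text), lcs_length(query, text))
```

Both texts receive identical Cosine, Jaccard, and Dice scores because they contain the same terms, whereas the LCS length is 3 for the ordered text and 1 for the shuffled one. That gap is the sequence information an LCS-based re-weighting could exploit.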