Towards efficient similar sentences extraction

Authors:
Yanhui Gu;Zhenglu Yang;Miyuki Nakano;Masaru Kitsuregawa
Affiliations:
Institute of Industrial Science, The University of Tokyo, Japan;Institute of Industrial Science, The University of Tokyo, Japan;Institute of Industrial Science, The University of Tokyo, Japan;Institute of Industrial Science, The University of Tokyo, Japan
Venue:
IDEAL'12 Proceedings of the 13th international conference on Intelligent Data Engineering and Automated Learning
Year:
2012

Abstract

Similar sentence extraction is essential to many applications, such as natural language processing, Web page retrieval, and question-answer models. Although many studies have explored this issue, most of them focus on improving effectiveness. In this paper, we address efficiency: given a sentence collection, how can the top-k sentences most semantically similar to a query be discovered efficiently? The issue is important for real applications because data volumes have become huge and existing state-of-the-art strategies cannot meet users' performance requirements. We propose efficient strategies to tackle the problem based on a general framework. Extensive experimental evaluations demonstrate that our proposal is more efficient than the state-of-the-art approach.
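
To make the problem statement concrete, the following is a minimal sketch of the naive baseline that an efficient method would aim to improve on: score every sentence in the collection against the query and keep the k best. It is not the paper's method. The names `jaccard` and `top_k_similar` are hypothetical, and word-overlap (Jaccard) similarity is only an assumed stand-in for a real semantic similarity measure.

```python
import heapq


def jaccard(a, b):
    """Word-overlap similarity in [0, 1]; an assumed stand-in for a semantic measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def top_k_similar(query, sentences, k, sim=jaccard):
    """Return the k (score, sentence) pairs most similar to the query, best first.

    This scans the whole collection, so its cost grows linearly with the
    number of sentences; that full scan is the cost an efficient approach
    would try to avoid.
    """
    scored = ((sim(query, s), s) for s in sentences)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    collection = [
        "the cat sat on the mat",
        "a dog barked loudly",
        "the cat sat",
    ]
    for score, sentence in top_k_similar("the cat sat on a mat", collection, k=2):
        print(f"{score:.3f}  {sentence}")
```

Because every sentence is scored, the sketch is only practical for small collections; it serves as a reference point for what "top-k similar sentences" means, not as an efficient solution.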