Batch text similarity search with MapReduce

  • Authors:
  • Rui Li; Li Ju; Zhuo Peng; Zhiwei Yu; Chaokun Wang

  • Affiliations:
  • School of Software, Tsinghua University and Tsinghua National Laboratory for Information Science and Technology and Key Laboratory for Information System Security, Ministry of Education, Beijing, ...;Department of Information Engineering, Henan College of Finance and Taxation, Zhengzhou, China;School of Software, Tsinghua University, Beijing, China;Department of Computer Science and Technology, Tsinghua University;School of Software, Tsinghua University and Tsinghua National Laboratory for Information Science and Technology and Key Laboratory for Information System Security, Ministry of Education, Beijing, ...

  • Venue:
  • APWeb'11: Proceedings of the 13th Asia-Pacific Web Conference on Web Technologies and Applications
  • Year:
  • 2011

Abstract

Batch text similarity search aims to find the texts that are similar to each query in a user's batch of text queries. It is widely used in real-world applications such as plagiarism checking, and it is attracting increasing attention as the volume of text on the web grows. Existing approaches, such as FuzzyJoin, support neither varying similarity thresholds nor online batch text similarity search. This paper proposes a two-stage algorithm, based on inverted index structures, that effectively addresses the batch text similarity search problem. Experimental results on real datasets demonstrate the efficiency and scalability of the method.
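
The abstract does not spell out the two stages, so the following is only a minimal single-process sketch of the general idea of answering a batch of queries through an inverted index. Everything in it is an assumption made for illustration and not the authors' algorithm: the split into an index-building stage and a probing stage, the whitespace tokenizer, the use of Jaccard similarity, and the names `tokenize`, `build_inverted_index`, and `batch_search`. No MapReduce distribution is attempted.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase whitespace tokenization into a set of terms (an assumption)."""
    return set(text.lower().split())


def build_inverted_index(corpus):
    """Assumed stage 1: map each term to the ids of the texts containing it."""
    index = defaultdict(set)
    token_sets = {}
    for doc_id, text in corpus.items():
        tokens = tokenize(text)
        token_sets[doc_id] = tokens
        for term in tokens:
            index[term].add(doc_id)
    return index, token_sets


def batch_search(queries, index, token_sets, threshold):
    """Assumed stage 2: probe the index per query, then verify with Jaccard."""
    results = {}
    for query_id, text in queries.items():
        q_tokens = tokenize(text)
        # Count shared terms per candidate text by walking the posting lists.
        overlap = defaultdict(int)
        for term in q_tokens:
            for doc_id in index.get(term, ()):
                overlap[doc_id] += 1
        matches = []
        for doc_id, shared in overlap.items():
            # |A union B| = |A| + |B| - |A intersect B|
            union = len(q_tokens) + len(token_sets[doc_id]) - shared
            score = shared / union
            if score >= threshold:
                matches.append((doc_id, score))
        # Highest score first; ties broken by document id.
        results[query_id] = sorted(matches, key=lambda m: (-m[1], m[0]))
    return results


# Hypothetical toy data, not from the paper's experiments.
corpus = {
    "d1": "the quick brown fox",
    "d2": "the lazy dog",
    "d3": "quick brown fox jumps",
}
queries = {"q1": "quick brown fox"}

index, token_sets = build_inverted_index(corpus)
loose = batch_search(queries, index, token_sets, threshold=0.5)
strict = batch_search(queries, index, token_sets, threshold=0.8)
```

In this sketch the threshold is supplied at query time, so one index can serve batches that use different thresholds. That is one plausible reading of what supporting the "variation of thresholds" could mean, not a claim about how the paper achieves it.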