Computing term similarity by large probabilistic isA knowledge

  • Authors:
  • Peipei Li;Haixun Wang;Kenny Q. Zhu;Zhongyuan Wang;Xindong Wu

  • Affiliations:
  • Hefei University of Technology, Hefei, China;Microsoft Research Asia, Beijing, China;Shanghai Jiao Tong University, Shanghai, China;Renmin University of China, Microsoft Research Asia, Beijing, China;University of Vermont, Vermont, USA

  • Venue:
  • Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
  • Year:
  • 2013

Abstract

Computing the semantic similarity between two terms is essential for a variety of text analytics and understanding applications. However, existing approaches are better suited to single words than to the more general multi-word expressions (MWEs), and they do not scale well. We therefore propose a lightweight and effective approach to semantic similarity that uses a large-scale semantic network automatically acquired from billions of web documents. Given two terms, we map them into the concept space and compare them there. Furthermore, we introduce a clustering approach that orthogonalizes the concept space to improve the accuracy of the similarity measure. Extensive studies demonstrate that our approach accurately computes the semantic similarity between terms, including MWEs and ambiguous terms, and significantly outperforms 12 competing methods.
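
The idea of mapping terms into a concept space and comparing them there can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy isA table `ISA`, its probabilities, the hand-written `CLUSTER` mapping, and the choice of cosine similarity are hypothetical and are not taken from the paper, which may use a different representation and measure.

```python
import math

# Hypothetical toy isA table: term -> {concept: P(concept | term)}.
# The probabilities are invented for illustration only.
ISA = {
    "apple": {"fruit": 0.5, "company": 0.4, "food": 0.1},
    "microsoft": {"company": 0.9, "brand": 0.1},
    "pear": {"fruit": 0.8, "food": 0.2},
}

# Hypothetical concept clusters: overlapping concepts are merged onto one
# axis, a crude stand-in for orthogonalizing the concept space.
CLUSTER = {
    "fruit": "food",
    "food": "food",
    "company": "organization",
    "brand": "organization",
}


def concept_vector(term, cluster=None):
    """Map a term to a sparse vector over concepts (or concept clusters)."""
    vec = {}
    for concept, prob in ISA.get(term, {}).items():
        key = cluster.get(concept, concept) if cluster else concept
        vec[key] = vec.get(key, 0.0) + prob
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # unknown term: no concepts to compare
    return dot / (norm_u * norm_v)


def similarity(a, b, cluster=None):
    """Compare two terms in the (optionally clustered) concept space."""
    return cosine(concept_vector(a, cluster), concept_vector(b, cluster))
```

In this toy setting, `similarity("apple", "pear")` is higher than `similarity("microsoft", "pear")`, and passing `cluster=CLUSTER` merges "fruit" and "food" onto a single axis so that their shared probability mass is counted together rather than on separate dimensions.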