Facilitating pattern discovery for relation extraction with semantic-signature-based clustering

  • Authors:
  • Yunyao Li;Vivian Chu;Sebastian Blohm;Huaiyu Zhu;Howard Ho

  • Affiliations:
  • IBM Research - Almaden, San Jose, CA, USA;IBM Research - Almaden, San Jose, CA, USA;Microsoft Corporation, München, Germany;IBM Research - Almaden, San Jose, CA, USA;IBM Research - Almaden, San Jose, CA, USA

  • Venue:
  • Proceedings of the 20th ACM international conference on Information and knowledge management
  • Year:
  • 2011

Abstract

Hand-crafted textual patterns have been the mainstay of practical relation extraction for decades. However, there has been little work on reducing the manual effort involved in discovering effective textual patterns for relation extraction. In this paper, we propose a clustering-based approach to facilitate pattern discovery for relation extraction. Specifically, we define the notion of a semantic signature to represent the most salient features of a textual fragment. We then propose a novel clustering algorithm based on semantic signatures, S2C, and its enhancement, S2C+. Experiments on two real-world data sets show that, compared with k-means clustering, S2C and S2C+ are at least an order of magnitude faster, while generating high-quality clusters that are at least comparable to the best clusters generated by k-means, without requiring any manual tuning. Finally, a user study confirms that our clustering-based approach can indeed help users discover effective textual patterns for relation extraction with only a fraction of the manual effort required by the conventional approach.
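
The abstract does not say how a semantic signature is computed or how S2C works internally, so the following is only a toy sketch of the general idea of grouping textual fragments by a signature. It assumes, purely for illustration, that a fragment's signature is the sorted set of its non-stopword tokens; the stopword list and the names `toy_signature` and `cluster_by_signature` are hypothetical and are not taken from the paper. Grouping by a hashable signature takes a single pass over the data and needs no preset number of clusters, which hints at why such an approach might avoid the tuning that k-means requires.

```python
from collections import defaultdict

# Hypothetical stopword list (an assumption for this sketch; the paper's
# actual notion of "salient features" is not given in the abstract).
STOPWORDS = {"a", "an", "the", "of", "is", "was", "to", "in", "at", "by", "as"}


def toy_signature(fragment):
    """Toy signature: the sorted set of lowercase non-stopword tokens."""
    tokens = (tok.strip(".,;:!?").lower() for tok in fragment.split())
    return tuple(sorted({tok for tok in tokens if tok and tok not in STOPWORDS}))


def cluster_by_signature(fragments):
    """Group fragments that share the same toy signature (one pass, no k)."""
    clusters = defaultdict(list)
    for fragment in fragments:
        clusters[toy_signature(fragment)].append(fragment)
    return dict(clusters)


# Example fragments of the kind that might appear between two entity mentions.
fragments = [
    "is the CEO of",
    "was CEO of",
    ", the CEO of",
    "was acquired by",
    "is acquired by",
]
clusters = cluster_by_signature(fragments)

for signature, members in clusters.items():
    print(signature, members)
```

In this toy setting, a user looking for patterns would inspect one cluster per signature instead of every fragment individually, which is the kind of effort reduction the abstract describes.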