A semi-supervised approach to extracting multiword entity names from user reviews

  • Authors:
  • Olga Vechtomova

  • Affiliations:
  • University of Waterloo, Waterloo, ON, Canada

  • Venue:
  • Proceedings of the 1st Joint International Workshop on Entity-Oriented and Semantic Search
  • Year:
  • 2012

Abstract

The paper describes a semi-supervised approach to extracting multiword units that belong to a specific semantic class of entities. The approach uses a small set of seed words representing the target class and calculates the distributional similarity between candidate words and seed words. We adapt a well-known document ranking function, BM25, to the task of calculating similarity between vectors of context features representing seed words and candidate words, and systematically compare it with a number of distributional similarity measures. We then introduce a method for ranking multiword units by their likelihood of belonging to the target semantic class. The evaluation task is the extraction of restaurant dish names from a corpus of 157,865 restaurant reviews.
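
To make the idea of BM25 as a similarity measure between context-feature vectors more concrete, here is a minimal sketch. It is not the paper's formulation: it assumes the seed vector plays the role of the query, the candidate word's vector plays the role of the document, and context features play the role of terms, and it uses the standard BM25 weighting with conventional defaults (k1 = 1.2, b = 0.75). The function name, the feature names, and all counts are hypothetical.

```python
import math
from collections import Counter


def bm25_similarity(seed_features, cand_features, feature_df, n_words,
                    avg_len, k1=1.2, b=0.75):
    """BM25-style score of a candidate's feature vector against a seed vector.

    Assumption (not from the paper): the seed features act as query terms and
    the candidate's feature counts act as document term frequencies.
    """
    cand_len = sum(cand_features.values())
    score = 0.0
    for feat in seed_features:
        tf = cand_features.get(feat, 0)
        if tf == 0:
            continue  # feature not shared with the candidate
        df = feature_df.get(feat, 0)
        # Smoothed IDF: features shared by fewer words weigh more.
        idf = math.log((n_words - df + 0.5) / (df + 0.5) + 1.0)
        # Saturating, length-normalised feature frequency.
        norm_tf = tf * (k1 + 1) / (tf + k1 * (1 - b + b * cand_len / avg_len))
        score += idf * norm_tf
    return score


# Hypothetical toy data: context features observed around each word.
seed = Counter({"ordered_the": 3, "was_delicious": 2})
cand_dish = Counter({"ordered_the": 2, "was_delicious": 1})
cand_other = Counter({"parked_the": 2})
feature_df = {"ordered_the": 5, "was_delicious": 10, "parked_the": 4}

score_dish = bm25_similarity(seed, cand_dish, feature_df, n_words=100, avg_len=3.0)
score_other = bm25_similarity(seed, cand_other, feature_df, n_words=100, avg_len=3.0)
```

In this toy setup the candidate that shares context features with the seed receives a positive score, while the candidate with no shared features scores zero. Candidates could then be sorted by score to produce a ranking.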