Minimally-supervised extraction of entities from text advertisements

  • Authors:
  • Sameer Singh;Dustin Hillard;Chris Leggetter

  • Affiliations:
  • University of Massachusetts, Amherst, MA;Yahoo! Labs Silicon Valley, Santa Clara, CA;Yahoo! Labs Silicon Valley, Santa Clara, CA

  • Venue:
  • HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Extraction of entities from ad creatives is an important problem that can benefit many computational advertising tasks. Supervised and semi-supervised solutions rely on labeled data which is expensive, time consuming, and difficult to procure for ad creatives. A small set of manually derived constraints on feature expectations over unlabeled data can be used to partially and probabilistically label large amounts of data. Utilizing recent work in constraint-based semi-supervised learning, this paper injects light weight supervision specified as these "constraints" into a semi-Markov conditional random field model of entity extraction in ad creatives. Relying solely on the constraints, the model is trained on a set of unlabeled ads using an online learning algorithm. We demonstrate significant accuracy improvements on a manually labeled test set as compared to a baseline dictionary approach. We also achieve accuracy that approaches a fully supervised classifier.