An integrated discriminative probabilistic approach to information extraction

  • Authors:
  • Xiaofeng Yu;Wai Lam;Bo Chen

  • Affiliations:
  • The Chinese University of Hong Kong, Hong Kong, China;The Chinese University of Hong Kong, Hong Kong, China;The Chinese University of Hong Kong, Hong Kong, China

  • Venue:
  • Proceedings of the 18th ACM conference on Information and knowledge management
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Probabilistic graphical models for sequence data enable us to effectively deal with inherent uncertainty in many real-world domains. However, they operate on a mostly propositional level. Logic approaches, on the other hand, can compactly represent a wide variety of knowledge, especially first-order ones, but treat uncertainty only in limited ways. Therefore, combining probability and first-order logic is highly desirable for information extraction which requires uncertainty modeling as well as dependency and deeper knowledge representation. In this paper, we model both segmentations in observation sequence and relations of segments simultaneously in our proposed integrated discriminative probabilistic framework. We propose the Metropolis-Hastings, a Markov chain Monte Carlo (MCMC) algorithm for approximate Bayesian inference to find the maximum a posteriori assignment of all the variables of this model. This integrated model has several advantages over previous probabilistic graphical models, and it offers a great capability of extracting implicit relations and new relation discovery for relation extraction from encyclopedic documents, and capturing sub-structures in named entities for named entity recognition. We performed extensive experiments on the above two well-established information extraction tasks, illustrating the feasibility and promise of our approach.