Information extraction, data mining and joint inference

  • Authors: Andrew McCallum
  • Affiliations: University of Massachusetts, Amherst, MA
  • Venue: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  • Year: 2006

Abstract

Although information extraction and data mining appear together in many applications, their interface in most current systems would better be described as serial juxtaposition than as tight integration. Information extraction populates slots in a database by identifying relevant subsequences of text, but is usually not aware of the emerging patterns and regularities in the database. Data mining methods begin from a populated database, and are often unaware of where the data came from, or its inherent uncertainties. The result is that the accuracy of both suffers, and accurate mining of complex text sources has been beyond reach.

In this talk I will describe work in probabilistic models that perform joint inference across multiple components of an information processing pipeline in order to avoid the brittle accumulation of errors. After briefly introducing conditional random fields, I will describe recent work in information extraction leveraging factorial state representations, entity resolution, and transfer learning, as well as scalable methods of inference and learning. I'll close with some recent work on probabilistic models for social network analysis, and a demonstration of Rexa.info, a new research paper search engine.