Information theoretic approach to information extraction

  • Authors:
  • Giambattista Amati

  • Affiliations:
  • Fondazione Ugo Bordoni, Rome, Italy

  • Venue:
  • FQAS'06 Proceedings of the 7th international conference on Flexible Query Answering Systems
  • Year:
  • 2006

Abstract

We use the hypergeometric distribution to extract relevant information from documents. The hypergeometric distribution estimates the probability of observing a given term frequency with respect to a prior. The lower this probability, the more information the term carries. Given a subset of documents, information items are therefore weighted by a function inversely related to the hypergeometric probability. We provide an illustrative introduction to topic-driven information extraction from a document collection based on the hypergeometric distribution.
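The weighting described above can be sketched in a few lines of Python. This is a minimal sketch that assumes a simple token-level model, which the abstract does not spell out. The collection is treated as a bag of N tokens, a term t occurs F times in it, the topic-driven subset holds n of those tokens, and t occurs k times in the subset. Under sampling without replacement, P(X = k) = C(F, k) C(N - F, n - k) / C(N, n), and the sketch weights t by -log2 P(X = k) bits. The function names (`hypergeom_log_pmf`, `information_weights`) and the choice of the point probability, rather than some other function of the distribution, are illustrative assumptions and not the paper's exact formulation.

```python
import math
from collections import Counter


def log_binom(a: int, b: int) -> float:
    """Natural log of the binomial coefficient C(a, b), via lgamma for stability."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def hypergeom_log_pmf(k: int, N: int, F: int, n: int) -> float:
    """ln P(X = k) for X ~ Hypergeometric(population N, F occurrences, n draws)."""
    # Outside the support the probability is zero, so its log is -inf.
    if k < max(0, n - (N - F)) or k > min(n, F):
        return float("-inf")
    return log_binom(F, k) + log_binom(N - F, n - k) - log_binom(N, n)


def information_weights(collection_tokens, subset_tokens):
    """Weight each subset term by -log2 P(X = k), i.e. its information in bits.

    Assumption (hypothetical model): subset_tokens is drawn from
    collection_tokens, so every subset term also occurs in the collection.
    """
    coll = Counter(collection_tokens)   # F for each term
    sub = Counter(subset_tokens)        # k for each term
    N = len(collection_tokens)
    n = len(subset_tokens)
    weights = {}
    for term, k in sub.items():
        log_p = hypergeom_log_pmf(k, N, coll[term], n)
        # Lower probability -> higher weight; convert nats to bits.
        weights[term] = -log_p / math.log(2)
    return weights


if __name__ == "__main__":
    # Hypothetical toy data: the "topic" subset plus the rest of the collection.
    subset = ("the model uses the hypergeometric distribution "
              "the hypergeometric prior weights the term").split()
    rest = "the cat sat on the mat the dog ate the bone".split()
    w = information_weights(subset + rest, subset)
    for term, bits in sorted(w.items(), key=lambda kv: -kv[1]):
        print(f"{term:15s} {bits:.3f}")
```

Log-gamma is used instead of exact binomial coefficients so that the computation stays numerically stable when N is the size of a real collection.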