Bootstrapping events and relations from text

  • Authors: Tomek Strzalkowski; Ting Liu
  • Affiliations: State University of New York at Albany; State University of New York at Albany
  • Venue: Bootstrapping events and relations from text
  • Year: 2009

Abstract

Information Extraction (IE) is a technique for automatically extracting structured data from text documents. One of the key analytical tasks is the extraction of important and relevant information from textual sources. While information is plentiful and readily available from the Internet, news services, media, etc., extracting the critical nuggets that matter to business or to national security is a cognitively demanding and time-consuming task. Intelligence and business analysts spend many hours poring over endless streams of text documents, pulling out references to entities of interest (people, locations, organizations) as well as their relationships as reported in text. Such extracted "information nuggets" are then entered into a structured database for further analysis that may expose various trends or hidden relationships.

In this thesis, we constructed a semi-supervised machine learning method, which we call BEAR (Bootstrapping Events and Relations from Text), that effectively exploits statistical and structural properties of natural language discourse in order to rapidly acquire rules to detect mentions of events and other complex relationships in text, extract their key attributes, and construct template-like representations. The learning process exploits descriptive and structural redundancy, which is common in language and is considered critical for achieving successful communication despite distractions, different contexts, or incompatible semantic models between a speaker/writer and a hearer/reader. We also take advantage of the high degree of referential consistency in discourse (e.g., as observed in word sense distribution, and arguably applicable to larger linguistic units), which enables the reader to efficiently correlate different forms of description across coherent spans of text.

Our system has been tested on the ACE-2005 corpus, which is an official U.S. Government dataset for evaluating Information Extraction technology. The final results show that BEAR improves significantly over the base run and performs better than current event extraction systems.
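
To make the idea of bootstrapping from descriptive redundancy more concrete, the following is a minimal toy sketch of a generic seed-and-pattern loop. It is not the BEAR algorithm, whose details the abstract does not give. The function names (learn_patterns, apply_patterns, bootstrap), the regular-expression notion of a "name", and the example sentences are all assumptions made for illustration only. The loop starts from one known pair, learns the text that connects it, uses that text to find new pairs, and then learns further connecting text from the new pairs.

```python
import re

# Hypothetical toy matcher: a "name" is one or more capitalized words.
NAME = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"


def learn_patterns(sentences, pairs):
    """Collect the text found between the two members of any known pair."""
    patterns = set()
    for sentence in sentences:
        for left, right in pairs:
            match = re.search(re.escape(left) + r"(.+?)" + re.escape(right), sentence)
            if match:
                patterns.add(match.group(1))
    return patterns


def apply_patterns(sentences, patterns):
    """Find name pairs that are joined by one of the learned patterns."""
    found = set()
    for sentence in sentences:
        for pattern in patterns:
            for match in re.finditer(NAME + re.escape(pattern) + NAME, sentence):
                found.add((match.group(1), match.group(2)))
    return found


def bootstrap(sentences, seeds, rounds=3):
    """Alternate between learning patterns and extracting pairs."""
    pairs = set(seeds)
    patterns = set()
    for _ in range(rounds):
        patterns |= learn_patterns(sentences, pairs)
        new_pairs = apply_patterns(sentences, patterns) - pairs
        if not new_pairs:
            break  # nothing new was found, so further rounds cannot help
        pairs |= new_pairs
    return pairs, patterns


# Invented example data: the same fact about Bob Jones is stated in two ways.
sentences = [
    "Alice Smith was hired by Acme last year.",
    "Bob Jones was hired by Globex in March.",
    "Bob Jones joined Globex in March.",
    "Carol White joined Initech recently.",
]
seeds = {("Alice Smith", "Acme")}

pairs, patterns = bootstrap(sentences, seeds)
print(sorted(pairs))
print(sorted(patterns))
```

In this toy run, the redundant description of the Bob Jones fact is what lets the loop move from the seed pattern " was hired by " to the new pattern " joined ", which in turn reaches the Carol White sentence. A realistic system would need pattern scoring and filtering to avoid drifting toward unreliable patterns, which this sketch omits.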