Combining relations for information extraction from free text

  • Authors:
  • Mstislav Maslennikov;Tat-Seng Chua

  • Affiliations:
  • National University of Singapore, Singapore;National University of Singapore, Singapore

  • Venue:
  • ACM Transactions on Information Systems (TOIS)
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Relations between entities of the same semantic type tend to be sparse in free texts. Therefore, combining relations is the key to effective information extraction (IE) on free text datasets with a small set of training samples. Previous approaches to bootstrapping for IE used different types of relations, such as dependency or co-occurrence, and faced the problems of paraphrasing and misalignment of instances. To cope with these problems, we propose a framework that integrates several types of relations. After extracting candidate entities, our framework evaluates relations between them at the phrasal, dependency, semantic frame, and discourse levels. For each of these levels, we build a classifier that outputs a score for relation instances. In order to integrate these scores, we propose three strategies: (1) integrate evaluation scores from each relation classifier; (2) incorporate the elimination of negatively labeled instances in a previous strategy; and (3) add cascading of extracted relations into strategy (2). Our framework improves the state-of-art results for supervised systems by 8%, 15%, 3%, and 5% on MUC4 (terrorism); MUC6 (management succession); ACE RDC 2003 (news, general types); and ACE RDC 2003 (news, specific types) domains respectively.