Full parsing approximation for information extraction via finite-state cascades

  • Authors:
  • Fabio Ciravegna;Alberto Lavelli

  • Affiliations:
  • Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK;ITC-irst Centro per la Ricerca Scientifica e Tecnologica, via Sommarive 18, 38050 Povo (TN), Italy

  • Venue:
  • Natural Language Engineering
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper proposes a robust approach to parsing suitable for Information Extraction (IE) from texts using finite-state cascades. The approach is characterized by the construction of an approximation of the full parse tree that captures all the information relevant for IE purposes, leaving the other relations underspecified. Sequences of cascades of finite-state rules deterministically analyze the text, building unambiguous structures. Initially basic chunks are analyzed; then clauses are recognized and nested; finally modifier attachment is performed and the global parse tree is built. The parsing approach allows robust, effective and efficient analysis of real world texts. The grammar organization simplifies changes, insertion of new rules and integration of domain-oriented rules. The approach has been tested for Italian, English, and Russian. A parser based on such an approach has been implemented as part of Pinocchio, an environment for developing and running IE applications.