DiSeg 1.0: The first system for Spanish discourse segmentation

  • Authors:
  • Iria da Cunha; Eric San Juan; Juan Manuel Torres-Moreno; Marina Lloberes; Irene Castellón

  • Affiliations:
  • Institut Universitari de Lingüística Aplicada (Universitat Pompeu Fabra): C/Roc Boronat n 138, 08018 Barcelona, Spain and Laboratoire Informatique d'Avignon (Université d'Avignon et ...;Laboratoire Informatique d'Avignon (Université d'Avignon et des Pays de Vaucluse): 339, chemin des Meinajaries, Agroparc, BP 91228, 84911 Avignon, Cedex 9, France;Laboratoire Informatique d'Avignon (Université d'Avignon et des Pays de Vaucluse): 339, chemin des Meinajaries, Agroparc, BP 91228, 84911 Avignon, Cedex 9, France and Instituto de Ingenier ...;Universitat de Barcelona: Gran Via de les Corts Catalanes n 585, 08007 Barcelona, Spain;Universitat de Barcelona: Gran Via de les Corts Catalanes n 585, 08007 Barcelona, Spain

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012

Abstract

Discourse parsing is currently a prominent research topic. However, there is no discourse parser for Spanish texts. The first stage in developing such a tool is discourse segmentation. In this work, we present DiSeg, the first discourse segmenter for Spanish, which uses the framework of Rhetorical Structure Theory and is based on lexical and syntactic rules. We describe the system and evaluate its performance against a gold standard corpus divided into a medical and a terminological subcorpus. We obtain promising results, which indicate that discourse segmentation is possible using shallow parsing.
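The abstract describes two ingredients: rule-based splitting of text into discourse segments and evaluation against a gold standard. The sketch below illustrates both under loudly stated assumptions. It is not DiSeg's rule set. The marker list `MARKERS`, the length threshold `MIN_TOKENS`, and the functions `tokenize`, `boundaries`, `segment` and `prf` are all hypothetical. The length check is a crude stand-in for the syntactic rules the real system uses. The evaluation scores segment boundaries (token offsets) with precision, recall and F-score. This is one common way to compare against a gold standard, and the abstract does not state which metrics the paper uses.

```python
import re
from typing import List, Set, Tuple

# Hypothetical, illustrative Spanish discourse markers; NOT DiSeg's actual rules.
MARKERS: Tuple[str, ...] = ("sin embargo", "ya que", "porque", "aunque", "mientras que", "para que")
SENTENCE_END = {".", "!", "?"}
# Hypothetical minimum segment length, standing in for syntactic constraints.
MIN_TOKENS = 3


def tokenize(text: str) -> List[str]:
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def boundaries(tokens: List[str]) -> Set[int]:
    """Return token indices where a new segment starts (index 0 excluded)."""
    lowered = [t.lower() for t in tokens]
    found: Set[int] = set()
    last = 0  # start index of the current segment
    for i in range(1, len(tokens)):
        # Lexical rule 1: a sentence-final punctuation mark always closes a segment.
        if tokens[i - 1] in SENTENCE_END:
            found.add(i)
            last = i
            continue
        # Lexical rule 2: a discourse marker may open a new segment.
        for marker in MARKERS:
            parts = marker.split()
            if lowered[i:i + len(parts)] == parts:
                # Split only if the current segment and the remaining text are
                # both at least MIN_TOKENS long (crude proxy for syntax).
                if i - last >= MIN_TOKENS and len(tokens) - i >= MIN_TOKENS:
                    found.add(i)
                    last = i
                break
    return found


def segment(text: str) -> List[str]:
    """Return the segments as space-joined token strings."""
    tokens = tokenize(text)
    cuts = sorted(boundaries(tokens) | {0, len(tokens)})
    return [" ".join(tokens[a:b]) for a, b in zip(cuts, cuts[1:])]


def prf(predicted: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Boundary-level precision, recall and F-score against a gold standard."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For example, `segment("El paciente mejoró rápidamente, aunque la fiebre persistió durante dos días.")` yields two segments, split before "aunque". Joining tokens with spaces leaves a space before punctuation, which keeps the sketch short.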