Low-cost Named Entity Classification for Catalan: exploiting multilingual resources and unlabeled data

  • Authors:
  • Lluís Màrquez; Adrià de Gispert; Xavier Carreras; Lluís Padró

  • Affiliations:
  • Universitat Politècnica de Catalunya, Jordi Girona, Barcelona (all four authors)

  • Venue:
  • MultiNER '03 Proceedings of the ACL 2003 workshop on Multilingual and mixed-language named entity recognition - Volume 15
  • Year:
  • 2003

Abstract

This work studies Named Entity Classification (NEC) for Catalan without relying on large annotated resources for that language. Two approaches are explored and compared: exploiting only Catalan resources, and directly training bilingual (Spanish and Catalan) classification models, taking advantage of the large collection of annotated examples available for Spanish. Empirical results on real data indicate that multilingual models clearly outperform monolingual ones, and that the resulting Catalan NEC models are easier to improve by bootstrapping on unlabeled data.
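The abstract does not describe the classifier or the bootstrapping procedure in detail. Purely as a generic illustration, and not as the authors' method, the sketch below shows one common form of bootstrapping (self-training): a model trained on a labeled seed set labels an unlabeled pool, and only predictions above a confidence threshold are added back as training data. Everything here is an assumption made for illustration: the naive Bayes classifier, the hypothetical feature strings (`prev=en`, `cap`, `len>5`, `prev=sr`), the `threshold` and `max_rounds` parameters, and the idea of pooling Spanish and Catalan seed examples into a single labeled set to mimic a bilingual model.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Hypothetical toy classifier: multinomial naive Bayes over feature strings."""

    def fit(self, examples):
        # examples: list of (set_of_feature_strings, label)
        self.label_counts = Counter(label for _, label in examples)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in examples:
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict_proba(self, feats):
        # Log-space scores with add-one smoothing, normalized to probabilities.
        total = sum(self.label_counts.values())
        scores = {}
        for label, n in self.label_counts.items():
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            s = math.log(n / total)
            for f in feats:
                s += math.log((self.feat_counts[label][f] + 1) / denom)
            scores[label] = s
        top = max(scores.values())
        exp = {label: math.exp(s - top) for label, s in scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Generic self-training loop (assumed, not taken from the paper)."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = NaiveBayes().fit(labeled)
        confident, rest = [], []
        for feats in pool:
            probs = model.predict_proba(feats)
            label, p = max(probs.items(), key=lambda kv: kv[1])
            (confident if p >= threshold else rest).append((feats, label))
        if not confident:
            break  # nothing new is confident enough; stop bootstrapping
        labeled.extend(confident)  # add auto-labeled examples to training data
        pool = [feats for feats, _ in rest]
    return NaiveBayes().fit(labeled), labeled


# Hypothetical seed set: could pool Spanish and Catalan examples in one list.
seed = [
    ({"prev=en", "cap"}, "LOC"),
    ({"prev=en", "cap", "len>5"}, "LOC"),
    ({"prev=sr", "cap"}, "PER"),
    ({"prev=sr", "cap", "len>5"}, "PER"),
]
unlabeled_pool = [{"prev=en", "cap"}, {"prev=sr"}]
model, augmented = self_train(seed, unlabeled_pool, threshold=0.7)
```

With this toy data, both unlabeled items reach a 0.75 probability for their majority class in the first round, so they are added as `LOC` and `PER` respectively. The confidence threshold is the main design knob: a higher value adds fewer but cleaner auto-labeled examples.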