Bootstrapping Named Entity Recognition for Italian Broadcast News

  • Authors:
  • Marcello Federico;Nicola Bertoldi;Vanessa Sandrini

  • Affiliations:
  • ITC-irst - Centro per la Ricerca Scientifica e Tecnologica, Trento - Italy;ITC-irst - Centro per la Ricerca Scientifica e Tecnologica, Trento - Italy;ITC-irst - Centro per la Ricerca Scientifica e Tecnologica, Trento - Italy

  • Venue:
  • EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
  • Year:
  • 2002

Abstract

This paper presents the development of a Named Entity (NE) recognition system for the Italian broadcast news domain. A statistical model is introduced, based on a trigram language model defined over words and NE classes. The NE model is estimated from a very small list of 2,360 manually tagged NEs and a large untagged newspaper corpus. An iterative training procedure is applied that first estimates simpler models, whose parameters are then used to initialize the complete NE model. Finally, NE recognition experiments are reported on broadcast news transcripts generated by a speech recognition system.
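
As a rough illustration of how a statistical model over words and NE classes can label a word sequence, the Python sketch below finds the most likely class sequence with the Viterbi algorithm. It is a deliberately simplified bigram class model with per-class word probabilities, not the trigram model described in the abstract. The class inventory, all probabilities, the `UNK` floor, and the function name `viterbi` are hypothetical and chosen only for the example.

```python
import math

# Hypothetical toy parameters for illustration; none are taken from the paper.
CLASSES = ["O", "PER", "LOC"]  # "O" marks words outside any named entity

# P(class_i | class_{i-1}); "<s>" is the sentence-start context.
TRANS = {
    "<s>": {"O": 0.8, "PER": 0.1, "LOC": 0.1},
    "O":   {"O": 0.8, "PER": 0.1, "LOC": 0.1},
    "PER": {"O": 0.5, "PER": 0.4, "LOC": 0.1},
    "LOC": {"O": 0.7, "PER": 0.1, "LOC": 0.2},
}

# P(word | class)
EMIT = {
    "O":   {"il": 0.3, "presidente": 0.2, "visita": 0.3, "a": 0.2},
    "PER": {"ciampi": 0.6, "carlo": 0.4},
    "LOC": {"trento": 0.7, "roma": 0.3},
}

UNK = 1e-6  # floor probability for words unseen in a class


def viterbi(words):
    """Return the most likely class sequence for a list of lowercase words."""
    if not words:
        return []
    # best[c] = (log-probability, path) of the best sequence ending in class c
    best = {
        c: (math.log(TRANS["<s>"][c]) + math.log(EMIT[c].get(words[0], UNK)), [c])
        for c in CLASSES
    }
    for w in words[1:]:
        new_best = {}
        for c in CLASSES:
            emit = math.log(EMIT[c].get(w, UNK))
            # Pick the previous class that maximizes the score of ending in c.
            score, path = max(
                ((best[p][0] + math.log(TRANS[p][c]) + emit, best[p][1])
                 for p in CLASSES),
                key=lambda t: t[0],
            )
            new_best[c] = (score, path + [c])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]


tags = viterbi("il presidente ciampi visita trento".split())
```

With these toy numbers, `tags` labels "ciampi" as `PER`, "trento" as `LOC`, and the remaining words as `O`. In a setting like the one the abstract describes, the probabilities would be estimated from data rather than written by hand.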