Spoken information extraction from Italian broadcast news

  • Authors:
  • Vanessa Sandrini;Marcello Federico

  • Affiliations:
  • ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Trento, Italy;ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Trento, Italy

  • Venue:
  • ECIR'03 Proceedings of the 25th European conference on IR research
  • Year:
  • 2003

Abstract

Current research on information extraction from spoken documents focuses mainly on the recognition of named entities, such as names of organizations, locations and persons, within transcripts automatically generated by a speech recognizer. In this work we present research carried out at ITC-irst on named entity recognition in Italian broadcast news. In particular, we describe an original statistical named entity tagger that can be trained with relatively few language resources: a seed list of named entities and a large untagged text corpus. The paper also presents and discusses named entity recognition experiments on case-sensitive automatic transcripts generated by the ITC-irst speech recognizer, with the named entity model trained on seed lists of different sizes.
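
To make the idea of training from a seed list and an untagged corpus more concrete, the following is a minimal sketch of one generic way such resources can be combined: seed entries are matched case-sensitively against untagged text, and the words preceding each match are counted per entity class. This is only an illustration and not the tagger described in the paper; the seed entries, the example sentences, and the function names `tag_with_seeds` and `context_counts` are all hypothetical.

```python
from collections import Counter

# Hypothetical seed list (entity string -> class); not taken from the paper.
SEEDS = {"Roma": "LOC", "Fiat": "ORG", "Prodi": "PER"}


def tag_with_seeds(tokens, seeds):
    """Label each token with its seed class, or "O" if it is not in the list.

    Matching is case-sensitive, so "roma" does not match the seed "Roma".
    """
    return [(tok, seeds.get(tok, "O")) for tok in tokens]


def context_counts(sentences, seeds):
    """Count (previous word, class) pairs for seed matches in untagged text."""
    counts = Counter()
    for tokens in sentences:
        prev = "<s>"  # sentence-start marker
        for tok, tag in tag_with_seeds(tokens, seeds):
            if tag != "O":
                counts[(prev, tag)] += 1
            prev = tok
    return counts


# Toy untagged corpus (assumed for illustration only).
corpus = [
    "il presidente Prodi arriva a Roma".split(),
    "la Fiat apre a Roma".split(),
]
counts = context_counts(corpus, SEEDS)
```

Counts of this kind could serve as raw evidence for a statistical model of entity contexts; how the actual tagger estimates its parameters is not specified in the abstract.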