Spoken information extraction from Italian broadcast news

Authors: Vanessa Sandrini; Marcello Federico
Affiliation: ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Trento, Italy (both authors)
Venue: ECIR '03, Proceedings of the 25th European Conference on IR Research
Year: 2003

Abstract

Current research on information extraction from spoken documents focuses mainly on recognizing named entities, such as names of organizations, locations and persons, in transcripts automatically generated by a speech recognizer. In this work we present research carried out at ITC-irst on named entity recognition in Italian broadcast news. In particular, we describe an original statistical named entity tagger that can be trained with relatively few language resources: a seed list of named entities and a large untagged text corpus. Moreover, the paper presents and discusses named entity recognition experiments on case-sensitive automatic transcripts generated by the ITC-irst speech recognizer, as well as experiments in which the named entity model is trained with seed lists of different sizes.
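The abstract does not specify how the tagger is built. As a rough, hypothetical illustration of how a seed list plus untagged text alone can yield a classifier, the sketch below counts the left and right word contexts in which seed names occur, then labels new capitalized words with a naive Bayes decision over those contexts. This is an assumed, simplified technique, not the ITC-irst model: the toy corpus, the seed list, the PER/LOC labels and the function names (`train`, `classify`, `tag`) are all invented for illustration. The candidate filter relies on capitalization, which is one reason case-sensitive transcripts can matter.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: a few untagged Italian sentences and a small seed list.
CORPUS = [
    "il presidente Ciampi ha incontrato il sindaco di Roma",
    "il presidente Prodi ha parlato a Milano",
    "la squadra ha giocato a Torino",
]
SEEDS = {"Ciampi": "PER", "Prodi": "PER", "Roma": "LOC", "Milano": "LOC", "Torino": "LOC"}


def context_features(tokens, i):
    """Lowercased left and right neighbours of position i, with sentence-boundary markers."""
    left = tokens[i - 1].lower() if i > 0 else "<s>"
    right = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return ["L=" + left, "R=" + right]


def train(corpus, seeds):
    """Count context features around every occurrence of a seed name in untagged text."""
    feat_counts = defaultdict(Counter)  # class label -> feature counts
    class_counts = Counter()            # class label -> number of seed occurrences
    vocab = set()                       # all context features seen in training
    for sentence in corpus:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            label = seeds.get(token)
            if label is None:
                continue
            class_counts[label] += 1
            for f in context_features(tokens, i):
                feat_counts[label][f] += 1
                vocab.add(f)
    return feat_counts, class_counts, vocab


def classify(tokens, i, model):
    """Naive Bayes decision over the context of position i, with add-one smoothing."""
    feat_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior from seed occurrences
        denom = sum(feat_counts[label].values()) + len(vocab)
        for f in context_features(tokens, i):
            score += math.log((feat_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def tag(sentence, model):
    """Label capitalized, non-sentence-initial tokens (this filter needs case-sensitive input)."""
    tokens = sentence.split()
    return [(t, classify(tokens, i, model))
            for i, t in enumerate(tokens) if i > 0 and t[:1].isupper()]


if __name__ == "__main__":
    model = train(CORPUS, SEEDS)
    print(tag("ieri il presidente Amato ha visitato Napoli", model))
```

On this toy data the script prints `[('Amato', 'PER'), ('Napoli', 'LOC')]`: neither name is in the seed list, but "presidente ... ha" was seen only around person seeds and a sentence-final position mostly around location seeds. A realistic system would need richer features and much more text, and could iterate by adding confidently tagged names back to the seed list.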