Construction and analysis of Japanese-English broadcast news corpus with named entity tags

Authors:
Tadashi Kumano;Hideki Kashioka;Hideki Tanaka;Takahiro Fukusima
Affiliations:
ATR Spoken Language Translation Research Laboratories, Hikaridai, Keihanna Science City, Kyoto, Japan;ATR Spoken Language Translation Research Laboratories, Hikaridai, Keihanna Science City, Kyoto, Japan;ATR Spoken Language Translation Research Laboratories, Hikaridai, Keihanna Science City, Kyoto, Japan;Otemon Gakuin University, Ibaraki, Osaka, Japan
Venue:
MultiNER '03 Proceedings of the ACL 2003 workshop on Multilingual and mixed-language named entity recognition - Volume 15
Year:
2003

Citing 8
Cited 1

A program for aligning sentences in bilingual corpora

Computational Linguistics - Special issue on using large corpora: I
Text-translation alignment

Computational Linguistics - Special issue on using large corpora: I
An IR approach for translating new words from nonparallel, comparable texts

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Aligning sentences in parallel corpora

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
High-performance bilingual text alignment using statistical and dictionary information

ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Bilingual text, matching using bilingual dictionary and statistics

COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2
Translating named entities using monolingual and bilingual resources

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Translating names and technical terms in Arabic text

Semitic '98 Proceedings of the Workshop on Computational Approaches to Semitic Languages

Acquiring bilingual named entity translations from content-aligned corpora

IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing

Quantified Score

Hi-index	0.01

Visualization

Abstract

We are aiming to acquire named entity (NE) translation knowledge from nonparallel, content-aligned corpora, by utilizing NE extraction techniques. For this research, we are constructing a Japanese-English broadcast news corpus with NE tags. The tags represent not only NE class information but also coreference information within the same monolingual document and between corresponding Japanese-English document pairs. Analysis of about 1,100 annotated article pairs has shown that if NE occurrence information, such as classes, number of occurrence and occurrence order, is given for each language, it may provide a good clue for corresponding NEs across languages.