Construction and analysis of Japanese-English broadcast news corpus with named entity tags

  • Authors:
  • Tadashi Kumano; Hideki Kashioka; Hideki Tanaka; Takahiro Fukusima

  • Affiliations:
  • ATR Spoken Language Translation Research Laboratories, Hikaridai, Keihanna Science City, Kyoto, Japan; ATR Spoken Language Translation Research Laboratories, Hikaridai, Keihanna Science City, Kyoto, Japan; ATR Spoken Language Translation Research Laboratories, Hikaridai, Keihanna Science City, Kyoto, Japan; Otemon Gakuin University, Ibaraki, Osaka, Japan

  • Venue:
  • MultiNER '03 Proceedings of the ACL 2003 workshop on Multilingual and mixed-language named entity recognition - Volume 15
  • Year:
  • 2003

Abstract

We aim to acquire named entity (NE) translation knowledge from nonparallel, content-aligned corpora by using NE extraction techniques. To this end, we are constructing a Japanese-English broadcast news corpus with NE tags. The tags encode not only NE class information but also coreference information, both within a single monolingual document and between corresponding Japanese-English document pairs. Analysis of about 1,100 annotated article pairs shows that NE occurrence information given for each language, such as NE classes, number of occurrences, and occurrence order, may provide a good clue for finding corresponding NEs across languages.
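
To make the idea of using occurrence information as an alignment clue concrete, here is a minimal Python sketch. It is not the authors' method: the `Entity` schema, the cost function (difference in within-class occurrence rank plus difference in mention count), the greedy matching, and the toy article pair are all assumptions chosen for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One NE coreference chain in a single document (hypothetical schema)."""
    name: str        # representative surface form
    ne_class: str    # NE class label, e.g. "PERSON" or "LOCATION"
    first_pos: int   # position of the first mention in the document
    count: int       # number of mentions in the document


def rank_within_class(entities):
    """Group entities by NE class and order each group by first occurrence."""
    groups = defaultdict(list)
    for entity in entities:
        groups[entity.ne_class].append(entity)
    return {c: sorted(g, key=lambda e: e.first_pos) for c, g in groups.items()}


def align(ja_entities, en_entities):
    """Greedily pair same-class NEs using occurrence order and mention count.

    Assumed cost: |rank difference| + |mention count difference|.
    """
    ja_groups = rank_within_class(ja_entities)
    en_groups = rank_within_class(en_entities)
    pairs = []
    for ne_class in sorted(ja_groups.keys() & en_groups.keys()):
        ja, en = ja_groups[ne_class], en_groups[ne_class]
        # Score every same-class candidate pair; lower cost is better.
        candidates = sorted(
            (abs(i - j) + abs(a.count - b.count), i, j)
            for i, a in enumerate(ja)
            for j, b in enumerate(en)
        )
        used_ja, used_en = set(), set()
        for _cost, i, j in candidates:
            if i not in used_ja and j not in used_en:
                pairs.append((ja[i].name, en[j].name))
                used_ja.add(i)
                used_en.add(j)
    return pairs


# Toy article pair, invented for this example (not taken from the corpus).
ja_doc = [
    Entity("小泉首相", "PERSON", 0, 3),
    Entity("東京", "LOCATION", 1, 1),
    Entity("ブッシュ大統領", "PERSON", 2, 1),
]
en_doc = [
    Entity("Tokyo", "LOCATION", 0, 1),
    Entity("Koizumi", "PERSON", 1, 2),
    Entity("Bush", "PERSON", 3, 1),
]

pairs = align(ja_doc, en_doc)
for ja_name, en_name in pairs:
    print(ja_name, "<->", en_name)
```

Restricting candidates to the same NE class keeps the search small, and the within-class rank makes the order clue insensitive to how many entities of other classes precede a mention. In content-aligned but nonparallel articles, one side may mention entities the other omits; this sketch simply leaves those unpaired.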