The Hinoki Sensebank: a large-scale word sense tagged corpus of Japanese

  • Authors:
  • Takaaki Tanaka; Francis Bond; Sanae Fujita

  • Affiliations:
  • Nippon Telegraph and Telephone Corporation; Nippon Telegraph and Telephone Corporation; Nippon Telegraph and Telephone Corporation

  • Venue:
  • LAC '06 Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006
  • Year:
  • 2006

Abstract

Semantic information is important for precise word sense disambiguation systems and for the kind of semantic analysis used in sophisticated natural language processing applications such as machine translation and question answering. There are at least two kinds of semantic information: lexical semantics for words and phrases, and structural semantics for phrases and sentences. We have built a Japanese corpus of over three million words annotated with both lexical and structural semantic information. In this paper, we focus on our method of annotating the lexical semantics, that is, on building a word sense tagged corpus, and on the properties of that corpus.