Bootstrapping-Based Extraction of Dictionary Terms from Unsegmented Legal Text

  • Authors:
  • Masato Hagiwara;Yasuhiro Ogawa;Katsuhiko Toyama

  • Affiliations:
  • Graduate School of Information Science, Nagoya University, Nagoya, Japan 464-8603;Graduate School of Information Science, Nagoya University, Nagoya, Japan 464-8603;Graduate School of Information Science, Nagoya University, Nagoya, Japan 464-8603

  • Venue:
  • New Frontiers in Artificial Intelligence
  • Year:
  • 2009

Abstract

Recent demand for translations of Japanese statutes into foreign languages has made it necessary to compile standard bilingual dictionaries. To support this costly task, we propose Monaka, a bootstrapping-based lexical knowledge extraction algorithm that automatically extracts dictionary term candidates from unsegmented Japanese legal text. Like the Tchai algorithm on which it is based, Monaka extracts reliable patterns and instances iteratively; unlike Tchai, it uses character n-grams as contextual patterns and introduces a special constraint to ensure that the extracted terms are properly segmented. Experimental results show that the algorithm extracts correctly segmented, important dictionary terms more accurately than conventional methods.
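To make the general idea concrete, the following is a minimal sketch of bootstrapping over unsegmented text with character n-gram contexts. It is not the Monaka or Tchai implementation: the function names, parameters, and toy corpus are all hypothetical, pattern and instance scoring is replaced by plain support counts as a stand-in for the reliability scores used in the bootstrapping literature, and the segmentation constraint mentioned in the abstract is not modeled because the abstract does not specify it.

```python
from collections import Counter


def find_contexts(text, term, n):
    """Yield (left, right) character n-gram contexts around each occurrence of term."""
    start = text.find(term)
    while start != -1:
        end = start + len(term)
        # Keep only occurrences that have a full n-gram on both sides.
        if start >= n and end + n <= len(text):
            yield text[start - n:start], text[end:end + n]
        start = text.find(term, start + 1)


def extract_candidates(text, left, right, max_len):
    """Yield substrings of 1..max_len characters enclosed by left and right."""
    pos = text.find(left)
    while pos != -1:
        begin = pos + len(left)
        for length in range(1, max_len + 1):
            end = begin + length
            if text.startswith(right, end):
                yield text[begin:end]
        pos = text.find(left, pos + 1)


def bootstrap(corpus, seeds, n=2, iterations=2,
              top_patterns=5, top_instances=5, max_len=6):
    """Alternate between pattern induction and instance extraction.

    Simplification (assumption): patterns and instances are ranked by raw
    support counts, not by the scoring used in the paper.
    """
    instances = set(seeds)
    for _ in range(iterations):
        # Pattern induction: count how many known instances each context covers.
        support = Counter()
        for term in instances:
            seen = set()
            for text in corpus:
                seen.update(find_contexts(text, term, n))
            support.update(seen)
        patterns = [p for p, _ in support.most_common(top_patterns)]

        # Instance extraction: count how many selected patterns propose each new term.
        votes = Counter()
        for left, right in patterns:
            found = set()
            for text in corpus:
                found.update(extract_candidates(text, left, right, max_len))
            votes.update(found - instances)
        instances.update(t for t, _ in votes.most_common(top_instances))
    return instances


# Toy, made-up corpus of unsegmented definition-style sentences.
corpus = [
    "この法律において「所有権」とは物を支配する権利をいう",
    "この法律において「抵当権」とは担保のための権利をいう",
    "この法律において「賃借人」とは物を借りる者をいう",
]
result = bootstrap(corpus, {"所有権"})
print(sorted(result))
```

In this toy run, the single seed induces one character-bigram context pair, which then proposes the other two quoted terms. A real system would need a scoring function that penalizes overly generic contexts, and a check that candidate boundaries coincide with true term boundaries.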