A corpus-based approach to automatic compound extraction

  • Authors:
  • Keh-Yih Su; Ming-Wen Wu; Jing-Shin Chang

  • Affiliations:
  • National Tsing-Hua University, Hsinchu, Taiwan, R.O.C.; Behavior Design Corporation, Hsinchu, Taiwan, R.O.C.; National Tsing-Hua University, Hsinchu, Taiwan, R.O.C.

  • Venue:
  • ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
  • Year:
  • 1994

Abstract

An automatic compound retrieval method is proposed to extract compounds within a text message. It uses n-gram mutual information, relative frequency count, and parts of speech as the features for compound extraction. The problem is modeled as a two-class classification problem based on the distributional characteristics of n-gram tokens in the compound and the non-compound clusters. The recall and precision of the proposed approach are 96.2% and 48.2% for bigram compounds and 96.6% and 39.6% for trigram compounds on a testing corpus of 49,314 words. A significant reduction in processing time has also been observed.
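
As a rough illustration of two of the features named in the abstract, the following minimal Python sketch computes a mutual information score and a relative frequency count for each bigram in a token list, and scores an extracted candidate set by precision and recall. The abstract does not give the paper's formulas, so everything here is an assumption: mutual information is taken as pointwise mutual information estimated from raw counts, relative frequency is taken as the bigram count divided by the average count over all distinct bigrams, and the function names `bigram_features` and `precision_recall` are hypothetical. The part-of-speech feature and the two-class classifier are not modeled.

```python
import math
from collections import Counter


def bigram_features(tokens):
    """Return {bigram: (mutual_information, relative_frequency)}.

    Assumptions (not taken from the paper): mutual information is
    pointwise MI, log2(P(x, y) / (P(x) * P(y))), with probabilities
    estimated from raw counts; relative frequency is the bigram count
    divided by the mean count over all distinct bigrams.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    if not bigrams:
        return {}

    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    mean_count = n_bigrams / len(bigrams)

    features = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bigrams
        p_x = unigrams[x] / n_unigrams
        p_y = unigrams[y] / n_unigrams
        mutual_information = math.log2(p_xy / (p_x * p_y))
        relative_frequency = count / mean_count
        features[(x, y)] = (mutual_information, relative_frequency)
    return features


def precision_recall(extracted, gold):
    """Precision and recall of extracted candidates against a gold set."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "machine translation is hard machine translation is fun"
    for bigram, (mi, rf) in bigram_features(text.split()).items():
        print(bigram, round(mi, 3), round(rf, 3))
```

The reported figures (high recall, moderate precision) correspond to a setting in which nearly every true compound is retrieved while a sizable share of the retrieved candidates are not compounds; `precision_recall` shows how the two numbers are computed from a candidate list and a reference list.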