A robust multilingual portable phrase chunking system

  • Authors:
  • Yue-Shi Lee; Yu-Chieh Wu

  • Affiliations:
  • Department of Computer Science and Information Engineering, Ming Chuan University, 5 De-Ming Rd., Gwei Shan District, Taoyuan 333, Taiwan, ROC; Department of Computer Science and Information Engineering, National Central University, 300 Jong-Da Rd., Jhongli City, Taoyuan 320, Taiwan, ROC

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2007

Abstract

Automatic text chunking aims to recognize grammatical phrase structures in natural language text. Text chunking provides syntactic information for downstream analysis and is an important technology in text mining (TM) and natural language processing (NLP). Existing chunking systems make use of external knowledge, e.g., grammar parsers, or integrate multiple learners to achieve higher performance. However, such external knowledge is unavailable in many domains and languages. Moreover, employing multiple learners not only complicates the system architecture but also increases training and testing time. In this paper, we present a novel phrase chunking model based on the proposed mask method, which uses neither external knowledge nor multiple learners. The mask method automatically derives additional training examples from the original training data, which significantly improves system performance. We evaluated our method on different chunking tasks and languages and compared it with previous studies. The experimental results show that our method achieves state-of-the-art performance in chunking tasks. In two English chunking tasks, i.e., shallow parsing and base-chunking, our method achieves F(β=1) rates of 94.22 and 93.23, respectively. When ported to Chinese, the F(β=1) rate is 92.30. Our chunker is also efficient: the complete chunking time for a 50K-word text is less than 10 s.
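To make the reported F(β=1) rates concrete, the following is a minimal sketch of how a chunk-level F score is commonly computed. It is not the authors' evaluation code. It assumes that chunks are encoded as IOB2 tags (B-NP, I-NP, O), which the abstract does not state, and the function names extract_chunks and f_beta are hypothetical. A predicted chunk counts as correct only when its type and both boundaries match the gold chunk; the rates quoted in the abstract correspond to this value multiplied by 100.

```python
def extract_chunks(tags):
    """Return the set of (type, start, end) chunks in an IOB2 tag sequence.

    Assumption: IOB2 encoding. `end` is exclusive.
    """
    chunks = set()
    start, ctype = None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag == "O":
            prefix, t = "O", None
        else:
            prefix, _, t = tag.partition("-")
        # Close the open chunk on a new B- tag, an O tag, or a type change.
        if start is not None and (prefix in ("B", "O") or t != ctype):
            chunks.add((ctype, start, i))
            start, ctype = None, None
        # Open a chunk on B-, or on a stray I- with no chunk open.
        if prefix == "B" or (prefix == "I" and start is None):
            start, ctype = i, t
    return chunks


def f_beta(gold_tags, pred_tags, beta=1.0):
    """Chunk-level F score: (1 + b^2) * P * R / (b^2 * P + R)."""
    gold = extract_chunks(gold_tags)
    pred = extract_chunks(pred_tags)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up illustrative tags, not data from the paper.
gold = ["B-NP", "I-NP", "B-VP", "B-NP", "O"]
pred = ["B-NP", "I-NP", "B-VP", "I-VP", "O"]
score = f_beta(gold, pred)
```

In this toy example only the first noun phrase matches exactly, so precision is 1/2 and recall is 1/3. With β = 1 the score is the harmonic mean of precision and recall, which penalizes a chunker that is strong on only one of the two.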