A hybrid framework to extract bilingual multiword expression from free text

  • Authors:
  • Jianyong Duan;Mei Zhang;Wang Jingzhong;Yushi Xu

  • Affiliations:
  • College of Information Engineering, North China University of Technology, 100144 Beijing, China;College of Information Engineering, North China University of Technology, 100144 Beijing, China;College of Information Engineering, North China University of Technology, 100144 Beijing, China;Spoken Language Systems Group, MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar Street Cambridge, MA 02139, USA

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Quantified Score

Hi-index 12.05

Abstract

Extracting bilingual multiword expressions is a long-standing problem in deriving meaning from free text, and it requires analyzing large amounts of textual information. In this paper we propose a text mining approach to extracting bilingual multiword expressions that combines statistical and rule-based methods. Extraction proceeds in two phases. In the first phase, a large number of candidates are extracted from the corpus by statistical methods; the multiple sequence alignment algorithm is sensitive to flexible multiword expressions. In the second phase, error-driven rules and patterns are extracted from the corpus. To acquire high-quality instances, manual work with active learning is also used for sample selection. The trained rules are then used to filter the candidates. Bilingual comparisons are carried out on a parallel corpus, and part of the bilingual syntactic patterns is obtained from a bilingual phrase dictionary. Because the system has many parameters, a series of experiments is designed to find the best-performing settings. Experimental results show that our approach achieves good performance.
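
The two-phase idea (statistical candidate generation followed by rule-based filtering) can be illustrated with a minimal monolingual sketch. It is not the authors' method: pointwise mutual information over adjacent word pairs stands in for the statistical phase, including the multiple sequence alignment step, and a hand-written stopword rule stands in for the learned error-driven rules. The function names, thresholds, and toy data are all hypothetical, and the bilingual comparison step is not covered.

```python
import math
from collections import Counter


def extract_candidates(sentences, min_count=2, min_pmi=1.0):
    """Phase 1 (statistical, assumed): score adjacent word pairs by PMI."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())

    candidates = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # too rare to score reliably
        p_pair = count / total_bi
        p_w1 = unigrams[w1] / total_uni
        p_w2 = unigrams[w2] / total_uni
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        if pmi >= min_pmi:
            candidates[(w1, w2)] = pmi
    return candidates


def filter_candidates(candidates, stopwords):
    """Phase 2 (rule-based, assumed): drop pairs that begin or end with a stopword."""
    return {
        pair: score
        for pair, score in candidates.items()
        if pair[0] not in stopwords and pair[-1] not in stopwords
    }


# Toy corpus, invented for illustration only.
sentences = [
    ["i", "love", "new", "york"],
    ["new", "york", "is", "big"],
    ["i", "love", "the", "city"],
]
candidates = extract_candidates(sentences)
filtered = filter_candidates(candidates, stopwords={"i", "the", "is"})
```

On this toy corpus the statistical phase proposes both ("i", "love") and ("new", "york"), and the filtering rule keeps only ("new", "york"). This mirrors the division of labor described in the abstract: a permissive first phase that over-generates, then a stricter second phase that prunes.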