Information Extraction from Thai Text with Unknown Phrase Boundaries

  • Authors:
  • Peerasak Intarapaiboon;Ekawit Nantajeewarawat;Thanaruk Theeramunkong

  • Affiliations:
  • School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Thailand;School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Thailand;School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Thailand

  • Venue:
  • PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2009

Abstract

We propose a framework for extracting semantic frames from Thai text in which phrase boundaries are unknown. The framework matches patterns of triggering terms by applying extraction rules over a sliding window and then filters the resulting extractions. A supervised rule learning algorithm constructs multi-slot extraction rules from hand-tagged training phrases. A filtering module predicts whether a rule application crosses a phrase boundary, based on instantiation features of the rule's internal wildcards. The framework is applied to text documents in three domains that differ in target-phrase density and average length. Experimental results show that the filtering module improves precision while preserving high recall, yielding extraction performance comparable to frame extraction with manually identified phrase boundaries.
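
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of sliding-window rule application followed by filtering. Everything in it is an assumption made for illustration: the `Rule` structure, the `apply_rule` and `extract` functions, the placeholder English tokens, and in particular the fixed `max_gap` threshold, which stands in for the paper's learned filtering module and is not the authors' method. Here a rule is an ordered list of triggering terms, each followed by one slot filler, and the number of tokens skipped between triggers plays the role of a wildcard instantiation feature.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical multi-slot rule: ordered trigger terms, one slot per trigger."""
    triggers: list  # triggering terms that must occur in this order
    slots: list     # slot name filled by the token right after each trigger


def apply_rule(window, rule):
    """Match the rule inside one window.

    Returns (frame, gaps) or None. gaps[i] is the number of tokens skipped
    (absorbed by a wildcard) before the i-th trigger.
    """
    frame, gaps, pos = {}, [], 0
    for trigger, slot in zip(rule.triggers, rule.slots):
        try:
            idx = window.index(trigger, pos)
        except ValueError:
            return None            # trigger missing from this window
        if idx + 1 >= len(window):
            return None            # no token left to fill the slot
        gaps.append(idx - pos)
        frame[slot] = window[idx + 1]
        pos = idx + 2
    return frame, gaps


def extract(tokens, rule, window=7, max_gap=1):
    """Slide a fixed-size window over the tokens and keep filtered matches."""
    results = []
    for start in range(max(len(tokens) - window + 1, 1)):
        match = apply_rule(tokens[start:start + window], rule)
        if match is None:
            continue
        frame, gaps = match
        if gaps[0] != 0:
            continue               # anchor the window at the first trigger
        # Stand-in filter (assumption): a long internal wildcard suggests the
        # match spans a phrase boundary, so it is rejected.
        if max(gaps[1:], default=0) > max_gap:
            continue
        results.append((start, frame))
    return results


tokens = ["symptom", "fever", "found_in", "children", "x",
          "symptom", "rash", "y", "z", "w", "found_in", "adults"]
rule = Rule(triggers=["symptom", "found_in"], slots=["sym", "grp"])
print(extract(tokens, rule, max_gap=1))  # strict filter
print(extract(tokens, rule, max_gap=3))  # permissive filter
```

With the strict threshold only the compact match at the start of the token list is kept. The permissive threshold also accepts the match whose internal wildcard spans three tokens, which is the kind of extraction a boundary-aware filter is meant to reject.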