Information Extraction from Thai Text with Unknown Phrase Boundaries

Authors:
Peerasak Intarapaiboon;Ekawit Nantajeewarawat;Thanaruk Theeramunkong
Affiliations:
School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Thailand;School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Thailand;School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Thailand
Venue:
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Year:
2009

Abstract

We propose a framework for extracting semantic frames from Thai text in which phrase boundaries are unknown. The framework combines sliding-window rule application with extraction filtering and relies on patterns of triggering terms. A supervised rule learning algorithm constructs multi-slot extraction rules from hand-tagged training phrases. A filtering module predicts which rule applications cross phrase boundaries, based on instantiation features of the rules' internal wildcards. The framework is applied to text documents in three domains that differ in target-phrase density and average length. The experimental results show that the filtering module improves precision while preserving high recall, yielding extraction performance comparable to frame extraction with manually identified phrase boundaries.
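
The abstract gives no implementation details, so the following Python sketch only illustrates the general idea of sliding-window rule application followed by filtering. Everything in it is an assumption made for illustration: the `Rule` structure, the one-token slots, the English placeholder tokens, and above all the fixed gap threshold, which stands in for the learned filtering module described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Extraction = Tuple[Dict[str, str], List[int]]  # (slot fillers, wildcard gap sizes)


@dataclass
class Rule:
    """Hypothetical multi-slot rule: triggering terms and slots in order.

    Each element is ("term", word) or ("slot", slot_name). Every term after
    the first may be preceded by an internal wildcard skipping up to
    `max_gap` tokens.
    """
    elements: List[Tuple[str, str]]
    max_gap: int


def match_at(rule: Rule, tokens: List[str], start: int, window: int) -> Optional[Extraction]:
    """Try to apply `rule` inside the window tokens[start:start + window]."""
    end = min(len(tokens), start + window)
    pos = start
    slots: Dict[str, str] = {}
    gaps: List[int] = []
    for i, (kind, value) in enumerate(rule.elements):
        if kind == "term":
            # The first term is anchored at the window start; later terms
            # may be reached through a wildcard of bounded size.
            limit = pos if i == 0 else min(end - 1, pos + rule.max_gap)
            found = next((j for j in range(pos, limit + 1) if tokens[j] == value), None)
            if found is None:
                return None
            if i > 0:
                gaps.append(found - pos)  # tokens consumed by the wildcard
            pos = found + 1
        else:
            # Simplification: a slot is filled by exactly one token.
            if pos >= end:
                return None
            slots[value] = tokens[pos]
            pos += 1
    return slots, gaps


def extract(rule: Rule, tokens: List[str], window: int) -> List[Extraction]:
    """Slide the window over every start position and collect rule matches."""
    results = []
    for start in range(len(tokens)):
        match = match_at(rule, tokens, start, window)
        if match is not None:
            results.append(match)
    return results


def filter_extractions(candidates: List[Extraction], max_allowed_gap: int) -> List[Extraction]:
    """Stand-in filter: reject matches whose wildcards span too many tokens.

    A large wildcard instantiation is treated as a sign that the rule was
    applied across a phrase boundary. A learned classifier would replace
    this hand-set threshold.
    """
    return [c for c in candidates if all(g <= max_allowed_gap for g in c[1])]


# Placeholder tokens; "." marks a boundary the extractor is not told about.
tokens = ["has", "fever", "for", "3", "days", ".", "has", "cough", ".",
          "rash", "appeared", "for", "2", "days"]
rule = Rule(elements=[("term", "has"), ("slot", "symptom"),
                      ("term", "for"), ("slot", "days")], max_gap=6)

candidates = extract(rule, tokens, window=8)
kept = filter_extractions(candidates, max_allowed_gap=1)
```

In this toy input the unfiltered pass yields two frames, one of which pairs "cough" with a duration from the following phrase. The threshold filter discards that cross-boundary match because its wildcard skips three tokens, which mirrors the precision gain the abstract attributes to filtering.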