Automatic extraction of Thai-English term translations and synonyms from medical web using iterative candidate generation with association measures

  • Authors:
  • Kobkrit Viriyayudhakorn; Thanaruk Theeramunkong; Cholwich Nattee; Thepchai Supnithi; Manabu Okumura

  • Affiliations:
  • Sirindhorn International Institute of Technology, Thammasat University, Muang, Pathumthani, Thailand; Sirindhorn International Institute of Technology, Thammasat University, Muang, Pathumthani, Thailand; Sirindhorn International Institute of Technology, Thammasat University, Muang, Pathumthani, Thailand; National Electronics and Computer Technology Center, Klongluang, Pathumthani, Thailand; Precision and Intelligence Laboratory, Tokyo Institute of Technology, Midori, Yokohama, Japan

  • Venue:
  • PAKDD'09 Proceedings of the 13th Pacific-Asia international conference on Knowledge discovery and data mining: new frontiers in applied data mining
  • Year:
  • 2009

Abstract

Electronic technical documents available on the Internet are a rich source for the automatic extraction of term translations and synonyms. This paper presents an association-based approach that extracts candidate translations and synonyms through iterative candidate generation using a search engine. Plausible candidate pairs are then selected according to their co-occurrence statistics. In an experiment on extracting Thai-English medical term pairs, four alternative association measures, namely confidence, support, lift, and conviction, are investigated, and their performance is compared using ten-fold cross-validation. The experimental results show that lift performs best, achieving a 73.1% f-measure (67% precision, 84.2% recall) on translation pair extraction, a 68.7% f-measure (71.5% precision, 67.7% recall) on Thai synonym extraction, and a 72.8% f-measure (72.0% precision, 75.1% recall) on English synonym extraction. The precision of our approach on Thai-English translation, Thai synonym, and English synonym extraction is 4 times, 3.5 times, and 5.5 times higher than the baseline precision, respectively.
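
The four association measures named in the abstract have standard definitions in association-rule mining. The sketch below computes them from co-occurrence counts for a candidate pair (x, y). It is an illustration only, not the authors' implementation: the abstract does not give the exact formulas or how counts are obtained, so the function names, the count-based inputs (for example, search-engine hit counts), and all numbers are assumptions.

```python
import math


def association_measures(n_x, n_y, n_xy, n_total):
    """Standard association-rule measures for the rule x -> y.

    n_x, n_y : number of documents containing x, respectively y
    n_xy     : number of documents containing both x and y
    n_total  : total number of documents considered
    All inputs are hypothetical counts (e.g. search-engine hit counts).
    """
    if n_x <= 0 or n_y <= 0 or n_total <= 0:
        raise ValueError("counts must be positive")
    if not (0 <= n_xy <= min(n_x, n_y) and max(n_x, n_y) <= n_total):
        raise ValueError("inconsistent counts")

    p_y = n_y / n_total
    support = n_xy / n_total      # P(x, y)
    confidence = n_xy / n_x       # P(y | x)
    lift = confidence / p_y       # P(y | x) / P(y)
    # Conviction is unbounded when the rule never fails (confidence == 1).
    if confidence == 1.0:
        conviction = math.inf
    else:
        conviction = (1.0 - p_y) / (1.0 - confidence)

    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
    }


def rank_by(measure, candidates, n_total):
    """Sort candidate pairs by one measure, highest score first.

    candidates maps a label to a tuple (n_x, n_y, n_xy).
    """
    scored = [
        (label, association_measures(n_x, n_y, n_xy, n_total)[measure])
        for label, (n_x, n_y, n_xy) in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_by("lift", candidates, n_total)` on a dictionary of candidate pairs orders them by lift, which is the kind of selection the abstract describes. A lift above 1 means the two terms co-occur more often than they would if they were independent.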