Cross Language Information Extraction Knowledge Adaptation

  • Authors:
  • Tak-Lam Wong;Kai-On Chow;Wai Lam

  • Affiliations:
  • Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong;Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong;Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong

  • Venue:
  • RSKT '09 Proceedings of the 4th International Conference on Rough Sets and Knowledge Technology
  • Year:
  • 2009

Abstract

We propose a framework for adapting a previously learned wrapper from a source Web site to unseen sites written in different languages. The framework draws on the information extraction knowledge previously learned from the source Web site and on the items previously extracted or collected from it. This knowledge and data are automatically translated into the language of the unseen sites via online Web resources such as online Web dictionaries or maps. Multiple text mining methods are then employed to automatically discover machine-labeled training examples in the unseen site. Both the content-oriented features and the site-dependent features of these machine-labeled training examples are used to learn a wrapper for the new unseen site with our language-independent wrapper induction component. We conducted experiments on real-world Web sites in different languages to demonstrate the effectiveness of our framework.
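
The translate-then-label step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the in-memory `TOY_DICTIONARY` stands in for an online Web dictionary, the English-to-Chinese language pair is an assumption, the function names `translate_items` and `label_fragments` are hypothetical, and plain substring matching replaces the multiple text mining methods the paper refers to.

```python
from typing import Dict, List, Tuple

# Hypothetical bilingual dictionary standing in for an online Web dictionary.
# The language pair (English to Chinese) is an assumption for illustration.
TOY_DICTIONARY: Dict[str, str] = {
    "digital camera": "数码相机",
    "laptop": "笔记本电脑",
}


def translate_items(items: List[str], dictionary: Dict[str, str]) -> List[str]:
    """Translate source-site items; items without a dictionary entry are skipped."""
    return [dictionary[item.lower()] for item in items if item.lower() in dictionary]


def label_fragments(
    fragments: List[str], translated: List[str]
) -> List[Tuple[str, bool]]:
    """Mark a text fragment as a positive example if it contains a translated item."""
    return [(frag, any(term in frag for term in translated)) for frag in fragments]


# Items previously extracted from the source site (made-up sample data).
source_items = ["Digital Camera", "Laptop", "Tripod"]
# Text fragments from a page of the unseen site (made-up sample data).
target_fragments = ["佳能 数码相机 A100", "联系我们", "联想 笔记本电脑 T400"]

translated = translate_items(source_items, TOY_DICTIONARY)
labeled = label_fragments(target_fragments, translated)
```

In this sketch the positively labeled fragments would play the role of the machine-labeled training examples that are passed on to wrapper induction; how the paper combines content-oriented and site-dependent features at that stage is not shown here.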