Cross Language Information Extraction Knowledge Adaptation

Authors:
Tak-Lam Wong;Kai-On Chow;Wai Lam
Affiliations:
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong;Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong;Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong
Venue:
RSKT '09 Proceedings of the 4th International Conference on Rough Sets and Knowledge Technology
Year:
2009

Abstract

We propose a framework for adapting a previously learned wrapper from a source Web site to unseen sites written in different languages. The key idea is to reuse the previously learned information extraction knowledge and the items previously extracted or collected from the source Web site. This knowledge and data are automatically translated into the language of the unseen sites via online Web resources such as online Web dictionaries or maps. Multiple text mining methods are then employed to automatically discover machine-labeled training examples in the unseen site. Both content-oriented features and site-dependent features of the machine-labeled training examples are used to learn the wrapper for the new unseen site using our language-independent wrapper induction component. We conducted experiments on real-world Web sites in different languages to demonstrate the effectiveness of our framework.
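
The translate-then-match step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy phrase dictionary stands in for an online Web dictionary, the character-level similarity from the standard library's `difflib` stands in for the paper's unspecified "multiple text mining methods", and the names `TOY_DICTIONARY`, `translate_item`, `machine_label`, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for an online bilingual dictionary (English -> Spanish).
TOY_DICTIONARY = {
    "red wine": "vino tinto",
    "goat cheese": "queso de cabra",
}

def translate_item(item, dictionary):
    """Translate a source-site item by phrase lookup; unknown items are kept unchanged."""
    return dictionary.get(item.lower(), item)

def similarity(a, b):
    """Character-level similarity in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def machine_label(fragments, source_items, dictionary, threshold=0.8):
    """Pair each text fragment of the unseen site with its best-matching source item.

    A fragment becomes a machine-labeled example only if its similarity to the
    translated item reaches the (assumed) threshold.
    """
    translated = {item: translate_item(item, dictionary) for item in source_items}
    labeled = []
    for fragment in fragments:
        best_item, best_score = None, 0.0
        for item, target in translated.items():
            score = similarity(fragment, target)
            if score > best_score:
                best_item, best_score = item, score
        if best_score >= threshold:
            labeled.append((fragment, best_item))
    return labeled

# Items previously extracted from the source (English) site.
source_items = ["red wine", "goat cheese"]
# Text fragments taken from a page of the unseen (Spanish) site.
unseen_fragments = ["Vino tinto", "Inicio", "Queso de cabra", "Contacto"]

examples = machine_label(unseen_fragments, source_items, TOY_DICTIONARY)
```

In this sketch the navigation fragments "Inicio" and "Contacto" fall below the threshold and are discarded, while the two product names become machine-labeled examples. In the framework described above, such examples would then feed the wrapper induction component together with site-dependent features, which this sketch does not model.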