Learning to adapt cross language information extraction wrapper

Authors:
Tak-Lam Wong
Affiliations:
Department of Mathematics and Information Technology, The Hong Kong Institute of Education, Tai Po, Hong Kong
Venue:
Applied Intelligence
Year:
2012

Abstract

We propose a framework for adapting a previously learned wrapper from a source Web site to unseen sites in different languages. To achieve this, we exploit the information extraction knowledge previously learned from the source Web site, together with the items previously extracted or collected from it. This knowledge and these data are automatically translated into the language of the unseen sites via online Web resources such as online Web dictionaries or maps. Site-independent features, which capture the characteristics of the content of the data, are then derived from the translated information. Several text mining methods are employed to automatically discover a set of machine-labeled training examples in the unseen site. Both the content-oriented features and the site-dependent features of the machine-labeled training examples are then used to learn a new wrapper for the unseen site with our language-independent wrapper induction component. We conducted experiments on real-world Web sites in different languages to demonstrate the effectiveness of our framework.
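
The step of discovering machine-labeled training examples can be illustrated with a minimal sketch. It is not the method of the paper, whose text mining techniques are not detailed in this abstract. It assumes a toy in-memory dictionary in place of an online Web dictionary, and a plain string-similarity threshold in place of the content-oriented features. The names `TOY_DICTIONARY`, `translate_items`, `similarity`, and `machine_label`, and the threshold of 0.8, are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for an online Web dictionary (English -> French).
TOY_DICTIONARY = {
    "red wine": "vin rouge",
    "white wine": "vin blanc",
}


def translate_items(items, dictionary):
    """Translate items collected from the source site.

    Items missing from the dictionary are skipped rather than guessed.
    """
    return [dictionary[item] for item in items if item in dictionary]


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def machine_label(fragments, translated_items, threshold=0.8):
    """Label each text fragment of the unseen site.

    A fragment is marked True when it is close enough to at least one
    translated item, and False otherwise. The threshold is an assumption.
    """
    labeled = []
    for fragment in fragments:
        best = max(
            (similarity(fragment, item) for item in translated_items),
            default=0.0,
        )
        labeled.append((fragment, best >= threshold))
    return labeled


# Items previously extracted from the (English) source site.
source_items = ["red wine", "white wine", "rosé"]
# Text fragments taken from a page of the unseen (French) site.
unseen_fragments = ["Vin Rouge", "Contactez-nous", "vin blanc"]

translated = translate_items(source_items, TOY_DICTIONARY)
examples = machine_label(unseen_fragments, translated)
```

In this sketch the fragments labeled True would serve as positive training examples for learning a wrapper on the unseen site, and the site-dependent features (such as the surrounding markup) would be read from wherever those fragments occur in the page.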