The nature of statistical learning theory
The nature of statistical learning theory
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
Applications of approximate word matching in information retrieval
CIKM '97 Proceedings of the sixth international conference on Information and knowledge management
A scalable comparison-shopping agent for the World-Wide Web
AGENTS '97 Proceedings of the first international conference on Autonomous agents
Generating finite-state transducers for semi-structured data extraction from the Web
Information Systems - Special issue on semistructured data
Learning Information Extraction Rules for Semi-Structured and Free Text
Machine Learning - Special issue on natural language learning
Learning page-independent heuristics for extracting data from Web pages
WWW '99 Proceedings of the eighth international conference on World Wide Web
Learning dictionaries for information extraction by multi-level bootstrapping
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Wrapper induction: efficiency and expressiveness
Artificial Intelligence - Special issue on Intelligent internet systems
Learning to extract hierarchical information from semi-structured documents
Proceedings of the ninth international conference on Information and knowledge management
Bootstrapping for example-based data extraction
Proceedings of the tenth international conference on Information and knowledge management
A flexible learning system for wrapping tables and lists in HTML documents
Proceedings of the 11th international conference on World Wide Web
Learning object identification rules for information integration
Information Systems - Data extraction, cleaning and reconciliation
World Wide Web
Reasoning about Textual Similarity in a Web-Based Information Access System
Autonomous Agents and Multi-Agent Systems
Hierarchical Wrapper Induction for Semistructured Information Sources
Autonomous Agents and Multi-Agent Systems
RoadRunner: Towards Automatic Data Extraction from Large Web Sites
Proceedings of the 27th International Conference on Very Large Data Bases
Extracting Patterns and Relations from the World Wide Web
WebDB '98 Selected papers from the International Workshop on The World Wide Web and Databases
Selective Sampling with Redundant Views
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
Information Extraction with HMM Structures Learned by Stochastic Optimization
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
Learning domain-independent string transformation weights for high accuracy object identification
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Data extraction and label assignment for web databases
WWW '03 Proceedings of the 12th international conference on World Wide Web
Getting from here to there: interactive planning and agent execution for optimizing travel
Eighteenth national conference on Artificial intelligence
Adapting Information Extraction Knowledge For Unseen Web Sites
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Bottom-up relational learning of pattern matching rules for information extraction
The Journal of Machine Learning Research
Adaptive duplicate detection using learnable string similarity measures
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining data records in Web pages
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
A Probabilistic Approach for Adapting Information Extraction Wrappers and Discovering New Attributes
ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Thresher: automating the unwrapping of semantic content from the World Wide Web
WWW '05 Proceedings of the 14th international conference on World Wide Web
Unsupervised named-entity extraction from the web: an experimental study
Artificial Intelligence
Wrapper maintenance: a machine learning approach
Journal of Artificial Intelligence Research
Adaptive information extraction from text by rule induction and generalisation
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
A probabilistic model of redundancy in information extraction
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Adaptive information extraction: core technologies for information agents
Intelligent information agents
Learning with scope, with application to information extraction and classification
UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence
An unsupervised framework for extracting and normalizing product attributes from multiple web sites
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
A Structured Approach to Data Reverse Engineering of Web Applications
ICWE '9 Proceedings of the 9th International Conference on Web Engineering
Cross Language Information Extraction Knowledge Adaptation
RSKT '09 Proceedings of the 4th International Conference on Rough Sets and Knowledge Technology
An unsupervised approach for product record normalization across different web sites
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
News information extraction based on adaptive weighting using unsupervised Bayesian algorithm
WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II
Hi-index | 0.00 |
We develop a novel framework that aims at automatically adapting previously learned information extraction knowledge from a source Web site to a new unseen target site in the same domain. Two kinds of features related to the text fragments from the Web documents are investigated. The first type of feature is called, a site-invariant feature. These features likely remain unchanged in Web pages from different sites in the same domain. The second type of feature is called a site-dependent feature. These features are different in the Web pages collected from different Web sites, while they are similar in the Web pages originating from the same site. In our framework, we derive the site-invariant features from previously learned extraction knowledge and the items previously collected or extracted from the source Web site. The derived site-invariant features will be exploited to automatically seek a new set of training examples in the new unseen target site. Both the site-dependent features and the site-invariant features of these automatically discovered training examples will be considered in the learning of new information extraction knowledge for the target site. We conducted extensive experiments on a set of real-world Web sites collected from three different domains to demonstrate the performance of our framework. For example, by just providing training examples from one online book catalog Web site, our approach can automatically extract information from ten different book catalog sites achieving an average precision and recall of 71.9% and 84.0% respectively without any further manual intervention.