Dynamic web sites commonly return information in the form of lists and tables. Hand-crafting an extraction program for a specific template is straightforward but time-consuming, so it is desirable to generate template extraction programs automatically from examples of lists and tables in HTML documents. We describe a novel technique, Post-supervised Learning, which exploits unsupervised learning to avoid the need for training examples while involving the user only minimally to achieve high accuracy. We have developed unsupervised algorithms to extract the number of rows and adopted a dynamic programming algorithm for extracting columns. Our system, called TIDE (Template Induction for web Data Extraction), achieves high performance with minimal user input relative to fully supervised techniques.
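
The abstract does not say which dynamic programming algorithm TIDE uses for columns, so the following is only an illustrative sketch of how dynamic programming can separate template tokens from column data. It assumes rows are already tokenized, takes the longest common subsequence (LCS) of two rows as the shared template, and treats the token runs between template tokens as cells. The function names `lcs_template` and `split_columns` are hypothetical and are not taken from TIDE.

```python
def lcs_template(row_a, row_b):
    """Return the tokens shared by both rows, in order (an LCS).

    Assumption for this sketch: tokens that appear in every row, in the
    same order, belong to the page template rather than to the data.
    """
    n, m = len(row_a), len(row_b)
    # dp[i][j] holds the LCS length of row_a[i:] and row_b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if row_a[i] == row_b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the start to recover one LCS.
    template = []
    i = j = 0
    while i < n and j < m:
        if row_a[i] == row_b[j]:
            template.append(row_a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return template


def split_columns(row, template):
    """Split a row into cells: the token runs between template tokens."""
    cells, current, t = [], [], 0
    for tok in row:
        if t < len(template) and tok == template[t]:
            # A template token closes the cell collected so far, if any.
            if current:
                cells.append(" ".join(current))
                current = []
            t += 1
        else:
            current.append(tok)
    if current:
        cells.append(" ".join(current))
    return cells


row_a = ["<tr>", "<td>", "Alice", "</td>", "<td>", "30", "</td>", "</tr>"]
row_b = ["<tr>", "<td>", "Bob", "Smith", "</td>", "<td>", "41", "</td>", "</tr>"]

template = lcs_template(row_a, row_b)
print(split_columns(row_a, template))
print(split_columns(row_b, template))
```

In this sketch the two rows share only their markup tokens, so each row splits into two cells even though the first cell of the second row spans two tokens. A data token that happens to equal a template token would confuse the greedy split, which is one reason a real system would need a more careful alignment.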