While much of the data on the web is unstructured, a significant amount of structured data is also embedded in web pages, such as product information on e-commerce sites or stock data on financial sites. A large body of research has focused on the problem of generating wrappers, i.e., software tools that allow easy and robust extraction of structured data from text and HTML sources. In many applications, such as comparison shopping, data must be extracted from many different sources, which makes manually coding a wrapper for each source impractical. On the other hand, fully automatic approaches are often not reliable enough and yield low-quality extracted data. We describe a complete system for semi-automatic wrapper generation that can be trained on different data sources in a simple, interactive manner. Our goal is to minimize the user effort needed to train reliable wrappers, through the design of a suitable training interface built on a powerful underlying extraction language and a set of training and ranking algorithms. Our experiments show that our system achieves reliable extraction with very little user effort.
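To make the idea of training a wrapper from a few user-labeled values concrete, here is a minimal sketch of a much simpler wrapper class than the one described above: a left-right (LR) delimiter wrapper, a classic target of wrapper-induction research. It assumes that every value of interest on the page is enclosed by the same fixed left and right strings. It is not the authors' extraction language or their training and ranking algorithms. The functions `learn_lr_wrapper` and `extract`, the `window` parameter, and the sample HTML are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def _common_suffix(strings: List[str]) -> str:
    """Longest string that every element of `strings` ends with."""
    shortest = min(strings, key=len)
    for n in range(len(shortest), 0, -1):
        candidate = shortest[-n:]
        if all(s.endswith(candidate) for s in strings):
            return candidate
    return ""


def _common_prefix(strings: List[str]) -> str:
    """Longest string that every element of `strings` starts with."""
    shortest = min(strings, key=len)
    for n in range(len(shortest), 0, -1):
        candidate = shortest[:n]
        if all(s.startswith(candidate) for s in strings):
            return candidate
    return ""


def learn_lr_wrapper(page: str, labels: List[str], window: int = 30) -> Tuple[str, str]:
    """Hypothetical trainer: induce (left, right) delimiters from labeled values.

    Uses the first occurrence of each labeled value and looks at most
    `window` characters to either side of it.
    """
    lefts, rights = [], []
    for label in labels:
        start = page.find(label)
        if start < 0:
            raise ValueError(f"label not found on page: {label!r}")
        end = start + len(label)
        lefts.append(page[max(0, start - window):start])
        rights.append(page[end:end + window])
    # The left delimiter is the longest context shared *before* all examples,
    # and the right delimiter is the longest context shared *after* them.
    left, right = _common_suffix(lefts), _common_prefix(rights)
    if not left or not right:
        raise ValueError("labels share no common delimiters; add more examples")
    return left, right


def extract(page: str, left: str, right: str) -> List[str]:
    """Apply an LR wrapper: return every substring found between `left` and `right`."""
    values, pos = [], 0
    while True:
        i = page.find(left, pos)
        if i < 0:
            break
        start = i + len(left)
        j = page.find(right, start)
        if j < 0:
            break
        values.append(page[start:j])
        pos = j + len(right)
    return values


# Hypothetical product list; the user labels only the first two product names.
html = ("<ul><li><b>Widget</b> $5</li><li><b>Gadget</b> $7</li>"
        "<li><b>Doohickey</b> $12</li></ul>")
left, right = learn_lr_wrapper(html, ["Widget", "Gadget"])
print(extract(html, left, right))  # ['Widget', 'Gadget', 'Doohickey']
```

The `window` cap limits how much context a single example contributes. Without it, one labeled value would make the left delimiter the entire preceding page. An LR wrapper also fails when its delimiters appear inside the values themselves, which is why richer extraction languages and user feedback on candidate wrappers are useful in practice.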