For software to take full advantage of semi-structured web sources, wrapper programs must be built to provide a "machine-readable" view over them. A wrapper accepts a query against the source and returns a set of structured results, so applications can access web data much as they would access data in a database. A significant problem with this approach is that web sources may change in ways that invalidate the current wrappers. In this paper, we present novel heuristics and algorithms to address this problem. In our approach, the system collects some query results during normal wrapper operation; when the source changes, it uses those results as input to generate a set of labeled examples for the source, which can then be used to induce a new wrapper.
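The general idea of reusing stored results to label a changed page can be illustrated with a minimal sketch. It is not the heuristics or algorithms of the paper, which the abstract does not detail. It assumes, purely for illustration, that stored results are flat field-to-string records, that a value reappears verbatim in the new page, and that the first occurrence is the right one. The name `label_examples` and the record layout are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]   # hypothetical layout: field name -> extracted value
Span = Tuple[int, int]    # (start, end) character offsets in the page


def label_examples(stored: List[Record], page: str) -> List[Dict[str, Span]]:
    """Locate previously extracted values in a changed page.

    Assumption: a stored record yields a labeled example only if every
    one of its field values occurs verbatim in the new page text.
    """
    examples: List[Dict[str, Span]] = []
    for record in stored:
        spans: Dict[str, Span] = {}
        for field, value in record.items():
            start = page.find(value)  # first occurrence only (simplification)
            if start == -1:
                break  # value no longer present: discard this record
            spans[field] = (start, start + len(value))
        else:
            if spans:  # skip records that have no fields at all
                examples.append(spans)
    return examples


# Results collected while the old wrapper still worked (illustrative data).
stored_results = [
    {"title": "Dune", "author": "Herbert"},
    {"title": "Gone", "author": "Nobody"},  # no longer listed by the source
]
# The page after a layout change (illustrative markup).
new_page = "<li><b>Dune</b> <i>Herbert</i></li><li><b>Emma</b> <i>Austen</i></li>"

labeled = label_examples(stored_results, new_page)
```

Each returned dictionary marks where one known record sits in the new layout, which is the kind of labeled example a wrapper induction step takes as input. A realistic system would also have to handle values that have changed, appear more than once, or are formatted differently after the change.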