The vast majority of online information is part of the World Wide Web. To use this information for more than human browsing, web pages in HTML must be converted into a format meaningful to software programs. Wrappers have been a useful technique for converting HTML documents into semantically meaningful XML files. However, developing wrappers is slow and labor-intensive. Further, frequent changes to the HTML documents typically require frequent changes to the wrappers. This paper describes XWRAP Elite, a tool to automatically generate robust wrappers. XWRAP breaks down the conversion process into three steps. First, discover where the data is located in an HTML page and separate the data into individual objects. Second, decompose objects into data elements. Third, mark objects and elements in an output format. XWRAP Elite automates the first two steps and minimizes human involvement in marking output data. Our experience shows that XWRAP is able to create useful wrapper software for a wide variety of real-world HTML documents.
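The three steps can be illustrated with a minimal sketch. This is not XWRAP Elite's implementation. It assumes, purely for illustration, that the data region is an HTML table, that each `<tr>` is one object, and that each `<td>` is one data element. The names `RowCollector`, `wrap`, `object_tag`, and `field_tags` are hypothetical, and the sample page is invented. The human-supplied tag names stand in for the third step, where output marking still needs some user input.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class RowCollector(HTMLParser):
    """Steps 1 and 2 under the stated assumption: rows are objects, cells are elements."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per <tr>
        self._cell = None   # text fragments of the open <td>, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # a new object starts
        elif tag == "td" and self.rows:
            self._cell = []               # a new data element starts

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)       # text outside cells is ignored

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def wrap(html, object_tag, field_tags):
    """Step 3: mark objects and elements with user-chosen XML tag names."""
    collector = RowCollector()
    collector.feed(html)
    root = ET.Element(object_tag + "s")
    for cells in collector.rows:
        if not cells:
            continue                      # skip rows without data cells
        obj = ET.SubElement(root, object_tag)
        for name, value in zip(field_tags, cells):
            ET.SubElement(obj, name).text = value
    return ET.tostring(root, encoding="unicode")


# Invented sample page, used only to exercise the sketch.
page = """<html><body><h1>Catalog</h1><table>
<tr><td>Dune</td><td>Herbert</td></tr>
<tr><td>Emma</td><td>Austen</td></tr>
</table></body></html>"""

xml_out = wrap(page, "book", ["title", "author"])
print(xml_out)
```

A fixed rule such as "every table row is an object" breaks as soon as the page layout changes, which is the fragility the abstract attributes to hand-written wrappers.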