Wrapper verification

Authors:
Nicholas Kushmerick
Venue:
World Wide Web
Year:
2000



Abstract

Many Internet information-management applications (e.g., information integration systems) require a library of wrappers: specialized information extraction procedures that translate a source's native format into a structured representation suitable for further application-specific processing. Maintaining wrappers is tedious and error-prone, because the formatting regularities on which wrappers rely change frequently on the decentralized and dynamic Internet. The wrapper verification problem is to determine whether a wrapper is operating correctly. Standard regression testing approaches are inappropriate, because both the formatting regularities on which wrappers rely and the source's underlying content may change. We introduce RAPTURE, a fully implemented, domain-independent wrapper verification algorithm. RAPTURE computes a probabilistic similarity measure between a wrapper's expected and observed output, where similarity is defined in terms of simple numeric features (e.g., the length, or the fraction of punctuation characters) of the extracted strings. Experiments with numerous actual Internet sources demonstrate that RAPTURE performs substantially better than standard regression testing.
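
The idea of comparing expected and observed output through numeric string features can be illustrated with a small sketch. This is not the RAPTURE algorithm itself, whose details the abstract does not give. It is a minimal illustration that assumes each feature is roughly normally distributed, that features are independent, and that the per-feature scores can simply be multiplied. The function names (features, fit, similarity), the sample price strings, and the 0.05 threshold are all hypothetical.

```python
import math
import string
from statistics import mean, stdev


def features(s):
    """Two simple numeric features of one extracted string:
    its length and the fraction of punctuation characters."""
    n = len(s)
    punct = sum(ch in string.punctuation for ch in s)
    return {"length": float(n), "punct_fraction": punct / n if n else 0.0}


def fit(training_strings):
    """Estimate (mean, standard deviation) of each feature from output
    that is known to be correct. Needs at least two training strings."""
    rows = [features(s) for s in training_strings]
    model = {}
    for name in rows[0]:
        values = [r[name] for r in rows]
        # Floor the deviation so a constant feature does not divide by zero.
        model[name] = (mean(values), max(stdev(values), 1e-6))
    return model


def similarity(model, observed_strings):
    """Score in [0, 1]. For each feature, measure how many training
    standard deviations the observed mean lies from the training mean,
    convert that to a two-sided normal tail probability, and multiply
    the per-feature probabilities (independence is an assumption)."""
    rows = [features(s) for s in observed_strings]
    score = 1.0
    for name, (mu, sigma) in model.items():
        observed_mean = mean(r[name] for r in rows)
        z = abs(observed_mean - mu) / sigma
        score *= math.erfc(z / math.sqrt(2))
    return score


# Hypothetical data: a wrapper that extracts prices.
model = fit(["$12.99", "$8.50", "$104.00", "$7.25", "$19.95"])

# The content changed but the wrapper still works: new prices, same shape.
still_ok = similarity(model, ["$15.00", "$9.99", "$120.50"])

# The page layout changed: the wrapper now returns markup fragments.
broken = similarity(model, ['<td class="price">', "</td><td>In stock", "<b>Sale</b>"])

THRESHOLD = 0.05  # hypothetical cut-off; a real system would tune this
print("still_ok passes:", still_ok >= THRESHOLD)
print("broken passes:", broken >= THRESHOLD)
```

Exact string comparison, as in standard regression testing, would flag the first observed batch as a failure because the prices differ from the training data. A feature-based score tolerates changed content and still drops sharply when the extracted strings stop looking like prices.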