Many Internet information-management applications (e.g., information integration systems) require a library of wrappers: specialized information extraction procedures that translate a source's native format into a structured representation suitable for further application-specific processing. Maintaining wrappers is tedious and error-prone, because the formatting regularities on which wrappers rely change frequently on the decentralized and dynamic Internet. The wrapper verification problem is to determine whether a wrapper is operating correctly. Standard regression testing approaches are inappropriate, because both the formatting regularities on which wrappers rely and the source's underlying content may change. We introduce RAPTURE, a fully implemented, domain-independent wrapper verification algorithm. RAPTURE computes a probabilistic similarity measure between a wrapper's expected and observed output, where similarity is defined in terms of simple numeric features (e.g., the length, or the fraction of punctuation characters) of the extracted strings. Experiments with numerous actual Internet sources demonstrate that RAPTURE performs substantially better than standard regression testing.
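
The idea of comparing expected and observed output through simple numeric features can be illustrated with a minimal sketch. This is not the RAPTURE algorithm itself, whose details the abstract does not give. It assumes each feature is roughly normally distributed, treats features as independent, and scores a batch of extracted strings by the two-sided tail probability of its mean feature values. The function names (`features`, `fit`, `similarity`) and the `min_std` floor are hypothetical choices made for illustration.

```python
import math
import string
from statistics import mean, stdev


def features(s):
    """Two numeric features named in the abstract: length and punctuation fraction."""
    n = len(s)
    punct = sum(ch in string.punctuation for ch in s)
    return {"length": float(n), "punct_fraction": punct / n if n else 0.0}


def fit(training_strings, min_std=0.01):
    """Estimate (mean, std) per feature from output known to be correct.

    Assumption: min_std is an arbitrary floor that avoids division by zero
    when a feature is constant in the training data.
    """
    rows = [features(s) for s in training_strings]
    model = {}
    for name in rows[0]:
        values = [r[name] for r in rows]
        model[name] = (mean(values), max(stdev(values), min_std))
    return model


def similarity(model, observed_strings):
    """Score in [0, 1]; values near 0 suggest the wrapper output has changed.

    Assumption: features are independent and normal, so the score is the
    product of two-sided normal tail probabilities of the observed means.
    """
    rows = [features(s) for s in observed_strings]
    score = 1.0
    for name, (mu, sigma) in model.items():
        observed = mean(r[name] for r in rows)
        z = abs(observed - mu) / sigma
        score *= math.erfc(z / math.sqrt(2))
    return score


# Known-correct extractions of a hypothetical "name" field.
model = fit(["Alice Smith", "Bob Jones", "Carol White", "Dan Brown"])

# New content with the same formatting: the score should stay high.
good = similarity(model, ["Erin Black", "Frank Green"])

# The site layout changed and the wrapper now captures HTML tags.
broken = similarity(model, ["<td><b>Erin Black</b></td>", "<td><b>Frank Green</b></td>"])
```

Note that the content of the "good" batch differs from the training data, so an exact-match regression test would flag it as a failure, whereas the feature-based score stays high. A deployment would also need a decision threshold on the score, which this sketch leaves open.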