We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify examples of semantic content by highlighting them in a web browser and describing their meaning. We then use the tree edit distance between the DOM subtrees of these examples to create a general pattern, or wrapper, for the content, and allow the user to bind RDF classes and predicates to the nodes of these wrappers. By overlaying matches to these patterns on standard documents inside the Haystack semantic web browser, we enable a rich semantic interaction with existing web pages, "unwrapping" semantic data buried in the pages' HTML. By allowing end-users to create, modify, and utilize their own patterns, we hope to speed adoption and use of the Semantic Web and its applications.
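The abstract names tree edit distance between DOM subtrees as the basis for wrapper creation but does not spell out the algorithm. The following is a minimal sketch of one simple variant, a top-down edit distance in which relabeling a node costs 1 and inserting or deleting a child costs the size of its subtree. It is not Thresher's actual implementation; the names `Node`, `subtree_size`, and `edit_distance` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A simplified DOM element: a tag name plus ordered children (assumed model)."""
    tag: str
    children: list["Node"] = field(default_factory=list)


def subtree_size(node: Node) -> int:
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(subtree_size(child) for child in node.children)


def edit_distance(a: Node, b: Node) -> int:
    """Top-down edit distance between two ordered trees.

    The roots are always mapped to each other (cost 1 if their tags differ),
    and the two child sequences are aligned with a standard dynamic program
    in which dropping or adding a child costs its whole subtree.
    """
    root_cost = 0 if a.tag == b.tag else 1
    m, n = len(a.children), len(b.children)

    # d[i][j] = cost of aligning the first i children of a with the first j of b.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + subtree_size(a.children[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + subtree_size(b.children[j - 1])

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + subtree_size(a.children[i - 1]),   # delete child of a
                d[i][j - 1] + subtree_size(b.children[j - 1]),   # insert child of b
                d[i - 1][j - 1]
                + edit_distance(a.children[i - 1], b.children[j - 1]),  # match children
            )
    return root_cost + d[m][n]


# Two highlighted examples whose subtrees differ by one extra <img> element.
example_1 = Node("li", [Node("a"), Node("span")])
example_2 = Node("li", [Node("a"), Node("span"), Node("img")])
distance = edit_distance(example_1, example_2)
```

Under these assumptions, `distance` is 1 for the two examples, since a single `img` insertion turns the first subtree into the second. The alignment that achieves the minimum cost identifies which nodes the examples share; a wrapper could plausibly keep those shared nodes and treat the rest as optional or wildcard positions, although the abstract does not describe that step in detail.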