Forms are our gates to the Web. They enable us to access the deep content of Web sites. Automatic form understanding provides applications, ranging from crawlers and meta-search engines to service integrators, with a key to this content. Yet it has received little attention other than as a component in specific applications such as crawlers or meta-search engines. No comprehensive approach to form understanding exists, let alone one that produces rich models for semantic services or integration with linked open data. In this paper, we present OPAL, the first comprehensive approach to form understanding and integration. We identify form labeling and form interpretation as the two main tasks involved in form understanding. On both problems, OPAL advances the state of the art. For form labeling, it combines features from the text, structure, and visual rendering of a Web page. In extensive experiments on the ICQ and TEL-8 benchmarks and a set of 200 modern Web forms, OPAL outperforms previous approaches to form labeling by a significant margin. For form interpretation, OPAL uses a schema (or ontology) of forms in a given domain. Thanks to this domain schema, it is able to produce nearly perfect (>97% accuracy in the evaluation domains) form interpretations. Yet the effort to produce a domain schema is very low, as we provide a datalog-based template language that eases the specification of such schemata and a methodology for deriving a domain schema largely automatically from an existing domain ontology. We demonstrate the value of OPAL's form interpretations through a light-weight form integration system that successfully translates and distributes master queries to hundreds of forms with no error, yet is implemented with only a handful of translation rules.
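The idea of translating one master query to many forms can be illustrated with a minimal sketch. It is not OPAL's implementation: the form names, field names, concept names (`location`, `min_price`, `max_price`), and the functions `translate` and `distribute` are all hypothetical, and the sketch simply assumes that form interpretation has already assigned a domain-schema concept to every field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormField:
    name: str     # field name on the concrete form (hypothetical)
    concept: str  # domain-schema concept assumed to come from form interpretation


# Hypothetical interpretations of two forms in the same domain.
FORMS = {
    "site_a": [
        FormField("loc", "location"),
        FormField("pmin", "min_price"),
        FormField("pmax", "max_price"),
    ],
    "site_b": [
        FormField("town", "location"),
        FormField("price_to", "max_price"),
    ],
}


def translate(master_query, fields):
    """Rewrite a master query keyed by concepts into one form's field names.

    Returns the filled fields plus the concepts this form cannot express.
    """
    by_concept = {f.concept: f.name for f in fields}
    filled = {by_concept[c]: v for c, v in master_query.items() if c in by_concept}
    unsupported = sorted(c for c in master_query if c not in by_concept)
    return filled, unsupported


def distribute(master_query, forms):
    """Apply the same translation to every known form."""
    return {site: translate(master_query, fields) for site, fields in forms.items()}


if __name__ == "__main__":
    master = {"location": "Oxford", "min_price": 100000, "max_price": 250000}
    for site, (filled, unsupported) in distribute(master, FORMS).items():
        print(site, filled, unsupported)
```

Because the query is phrased over schema concepts rather than per-site field names, a single generic rule covers every form in this sketch; concepts a form does not support are reported instead of being silently dropped.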