Finding an apartment is a lengthy and tedious process. Even after deciding, one can never be sure of not having missed a better offer that was just one click away. Form understanding is key to automatically accessing and processing all the relevant, and nowadays readily available, data. We introduce OPAL (ontology-based web pattern analysis with logic), a novel, purely logical approach to web form understanding: OPAL labels, structures, and groups form fields according to a domain-specific ontology linked through phenomenological rules to a logical representation of a DOM. The phenomenological rules describe how ontological concepts appear on the web; the ontology formalizes and structures common patterns of web pages observed in a domain. A unique feature of OPAL is that all domain-independent assumptions about web forms are represented in rules, whereas domain-specific assumptions are represented in the ontology. This yields a coherent logical framework that is robust in the face of changing web trends. We apply OPAL to a significant, randomly selected sample of UK real estate sites, showing that straightforward rules suffice to achieve high-precision form understanding.
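The separation between domain-independent rules and a domain-specific ontology can be illustrated with a toy sketch. It is not the system's actual rule language or ontology: the `Field` record, the `ONTOLOGY` table, and the functions `label_field` and `group_fields` are hypothetical names invented for illustration, and plain substring matching stands in for the logical rules over the DOM.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical, flattened stand-in for a form field in a DOM."""
    field_id: str
    label_text: str  # visible label text associated with the field


# Domain-specific part (assumed toy ontology for real estate search forms):
# each concept maps to phrases by which it tends to appear in labels.
ONTOLOGY = {
    "price": ("price", "rent"),
    "bedrooms": ("bedroom", "beds"),
    "location": ("location", "postcode", "town"),
}


def label_field(field, ontology):
    """Domain-independent rule: a field denotes the first concept whose
    phrase occurs in its label text. Returns None if nothing matches."""
    text = field.label_text.lower()
    for concept, phrases in ontology.items():
        if any(phrase in text for phrase in phrases):
            return concept
    return None


def group_fields(fields, ontology):
    """Domain-independent rule: fields labeled with the same concept
    form one group (for example, the two ends of a price range)."""
    groups = defaultdict(list)
    for field in fields:
        concept = label_field(field, ontology)
        if concept is not None:
            groups[concept].append(field.field_id)
    return dict(groups)


form = [
    Field("f1", "Min price"),
    Field("f2", "Max price"),
    Field("f3", "Bedrooms"),
    Field("f4", "Town or postcode"),
    Field("f5", "Sort by"),
]
result = group_fields(form, ONTOLOGY)
print(result)
```

Under these assumptions, moving to another domain would mean replacing only `ONTOLOGY`, while `label_field` and `group_fields` stay unchanged. Substring matching is deliberately crude (a phrase such as "rent" would also match inside "current"), which is one reason a real system would need richer rules.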