Efficient Web form entry on PDAs
Proceedings of the 10th international conference on World Wide Web
Efficient Web form entry on PDAs
Proceedings of the 10th international conference on World Wide Web
Machine Learning
Proceedings of the 27th International Conference on Very Large Data Bases
Data extraction and label assignment for web databases
WWW '03 Proceedings of the 12th international conference on World Wide Web
Similarity Flooding: A Versatile Graph Matching Algorithm and Its Application to Schema Matching
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Crawling for Domain-Speci.c Hidden Web Resources
WISE '03 Proceedings of the Fourth International Conference on Web Information Systems Engineering
An interactive clustering-based approach to integrating source query interfaces on the deep Web
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Understanding Web query interfaces: best-effort parsing with hidden syntax
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Discovering complex matchings across web query interfaces: a correlation mining approach
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Automatic integration of Web search interfaces with WISE-Integrator
The VLDB Journal — The International Journal on Very Large Data Bases
Structured databases on the web: observations and implications
ACM SIGMOD Record
Schema Matching Using Duplicates
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
MetaQuerier: querying structured web sources on-the-fly
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Merging Source Query Interfaces onWeb Databases
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Query Selection Techniques for Efficient Crawling of Structured Web Sources
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Meaningful labeling of integrated query interfaces
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Combining classifiers to identify online databases
Proceedings of the 16th international conference on World Wide Web
An adaptive crawler for locating hidden-Web entry points
Proceedings of the 16th international conference on World Wide Web
COMA: a system for flexible combination of schema matching approaches
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Instance-based schema matching for web databases by domain-specific query probing
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Data management projects at Google
ACM SIGMOD Record
Constructing interface schemas for search interfaces of web databases
WISE'05 Proceedings of the 6th international conference on Web Information Systems Engineering
Deriving Customized Integrated Web Query Interfaces
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Stop word and related problems in web interface integration
Proceedings of the VLDB Endowment
Understanding deep web search interfaces: a survey
ACM SIGMOD Record
Proceedings of the 1st ACM International Health Informatics Symposium
Deep web integration with VisQI
Proceedings of the VLDB Endowment
Encapsulating multi-stepped web forms as web services
ICSOC/ServiceWave'09 Proceedings of the 2009 international conference on Service-oriented computing
Attribute domain discovery for hidden web databases
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Layout object model for extracting the schema of web query interfaces
APWeb'11 Proceedings of the 13th Asia-Pacific web conference on Web technologies and applications
How the minotaur turned into ariadne: ontologies in web data extraction
ICWE'11 Proceedings of the 11th international conference on Web engineering
Web Query Interface Parsing for Building Web-Based Metasearch Systems
WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03
Unsupervised transactional query classification based on webpage form understanding
Proceedings of the 20th ACM international conference on Information and knowledge management
Automatically mapping and integrating multiple data entry forms into a database
ER'11 Proceedings of the 30th international conference on Conceptual modeling
Automatic identification of web query interfaces
MICAI'11 Proceedings of the 10th international conference on Artificial Intelligence: advances in Soft Computing - Volume Part II
OPAL: automated form understanding for the deep web
Proceedings of the 21st international conference on World Wide Web
DIADEM: domain-centric, intelligent, automated data extraction methodology
Proceedings of the 21st international conference companion on World Wide Web
OPAL: a passe-partout for web forms
Proceedings of the 21st international conference companion on World Wide Web
Optimal algorithms for crawling a hidden database in the web
Proceedings of the VLDB Endowment
Learning to discover complex mappings from web forms to ontologies
Proceedings of the 21st ACM international conference on Information and knowledge management
Automatic discovery of Web Query Interfaces using machine learning techniques
Journal of Intelligent Information Systems
Understanding query interfaces by statistical parsing
ACM Transactions on the Web (TWEB)
Web object identification for web automation and meta-search
Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics
The ontological key: automatically understanding and integrating forms to access the deep Web
The VLDB Journal — The International Journal on Very Large Data Bases
Hi-index | 0.00 |
Much data in the Web is hidden behind Web query interfaces. In most cases the only means to "surface" the content of a Web database is by formulating complex queries on such interfaces. Applications such as Deep Web crawling and Web database integration require an automatic usage of these interfaces. Therefore, an important problem to be addressed is the automatic extraction of query interfaces into an appropriate model. We hypothesize the existence of a set of domain-independent "commonsense design rules" that guides the creation of Web query interfaces. These rules transform query interfaces into schema trees. In this paper we describe a Web query interface extraction algorithm, which combines HTML tokens and the geometric layout of these tokens within a Web page. Tokens are classified into several classes out of which the most significant ones are text tokens and field tokens. A tree structure is derived for text tokens using their geometric layout. Another tree structure is derived for the field tokens. The hierarchical representation of a query interface is obtained by iteratively merging these two trees. Thus, we convert the extraction problem into an integration problem. Our experiments show the promise of our algorithm: it outperforms the previous approaches on extracting query interfaces on about 6.5% in accuracy as evaluated over three corpora with more than 500 Deep Web interfaces from 15 different domains.