Understanding query interfaces by statistical parsing

Authors:
Weifeng Su;Hejun Wu;Yafei Li;Jing Zhao;Frederick H. Lochovsky;Hongmin Cai;Tianqiang Huang
Affiliations:
BNU-HKBU United International College and Shenzhen Key Laboratory of Intelligent Media and Speech, PKU-HKUST Shenzhen Hong Kong Institution;Sun Yat-Sen University;BNU-HKBU United International College;BNU-HKBU United International College;The Hong Kong University of Science and Technology;South China University of Technology;Fujian Normal University
Venue:
ACM Transactions on the Web (TWEB)
Year:
2013

Citing 32
Cited 1

Efficient Web form entry on PDAs

Proceedings of the 10th international conference on World Wide Web
Text Categorization Based on Regularized Linear Classification Methods

Information Retrieval
Crawling the Hidden Web

Proceedings of the 27th International Conference on Very Large Data Bases
KBFS: K-Best-First Search

Annals of Mathematics and Artificial Intelligence
A maximum entropy approach to named entity recognition

A maximum entropy approach to named entity recognition
A maximum-entropy-inspired parser

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
An interactive clustering-based approach to integrating source query interfaces on the deep Web

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Understanding Web query interfaces: best-effort parsing with hidden syntax

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Structured databases on the web: observations and implications

ACM SIGMOD Record
DEQUE: querying the deep web

Data & Knowledge Engineering
MetaQuerier: querying structured web sources on-the-fly

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Merging Source Query Interfaces on Web Databases

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Query Selection Techniques for Efficient Crawling of Structured Web Sources

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Combining classifiers to identify online databases

Proceedings of the 16th international conference on World Wide Web
An adaptive crawler for locating hidden-Web entry points

Proceedings of the 16th international conference on World Wide Web
Towards Deeper Understanding of the Search Interfaces of the Deep Web

World Wide Web
Extracting Personalised Ontology from Data-Intensive Web Application: an HTML Forms-Based Reverse Engineering Approach

Informatica
Learning to extract form labels

Proceedings of the VLDB Endowment
Siphon++: a hidden-web crawler for keyword-based interfaces

Proceedings of the 17th ACM conference on Information and knowledge management
ODE: Ontology-assisted data extraction

ACM Transactions on Database Systems (TODS)
An empirical study on using hidden markov model for search interface segmentation

Proceedings of the 18th ACM conference on Information and knowledge management
A hierarchical approach to model web query interfaces for web source integration

Proceedings of the VLDB Endowment
Understanding deep web search interfaces: a survey

ACM SIGMOD Record
Real understanding of real estate forms

Proceedings of the International Conference on Web Intelligence, Mining and Semantics
Automatic hierarchical classification of structured deep web databases

WISE'06 Proceedings of the 7th international conference on Web Information Systems
Constructing interface schemas for search interfaces of web databases

WISE'05 Proceedings of the 6th international conference on Web Information Systems Engineering
Holistic schema matching for web query interfaces

EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
OPAL: automated form understanding for the deep web

Proceedings of the 21st international conference on World Wide Web
OPAL: a passe-partout for web forms

Proceedings of the 21st international conference companion on World Wide Web
Optimal algorithms for crawling a hidden database in the web

Proceedings of the VLDB Endowment
Deep Web Query Interface Understanding and Integration

Deep Web Query Interface Understanding and Integration

The ontological key: automatically understanding and integrating forms to access the deep Web

The VLDB Journal — The International Journal on Very Large Data Bases

Abstract

Users submit queries to an online database via its query interface. Query interface parsing, which is important for many applications, understands the query capabilities of a query interface. Since most query interfaces are organized hierarchically, we present a novel query interface parsing method, StatParser (Statistical Parser), to automatically extract the hierarchical query capabilities of query interfaces. StatParser automatically learns from a set of parsed query interfaces and parses new query interfaces. StatParser starts from a small grammar and enhances the grammar with a set of probabilities learned from parsed query interfaces under the maximum-entropy principle. Given a new query interface, the probability-enhanced grammar identifies the parse tree with the largest global probability to be the query capabilities of the query interface. Experimental results show that StatParser very accurately extracts the query capabilities and can effectively overcome the problems of existing query interface parsers.
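
The idea of selecting the parse tree with the largest global probability can be illustrated with a minimal sketch. The version below runs Viterbi-style CKY parsing over a toy probabilistic grammar. Everything in it is an assumption made for illustration: the grammar symbols (Form, Attr, Range, Label, Field), the token names, and the hand-set rule probabilities are hypothetical, and the probabilities are fixed by hand rather than learned under the maximum-entropy principle as the abstract describes. It is not StatParser's actual grammar or algorithm.

```python
from math import log

# Hypothetical toy grammar in Chomsky normal form. The symbols and the
# probabilities are illustrative assumptions, not taken from the paper.
BINARY = {  # parent -> list of (left child, right child, probability)
    "Form": [("Attr", "Form", 0.6), ("Attr", "Attr", 0.4)],
    "Attr": [("Label", "Field", 0.7), ("Label", "Range", 0.3)],
    "Range": [("Field", "Field", 1.0)],
}
LEXICAL = {  # interface token -> list of (nonterminal, probability)
    "text": [("Label", 1.0)],
    "textbox": [("Field", 0.6)],
    "select": [("Field", 0.4)],
}


def best_parse(tokens, root="Form"):
    """Return (log probability, tree) of the most probable parse, or None."""
    n = len(tokens)
    if n == 0:
        return None
    # chart[(i, j)][symbol] = (log probability, tree) of the best
    # subtree rooted at `symbol` that covers tokens[i:j].
    chart = {}
    for i, tok in enumerate(tokens):
        chart[(i, i + 1)] = {
            sym: (log(p), (sym, tok)) for sym, p in LEXICAL.get(tok, [])
        }
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = {}
            for k in range(i + 1, j):  # split point between the two children
                left_cell, right_cell = chart[(i, k)], chart[(k, j)]
                for parent, rules in BINARY.items():
                    for left, right, p in rules:
                        if left in left_cell and right in right_cell:
                            score = (log(p) + left_cell[left][0]
                                     + right_cell[right][0])
                            # Keep only the highest-scoring subtree per symbol.
                            if parent not in cell or score > cell[parent][0]:
                                cell[parent] = (
                                    score,
                                    (parent, left_cell[left][1],
                                     right_cell[right][1]),
                                )
            chart[(i, j)] = cell
    return chart[(0, n)].get(root)


if __name__ == "__main__":
    # A label with one text box, then a label with a pair of select lists.
    tokens = ["text", "textbox", "text", "select", "select"]
    result = best_parse(tokens)
    if result is not None:
        log_prob, tree = result
        print(tree)
```

In this sketch the score of a tree is the sum of the log probabilities of the rules it uses, so maximizing the score per chart cell yields the globally most probable tree for the whole token sequence. A learned model would replace the fixed numbers in BINARY and LEXICAL with estimated probabilities.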