Meta-search, which lets users access and search many information sources with a single query submission, is one of the most important mechanisms for searching the deep Web. A fundamental problem in building meta-search systems is extracting the semantic model of each query interface so that the system can automatically form and submit queries to each online source. In this paper, we develop a rule-based approach for parsing query interfaces. We classify query conditions into five categories of semantic structures and develop parsing rules for each category. Our parsing rules use both structural and visual information. To reduce ambiguity, our approach adopts three parsing passes: null-path cluster parsing, inner-cluster parsing, and inter-cluster parsing. Experiments show that our approach works well, achieving 86% precision and 92% recall on the IW Random dataset, and 88.9% precision and 88.8% recall on the ICQ dataset.
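For readers unfamiliar with the reported metrics, the following is a minimal sketch of how precision and recall are conventionally computed for extracted query conditions. It uses the standard set-based definitions; the abstract does not state how the paper matches extracted conditions against the ground truth, so exact matching is an assumption here. The function name `precision_recall` and the sample conditions are hypothetical and are not taken from the paper or its datasets.

```python
def precision_recall(extracted, gold):
    """Set-based precision and recall under exact matching (an assumption)."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)  # conditions that were extracted correctly
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical query conditions, each modeled as a (label, field names) pair.
gold = {
    ("Title", "title"),
    ("Author", "author"),
    ("Price range", "price_min,price_max"),
    ("Format", "format"),
    ("Publisher", "publisher"),
}
extracted = {
    ("Title", "title"),
    ("Author", "author"),
    ("Format", "format"),
    ("Search", "submit"),  # spurious: a button mistaken for a condition
}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case three of the four extracted conditions are correct and three of the five true conditions are found, so precision is higher than recall.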