This paper presents a novel approach to extracting the hierarchical schema of web interfaces using visual features. We propose the Layout Object Model (LOM), which extracts the schema of an interface from the geometric layout of its elements. In the LOM, each field or label of the interface is a layout object, represented by the rectangle it occupies in the web browser. The schema of the interface can be expressed by organizing these rectangles into a tree structure, called the LOM tree, so we extract the schema by constructing the LOM tree. Construction starts by generating a basic layout tree from the DOM tree. We then match labels to fields, or to groups of fields, using their layout relations and feature rules, and construct the LOM tree by adjusting the basic layout tree. Finally, we transform the LOM tree of a web interface into a schema tree to extract the schema. The experimental results show that our approach matches labels and fields accurately, which is very useful for deep web applications.
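To make the pipeline concrete, the following is a minimal illustrative sketch and not the paper's implementation. It assumes each layout object is already available as a rectangle (in practice these would come from a rendered page), nests rectangles by containment to stand in for the basic layout tree, and matches each field to the nearest label that lies to its left on the same row or directly above it. The class `LayoutObject`, the functions `build_tree`, `match_labels`, and `to_schema`, and the nearest-label heuristic are all hypothetical stand-ins for the layout relations and feature rules mentioned above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LayoutObject:
    """A label, field, or group, described by its on-screen rectangle."""
    kind: str   # "label", "field", or "group" (hypothetical tags)
    name: str
    x: float
    y: float
    w: float
    h: float
    children: List["LayoutObject"] = field(default_factory=list)
    label: Optional[str] = None

    def contains(self, other: "LayoutObject") -> bool:
        return (self.x <= other.x and self.y <= other.y
                and self.x + self.w >= other.x + other.w
                and self.y + self.h >= other.y + other.h)


def build_tree(objects: List[LayoutObject]) -> LayoutObject:
    """Nest each rectangle under the smallest group that contains it."""
    root = LayoutObject("group", "root", 0, 0, 0, 0)
    for obj in objects:
        parents = [p for p in objects
                   if p is not obj and p.kind == "group" and p.contains(obj)]
        parent = min(parents, key=lambda p: p.w * p.h, default=root)
        parent.children.append(obj)
    return root


def match_labels(node: LayoutObject) -> None:
    """Assumed heuristic: nearest label left of (same row) or above a field."""
    labels = [c for c in node.children if c.kind == "label"]
    for f in (c for c in node.children if c.kind == "field"):
        best, best_dist = None, float("inf")
        for lab in labels:
            same_row = abs(lab.y - f.y) < f.h
            overlaps_x = lab.x < f.x + f.w and f.x < lab.x + lab.w
            if same_row and lab.x + lab.w <= f.x:
                dist = f.x - (lab.x + lab.w)      # gap to the left
            elif overlaps_x and lab.y + lab.h <= f.y:
                dist = f.y - (lab.y + lab.h)      # gap above
            else:
                continue
            if dist < best_dist:
                best, best_dist = lab, dist
        if best is not None:
            f.label = best.name
    for child in node.children:
        if child.kind == "group":
            match_labels(child)


def to_schema(node: LayoutObject) -> dict:
    """Turn the adjusted tree into a nested schema, dropping label nodes."""
    if node.kind == "field":
        return {"field": node.name, "label": node.label}
    return {"group": node.name,
            "children": [to_schema(c) for c in node.children
                         if c.kind != "label"]}


# Made-up rectangles for a two-field form inside one visual group.
objects = [
    LayoutObject("group", "name_box", 0, 0, 300, 60),
    LayoutObject("label", "First name", 10, 10, 80, 20),
    LayoutObject("field", "fname", 100, 10, 150, 20),
    LayoutObject("label", "Last name", 10, 35, 80, 20),
    LayoutObject("field", "lname", 100, 35, 150, 20),
]
tree = build_tree(objects)
match_labels(tree)
schema = to_schema(tree)
```

In this toy form, `schema` nests both fields under `name_box`, with `fname` labeled "First name" and `lname` labeled "Last name". The horizontal-overlap check on the "above" case is a design choice of this sketch: without it, a label on the previous row could be picked over the label sitting beside the field.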