Layout object model for extracting the schema of web query interfaces

  • Authors:
  • Tiezheng Nie; Derong Shen; Ge Yu; Yue Kou

  • Affiliations:
  • College of Information Science and Engineering, Northeastern University, Shenyang, China (all four authors)

  • Venue:
  • APWeb'11: Proceedings of the 13th Asia-Pacific Web Conference on Web Technologies and Applications
  • Year:
  • 2011

Abstract

This paper presents a novel approach that exploits visual features to extract the hierarchical schema of web query interfaces. In our approach, we propose the Layout Object Model (LOM), which extracts the schema of an interface from the geometric layout of its elements. In the LOM, each field or label of an interface is a layout object, represented by the rectangle it occupies in the web browser. The schema of the interface can then be expressed by organizing these rectangles into a tree structure, called the LOM tree, so we extract the schema by constructing the LOM tree. Construction starts by generating a basic layout tree from the DOM tree. Next, we match labels to fields, or to groups of fields, using their layout relations and feature rules, and build the LOM tree by adjusting the basic layout tree accordingly. Finally, we transform the LOM tree of a web interface into a schema tree to extract its schema. The experimental results show that our approach matches labels and fields accurately, which is very useful for deep web applications.
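The pipeline described above (rectangles as layout objects, a tree built from their layout, label matching by layout relation, then conversion to a schema tree) can be illustrated with a minimal sketch. This is not the authors' algorithm: the paper derives its basic layout tree from the DOM tree and uses its own feature rules, whereas this sketch assumes the rectangles are already known, nests objects purely by rectangle containment, and uses one hypothetical rule (the closest label to the left on the same row, or directly above). All names (`Rect`, `LayoutObject`, `build_tree`, `match_labels`, `to_schema`) and the example coordinates are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float

    @property
    def right(self) -> float:
        return self.x + self.w

    @property
    def bottom(self) -> float:
        return self.y + self.h

    def contains(self, other: "Rect") -> bool:
        return (self.x <= other.x and self.y <= other.y
                and other.right <= self.right and other.bottom <= self.bottom)


@dataclass
class LayoutObject:
    kind: str   # hypothetical kinds: "label", "field" or "group"
    text: str   # label text, or the field's name for kind == "field"
    rect: Rect  # rectangle occupied in the rendered page
    children: List["LayoutObject"] = field(default_factory=list)
    label: Optional["LayoutObject"] = None


def build_tree(objects: List[LayoutObject]) -> LayoutObject:
    """Nest each object under the smallest enclosing group (containment only)."""
    root = LayoutObject("group", "root", Rect(0, 0, float("inf"), float("inf")))
    # Larger objects first, so a group is placed before anything it encloses.
    for obj in sorted(objects, key=lambda o: o.rect.w * o.rect.h, reverse=True):
        parent, descended = root, True
        while descended:
            descended = False
            for child in parent.children:
                if child.kind == "group" and child.rect.contains(obj.rect):
                    parent, descended = child, True
                    break
        parent.children.append(obj)
    return root


def gap_to(lab: Rect, target: Rect) -> Optional[float]:
    """Gap if `lab` is left of `target` on the same row or directly above it."""
    same_row = lab.y < target.bottom and target.y < lab.bottom
    same_col = lab.x < target.right and target.x < lab.right
    if same_row and lab.right <= target.x:
        return target.x - lab.right
    if same_col and lab.bottom <= target.y:
        return target.y - lab.bottom
    return None


def match_labels(group: LayoutObject) -> None:
    """Attach to every field or sub-group the nearest qualifying sibling label."""
    labels = [c for c in group.children if c.kind == "label"]
    for c in group.children:
        if c.kind in ("field", "group"):
            best = None
            for lab in labels:
                g = gap_to(lab.rect, c.rect)
                if g is not None and (best is None or g < best[0]):
                    best = (g, lab)
            if best is not None:
                c.label = best[1]
        if c.kind == "group":
            match_labels(c)


def to_schema(node: LayoutObject) -> list:
    """Emit (label, field name) pairs and (label, [...]) groups in reading order."""
    items = []
    for c in sorted(node.children, key=lambda o: (o.rect.y, o.rect.x)):
        name = c.label.text if c.label else c.text
        if c.kind == "field":
            items.append((name, c.text))
        elif c.kind == "group":
            items.append((name, to_schema(c)))
    return items


def obj(kind: str, text: str, x: float, y: float, w: float, h: float) -> LayoutObject:
    return LayoutObject(kind, text, Rect(x, y, w, h))


# Hypothetical flight-search form with a grouped "Passengers" section.
form = [
    obj("label", "From", 0, 0, 40, 20), obj("field", "from", 50, 0, 100, 20),
    obj("label", "To", 0, 30, 40, 20), obj("field", "to", 50, 30, 100, 20),
    obj("label", "Passengers", 0, 60, 100, 20), obj("group", "g1", 0, 85, 250, 30),
    obj("label", "Adults", 0, 90, 50, 20), obj("field", "adults", 55, 90, 40, 20),
    obj("label", "Children", 110, 90, 60, 20), obj("field", "children", 175, 90, 40, 20),
]
tree = build_tree(form)
match_labels(tree)
schema = to_schema(tree)
print(schema)
# → [('From', 'from'), ('To', 'to'), ('Passengers', [('Adults', 'adults'), ('Children', 'children')])]
```

The "Passengers" label is matched to the whole group rather than to a single field, which mirrors the idea of matching labels to groups of fields. A real implementation would need richer rules (labels to the right of checkboxes, multi-line labels, text features), which this sketch deliberately omits.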