A Conceptual Model and Rule-Based Query Language for HTML

  • Authors:
  • Mengchi Liu; Tok Wang Ling

  • Affiliations:
  • School of Computer Science, Carleton University, 1125 Colonel By Drive, Ottawa, ON, Canada K1S 5B6 (mengchi@scs.carleton.ca)
  • Department of Computer Science, School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 117543 (lingtw@comp.nus.edu.sg)

  • Venue:
  • World Wide Web
  • Year:
  • 2001

Abstract

Most documents available on the Web conform to the HTML specification and are hierarchically structured. Existing data models for the Web either fail to capture this hierarchical structure or represent it only at a very low level. How to represent and query HTML documents at a higher level is therefore an important issue. In this paper, we first propose a novel conceptual model for HTML. The model has only a few simple constructs, yet it can represent the complex hierarchical structure within HTML documents at a level close to how humans conceptualize and visualize them. We also describe how to convert HTML documents based on this conceptual model. Using the conceptual model and conversion method, one can capture the essence (i.e., the semistructure) of HTML documents in a natural and simple way. Based on this conceptual model, we then present a rule-based language for querying HTML documents over the Internet. The language provides a simple but powerful way to query both intra-document and inter-document structures and allows query results to be restructured. Being rule-based, it naturally supports negation and recursion and is therefore more expressive than SQL-based languages. A logical semantics is also provided.