Extracting Structures of HTML Documents

Authors:
Seung-Jin Lim; Yiu-Kai Ng
Venue:
ICOIN '98 Proceedings of the 13th International Conference on Information Networking
Year:
1998

Citing: 0
Cited by: 2

Recognition of Common Areas in a Web Page Using a Visualization Approach
AIMSA '02 Proceedings of the 10th International Conference on Artificial Intelligence: Methodology, Systems, and Applications

Document Visualization on Small Displays
MDM '03 Proceedings of the 4th International Conference on Mobile Data Management

Abstract

Information on the Web, a conglomeration of heterogeneous data such as text, images, and audio clips, is often accessed through documents written according to the HTML specification. Under that specification, HTML documents are semistructured in nature. We propose a high-level stack machine (HSM) which accesses an HTML document through its URL and constructs a semistructured data graph (SDG) of the document. The SDG of an HTML document H precisely captures the structure of the semistructured data embedded in H, based on the dependency relationships among the data objects in H. HSM is configurable to accommodate a user's interest with respect to the HTML elements in H to be considered during the construction of the SDG of H.
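
The general idea of building a nested structure from an HTML document with a stack, restricted to a user-chosen set of elements, can be illustrated with a minimal Python sketch. This is not the authors' HSM or their SDG definition, which the abstract does not specify. The names `Node`, `StackGraphBuilder`, `build_graph`, and `outline` are hypothetical, nesting stands in for the dependency relationship, and the sketch takes an HTML string rather than fetching a URL.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One vertex of the illustrative structure graph (hypothetical)."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []  # edges to nested (dependent) nodes
        self.text = []      # text fragments attached to this node


class StackGraphBuilder(HTMLParser):
    """Builds a nested structure using an explicit stack of open elements."""

    def __init__(self, elements_of_interest):
        super().__init__()
        self.interest = set(elements_of_interest)
        self.root = Node("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag not in self.interest:
            return  # elements outside the user's interest are skipped
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag not in self.interest:
            return
        # Pop back to the nearest matching open element, which also
        # discards any unclosed elements opened after it.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text is attached to the innermost open element of interest.
        if data.strip():
            self.stack[-1].text.append(data.strip())


def build_graph(html, elements_of_interest):
    """Parse an HTML string and return the root of the nested structure."""
    builder = StackGraphBuilder(elements_of_interest)
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the structure as indented lines, one per node."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines
```

Calling `build_graph(html, {"ul", "li"})` on a page keeps only its list structure, while a larger element set yields a finer-grained graph. Text inside skipped elements is attached to the nearest enclosing element that was kept.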