Structuring the Web

Authors:
V. Carchiolo; A. Longheu; M. Malgeri
Venue:
DEXA '00 Proceedings of the 11th International Workshop on Database and Expert Systems Applications
Year:
2000

Abstract

The WWW is a very large and rich information source, but it lacks structure, so locating data of interest can be difficult. In particular, a page may be divided into distinct logical sections of information, and highlighting them may improve both browsing and searching. We propose a simple Web page structuring approach that introduces the "semantic block" as a finer-grained level for categorizing information inside a page. We also propose a set of XML tags to be added to the existing HTML tags in order to locate such blocks, so that structured pages can be used with both current browsers and future structure-aware ones, supporting a gradual migration towards a more structured Web. We apply our technique to several Web sites in order to determine which semantic blocks are needed, using two simple Java-based tools we developed to add the XML tags and manage the resulting structure. Finally, we consider how the schema can be represented to support better browsing.
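
The idea of marking semantic blocks with XML tags placed alongside existing HTML can be illustrated with a minimal Java sketch. It is not the authors' tooling: the abstract does not list the actual tag set, so the tag name `semblock`, its `type` attribute, the block types "navigation" and "content", and the class and method names are all hypothetical. The sketch also assumes blocks are not nested and that type names contain no characters needing XML escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: every name here is an assumption, not taken from the paper.
class SemanticBlockSketch {
    // Hypothetical XML tag used to delimit a semantic block inside an HTML page.
    static final String TAG = "semblock";

    // Wrap an HTML fragment in the hypothetical block tag, labelled with a type.
    public static String wrap(String type, String htmlFragment) {
        return "<" + TAG + " type=\"" + type + "\">" + htmlFragment + "</" + TAG + ">";
    }

    // Return the inner HTML of every (non-nested) block of the given type.
    public static List<String> extract(String page, String type) {
        Pattern p = Pattern.compile(
            "<" + TAG + " type=\"" + Pattern.quote(type) + "\">(.*?)</" + TAG + ">",
            Pattern.DOTALL);
        Matcher m = p.matcher(page);
        List<String> blocks = new ArrayList<>();
        while (m.find()) {
            blocks.add(m.group(1));
        }
        return blocks;
    }

    public static void main(String[] args) {
        // The HTML inside each block is left untouched by wrap().
        String page = "<html><body>"
            + wrap("navigation", "<ul><li><a href=\"/\">Home</a></li></ul>")
            + wrap("content", "<p>Main article text.</p>")
            + "</body></html>";
        System.out.println(extract(page, "content"));
    }
}
```

Because the original HTML is left intact inside each block, a tool that knows the extra tags can select blocks by type, while the page content itself is unchanged. This is one way to picture the gradual migration the abstract describes.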