ADBIS-DASFAA '00 Proceedings of the East-European Conference on Advances in Databases and Information Systems Held Jointly with International Conference on Database Systems for Advanced Applications: Current Issues in Databases and Information Systems
Towards logical hypertext structure
IICS'04 Proceedings of the 4th international conference on Innovative Internet Community Systems
We propose a method of identifying logical documents in Web data. Web pages are sometimes designed for presentation and do not always reflect logical structure, whereas a logical document is a data unit that represents logical structure. One logical document often corresponds to a connected subgraph consisting of multiple pages. Therefore, for Web data processing tasks that should capture logical structure, such as querying facilities, extended support for user navigation, and Web structure analysis, logical documents are more appropriate data units than pages. We develop a method of identifying such logical documents in existing Web data. Our method uses three kinds of information: link structure, directory structure embedded in URIs, and page contents.
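To make the idea of a logical document as a connected subgraph of pages more concrete, here is a minimal sketch in Python. It is not the method proposed in the paper, whose details are not given in this abstract. It illustrates one simple heuristic, chosen for this example, that combines two of the three kinds of information: pages are grouped when they are connected by links whose endpoints share the same host and directory in their URIs. Page contents are ignored here. The function names `directory_of` and `candidate_documents` and the example URIs are hypothetical.

```python
from collections import defaultdict
from posixpath import dirname
from urllib.parse import urlsplit


def directory_of(uri):
    """Return (host, directory path) of a URI, e.g. ('a.org', '/docs/guide')."""
    parts = urlsplit(uri)
    return parts.netloc, dirname(parts.path)


def candidate_documents(links):
    """Group pages into connected components, following only links whose
    endpoints share a host and directory (an illustrative heuristic, not
    the authors' algorithm). `links` is an iterable of (source, target) URIs."""
    parent = {}

    def find(page):
        # Union-find lookup with path halving; unseen pages start as singletons.
        parent.setdefault(page, page)
        while parent[page] != page:
            parent[page] = parent[parent[page]]
            page = parent[page]
        return page

    for src, dst in links:
        root_src, root_dst = find(src), find(dst)  # registers both pages
        if directory_of(src) == directory_of(dst):
            parent[root_src] = root_dst  # merge the two components

    groups = defaultdict(set)
    for page in list(parent):
        groups[find(page)].add(page)
    return sorted(sorted(group) for group in groups.values())


example_links = [
    ("http://a.org/docs/guide/index.html", "http://a.org/docs/guide/ch1.html"),
    ("http://a.org/docs/guide/ch1.html", "http://a.org/docs/guide/ch2.html"),
    ("http://a.org/docs/guide/index.html", "http://a.org/news/today.html"),
]
documents = candidate_documents(example_links)
```

With these example links, the three pages under `/docs/guide` form one candidate logical document, while the page under `/news` stays on its own because the only link to it crosses a directory boundary. A fuller treatment would also have to weigh page contents, which this sketch does not attempt.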