Focused Crawls, Tunneling, and Digital Libraries
ECDL '02 Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries
Agents, Crawlers, and Web Retrieval
CIA '02 Proceedings of the 6th International Workshop on Cooperative Information Agents VI
Hi-index | 0.01 |
As an alternative to search capability, many search engines are providing directory servers containing categorized Web documents for users to navigate and browse through. In this paper, we are investigating three issues in portal site construction given a large collection of categorized Web documents: (1) distillation of important topics for each category of documents; (2) distillation of important documents/sites for these topics; and (3) automation of the such two tasks. We have developed an automated technique for topics and Web site distillation. Our technique integrates Web document content analysis and link structure analysis. It considers local importance of keywords and their global distribution statistics on a given Web document category hierarchy.