Topic Distillation on Hierarchically Categorized Web Documents

  • Authors: Vadim Katz; W-S. Li

  • Venue: KDEX '99 Proceedings of the 1999 Workshop on Knowledge and Data Engineering Exchange
  • Year: 1999

Abstract

As an alternative to search capability, many search engines provide directory servers containing categorized Web documents for users to navigate and browse. In this paper, we investigate three issues in portal site construction, given a large collection of categorized Web documents: (1) distillation of important topics for each category of documents; (2) distillation of important documents/sites for these topics; and (3) automation of these two tasks. We have developed an automated technique for topic and Web site distillation. Our technique integrates Web document content analysis and link structure analysis. It considers the local importance of keywords and their global distribution statistics on a given Web document category hierarchy.
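
The abstract does not give the scoring formula, so the following is only a minimal illustrative sketch of how local keyword importance might be combined with global distribution statistics. It assumes a TF-IDF-style weighting as a stand-in, treats the category hierarchy as a flat set of categories, and leaves out link structure analysis entirely. The function name `distill_topics`, its parameters, and the sample data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def distill_topics(categories, top_k=3):
    """Rank keywords per category (illustrative only, not the paper's method).

    categories: dict mapping a category name to a list of documents,
    where each document is a list of keyword tokens.
    Returns a dict mapping each category name to its top_k keywords.
    """
    # Local importance: relative frequency of each keyword inside a category.
    local = {}
    for name, docs in categories.items():
        counts = Counter(tok for doc in docs for tok in doc)
        total = sum(counts.values()) or 1
        local[name] = {tok: c / total for tok, c in counts.items()}

    # Global distribution: in how many categories does each keyword occur?
    spread = Counter(tok for freqs in local.values() for tok in freqs)
    n = len(categories)

    topics = {}
    for name, freqs in local.items():
        # Assumed weighting: keywords found in every category score zero.
        scores = {tok: f * math.log(n / spread[tok]) for tok, f in freqs.items()}
        # Sort by descending score, then alphabetically so ties are stable.
        ranked = sorted(scores, key=lambda tok: (-scores[tok], tok))
        topics[name] = ranked[:top_k]
    return topics


if __name__ == "__main__":
    sample = {
        "sports": [["football", "score", "news"], ["football", "league", "news"]],
        "finance": [["stock", "market", "news"], ["stock", "price", "news"]],
    }
    print(distill_topics(sample, top_k=1))
```

In this toy data, "news" appears in both categories and therefore receives a zero weight, while keywords concentrated in a single category rise to the top. A fuller treatment would propagate statistics along parent and child categories and add a link-based score for ranking documents and sites.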