Querying and Clustering Web Pages about Persons and Organizations

  • Authors:
  • Shiren Ye; Tat-seng Chua; Jeremy R. Kei

  • Venue:
  • WI '03 Proceedings of the 2003 IEEE/WIC International Conference on Web Intelligence
  • Year:
  • 2003

Abstract

One of the most frequent Web surfing tasks is to search for names of persons and organizations. Such names are often not distinctive, commonly occurring, and nonunique. Thus, a single name may be mapped to several entities. The paper describes a methodology to cluster the Web pages returned by the search engine so that pages belonging to different entities are clustered into different groups. The algorithm uses a combination of named entities, link-based and structure-based information as features to partition the document set into direct and indirect pages using a decision model. It then uses the distinct direct pages as seeds to cluster the document set into different clusters. The algorithm has been found to be effective for Web-based applications.
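
The two-stage idea in the abstract (separate direct from indirect pages, then grow clusters around distinct direct pages) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Page` type, the `is_direct` flag (standing in for the output of the decision model), the use of Jaccard overlap on named-entity sets, the `distinct_below` threshold, and the sample URLs are all assumptions made for illustration only; the link-based and structure-based features are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    entities: frozenset  # named entities found on the page (hypothetical feature)
    is_direct: bool      # stand-in for the decision model's direct/indirect label


def jaccard(a, b):
    """Jaccard overlap of two entity sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_seeds(pages, distinct_below=0.3):
    """Keep a direct page as a seed only if it differs from every earlier seed."""
    seeds = []
    for page in pages:
        if page.is_direct and all(
            jaccard(page.entities, s.entities) < distinct_below for s in seeds
        ):
            seeds.append(page)
    return seeds


def cluster(pages, seeds):
    """Assign each page to the seed whose entity set it overlaps most."""
    if not seeds:
        return {}
    clusters = {seed.url: [] for seed in seeds}
    for page in pages:
        best = max(seeds, key=lambda s: jaccard(page.entities, s.entities))
        clusters[best.url].append(page.url)
    return clusters


# Hypothetical search results for the ambiguous name "John Smith".
demo_pages = [
    Page("a.example/home", frozenset({"John Smith", "MIT", "robotics"}), True),
    Page("b.example/home", frozenset({"John Smith", "Acme Corp", "sales"}), True),
    Page("c.example/news", frozenset({"John Smith", "MIT", "award"}), False),
    Page("d.example/blog", frozenset({"John Smith", "Acme Corp"}), False),
]
demo_seeds = pick_seeds(demo_pages)
demo_clusters = cluster(demo_pages, demo_seeds)
```

In this toy data the two direct pages share only the queried name, so both become seeds, and each indirect page joins the seed with which it shares the most named entities. A real system would presumably replace the hard-coded `is_direct` flag with a trained classifier over the combined features.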