Grouping Web Pages about Persons and Organizations for Information Extraction

  • Authors:
  • Shiren Ye;Tat-Seng Chua;Jimin Liu;Jeremy R. Kei

  • Venue:
  • ICADL '02 Proceedings of the 5th International Conference on Asian Digital Libraries: Digital Libraries: People, Knowledge, and Technology
  • Year:
  • 2002

Abstract

Information extraction on the Web lets users retrieve specific information about a person or an organization. Because names are not unique, the same name may refer to multiple entities. This paper describes an algorithm that clusters the Web pages returned by search engines so that pages about different entities fall into different groups. The algorithm uses named entities as features to divide the document set into direct and indirect pages. It then uses distinct direct pages as cluster seeds and assigns the indirect pages to these clusters. The algorithm has been found to be effective for Web-based applications.
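
The seed-based grouping step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how direct and indirect pages are told apart or how page similarity is measured, so the sketch takes the direct/indirect split as given input and assumes Jaccard overlap between named-entity sets as the similarity. The names `jaccard`, `cluster_pages`, and `merge_threshold` are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two named-entity sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_pages(
    direct: Dict[str, Set[str]],
    indirect: Dict[str, Set[str]],
    merge_threshold: float = 0.5,  # assumed cut-off for "distinct" direct pages
) -> Dict[str, List[str]]:
    """Group pages around seeds chosen from distinct direct pages.

    Both arguments map a page id to the set of named entities found on
    that page. The direct/indirect split is assumed to be done already.
    """
    seeds: List[str] = []
    clusters: Dict[str, List[str]] = {}

    # Step 1: a direct page becomes a seed only if it is sufficiently
    # different from every seed chosen so far; otherwise it joins the
    # cluster of its most similar seed.
    for page_id, entities in direct.items():
        if all(jaccard(entities, direct[s]) < merge_threshold for s in seeds):
            seeds.append(page_id)
            clusters[page_id] = [page_id]
        else:
            best = max(seeds, key=lambda s: jaccard(entities, direct[s]))
            clusters[best].append(page_id)

    # Without any seed there is nothing to attach indirect pages to.
    if not seeds:
        return clusters

    # Step 2: attach each indirect page to the seed whose named
    # entities overlap most with its own.
    for page_id, entities in indirect.items():
        best = max(seeds, key=lambda s: jaccard(entities, direct[s]))
        clusters[best].append(page_id)

    return clusters


# Toy data: two different entities that share one name.
direct_pages = {
    "d1": {"Stanford", "NLP", "California"},
    "d2": {"Boston", "Jazz", "Berklee"},
    "d3": {"Stanford", "NLP", "California", "Palo Alto"},
}
indirect_pages = {
    "i1": {"NLP", "Stanford"},
    "i2": {"Jazz", "Berklee", "Album"},
}
result = cluster_pages(direct_pages, indirect_pages)
# result == {"d1": ["d1", "d3", "i1"], "d2": ["d2", "i2"]}
```

In this toy run, `d3` is too similar to `d1` to count as a distinct seed, so only `d1` and `d2` seed clusters. A real system would also need a rule for indirect pages that match no seed well, which the abstract does not describe.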