Person name disambiguation in web pages using social network, compound words and latent topics

  • Authors:
  • Shingo Ono;Issei Sato;Minoru Yoshida;Hiroshi Nakagawa

  • Affiliations:
  • Graduate School of Information Science and Technology, The University of Tokyo;Graduate School of Information Science and Technology, The University of Tokyo;Information Technology Center, The University of Tokyo;Information Technology Center, The University of Tokyo

  • Venue:
  • PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
  • Year:
  • 2008

Abstract

The World Wide Web (WWW) provides much information about people, and in recent years Web search engines have become a common way to learn about them. However, many people share the same name, so the search results for a single person name typically include Web pages about several different people. We propose a novel framework for person name disambiguation that has the following three component processes: (1) extraction of social network information by finding co-occurrences of named entities; (2) measurement of document similarities based on occurrences of key compound words; and (3) inference of topic information from documents based on the Dirichlet process unigram mixture model. Experiments using an actual Web document dataset show that the results of our framework are promising.
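
To make the general idea concrete, the following is a minimal sketch of grouping the pages returned for one person name by shared features. It is not the authors' implementation. It assumes each page has already been reduced to a set of named entities and key compound words, and it does not attempt the Dirichlet process unigram mixture step. The overlap coefficient, the 0.5 threshold, the function names `overlap` and `cluster_pages`, and the toy data are all illustrative assumptions.

```python
from itertools import combinations


def overlap(a, b):
    """Overlap coefficient of two feature sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def cluster_pages(pages, threshold=0.5):
    """Single-link grouping: pages whose feature sets overlap at or above
    the threshold end up in the same cluster (union-find over page indices)."""
    parent = list(range(len(pages)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(pages)), 2):
        if overlap(pages[i], pages[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(pages)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical feature sets for four pages that mention the same name.
pages = [
    {"Example University", "Jane Roe", "text mining"},
    {"Example University", "text mining", "data mining conference"},
    {"Example City Hawks", "baseball", "pitcher"},
    {"baseball", "pitcher", "home run"},
]
clusters = cluster_pages(pages)
print(clusters)
```

Each resulting cluster is a list of page indices hypothesized to refer to the same person. A real system would need a named entity recognizer and a compound word extractor to produce the feature sets, which this sketch takes as given.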