Named entity recognition and identification for finding the owner of a home page

  • Authors:
  • Vassilis Plachouras; Matthieu Rivière; Michalis Vazirgiannis

  • Affiliations:
  • LIX, École Polytechnique, Palaiseau, France, PRESANS, X-TEC, École Polytechnique, Palaiseau, France; PRESANS, X-TEC, École Polytechnique, Palaiseau, France; LIX, École Polytechnique, Palaiseau, France, Dept of Informatics, AUEB, Athens, Greece

  • Venue:
  • PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
  • Year:
  • 2012

Abstract

Entity-based applications, such as expert search or online social networks in which users search for persons, require high-quality datasets of named entity references. Such datasets can be obtained by automatically extracting metadata from Web pages. In this work, we focus on identifying the named entity that corresponds to the owner of a particular Web page, for example, a home page or an organizational staff Web page. More specifically, from a set of named entities that have already been extracted from a Web page, we identify the one that corresponds to the owner of the page. First, we develop a set of features that are combined in a scoring function to select the named entity of the Web page owner. Second, we formulate the task as a classification problem in which a pair consisting of a Web page and a named entity is classified as associated or not. We evaluate the proposed approaches on a set of Web pages in which named entities have previously been identified. Our experimental results show that we can identify the named entity corresponding to the owner of a home page with accuracy over 90%.
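
The two formulations in the abstract can be illustrated with a minimal sketch. The abstract does not list the features or how they are weighted, so everything below is an assumption made for illustration: the `Candidate` fields, the `WEIGHTS` values, and the fixed threshold in `is_associated` (a stand-in for a trained classifier) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """A named entity already extracted from a page (hypothetical structure)."""
    name: str
    in_title: bool         # assumed feature: entity appears in the page title
    in_url: bool           # assumed feature: a name token appears in the URL
    first_position: float  # assumed feature: relative offset of first mention (0.0 = top)
    frequency: int         # assumed feature: number of mentions on the page


# Illustrative weights only; the paper's actual features and weights are not given here.
WEIGHTS = {"in_title": 2.0, "in_url": 1.5, "early": 1.0, "frequency": 0.25}


def score(c: Candidate) -> float:
    """Combine the assumed features into a single score (linear combination)."""
    return (
        WEIGHTS["in_title"] * c.in_title
        + WEIGHTS["in_url"] * c.in_url
        + WEIGHTS["early"] * (1.0 - c.first_position)
        + WEIGHTS["frequency"] * min(c.frequency, 4)  # cap so long pages do not dominate
    )


def select_owner(candidates: List[Candidate]) -> Optional[Candidate]:
    """Formulation 1: pick the highest-scoring entity as the page owner."""
    if not candidates:
        return None
    return max(candidates, key=score)


def is_associated(c: Candidate, threshold: float = 2.5) -> bool:
    """Formulation 2: binary decision for a (page, entity) pair.

    A fixed threshold on the score stands in for a learned classifier.
    """
    return score(c) >= threshold


candidates = [
    Candidate("Jane Doe", in_title=True, in_url=True, first_position=0.05, frequency=3),
    Candidate("John Smith", in_title=False, in_url=False, first_position=0.6, frequency=1),
]
owner = select_owner(candidates)
print(owner.name if owner else "no candidate")
```

The first formulation always returns exactly one entity per page, whereas the second can reject every candidate, which matters for pages that have no single owner.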