Growing parallel paths for entity-page discovery

  • Authors:
  • Tim Weninger;Fabio Fumarola;Cindy Xide Lin;Rick Barber;Jiawei Han;Donato Malerba

  • Affiliations:
  • University of Illinois Urbana-Champaign, Urbana, IL, USA;Università degli Studi di Bari, Bari, Italy;University of Illinois Urbana-Champaign, Urbana, IL, USA;University of Illinois Urbana-Champaign, Urbana, IL, USA;University of Illinois Urbana-Champaign, Urbana, IL, USA;Università degli Studi di Bari, Bari, Italy

  • Venue:
  • Proceedings of the 20th International Conference Companion on World Wide Web
  • Year:
  • 2011

Abstract

In this paper, we use the structural and relational information on the Web to find entity-pages. Specifically, given a Web site and an example entity-page (e.g., a department site and a faculty member's homepage), we seek to find all entity-pages of the same type (e.g., all faculty members in the department). To do this, we propose a Web structure mining method that grows parallel paths through the Web graph and DOM trees. We show that by utilizing these parallel paths we can efficiently discover all entity-pages of the same type. Finally, we demonstrate the accuracy of our method with a case study on various domains.
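
The idea of growing parallel paths can be illustrated with a minimal sketch. This is not the authors' algorithm, only a toy reading of the abstract under stated assumptions: the site is a hypothetical in-memory dictionary (`SITE`), each link is described by the DOM path of its anchor, and two links count as "parallel" when their DOM paths are equal after positional indices are dropped. The names `generalize`, `link_pattern`, and `grow_parallel_paths` are invented for illustration.

```python
import re

# Hypothetical toy site: page URL -> list of (DOM path of the anchor, target URL).
SITE = {
    "/": [
        ("html/body/div[1]/ul/li[1]/a", "/people"),
        ("html/body/div[1]/ul/li[2]/a", "/courses"),
    ],
    "/people": [
        ("html/body/table/tr[1]/td[1]/a", "/people/alice"),
        ("html/body/table/tr[2]/td[1]/a", "/people/bob"),
        ("html/body/table/tr[3]/td[1]/a", "/people/carol"),
        ("html/body/div[2]/a", "/contact"),
    ],
    "/courses": [
        ("html/body/ul/li[1]/a", "/courses/cs101"),
    ],
}


def generalize(dom_path):
    """Drop positional indices so sibling links share one pattern (an assumption)."""
    return re.sub(r"\[\d+\]", "", dom_path)


def link_pattern(site, source, target):
    """Return the generalized DOM path of the link from source to target."""
    for dom_path, dest in site.get(source, []):
        if dest == target:
            return generalize(dom_path)
    raise ValueError(f"no link from {source} to {target}")


def grow_parallel_paths(site, seed_path):
    """Follow, from the seed's start page, every link matching the seed's DOM patterns."""
    # One generalized DOM pattern per hop of the seed path.
    patterns = [
        link_pattern(site, src, dst)
        for src, dst in zip(seed_path, seed_path[1:])
    ]
    frontier = {seed_path[0]}
    for pattern in patterns:
        # Advance every path in parallel, keeping only links that match this hop.
        frontier = {
            dest
            for page in frontier
            for dom_path, dest in site.get(page, [])
            if generalize(dom_path) == pattern
        }
    return sorted(frontier)


if __name__ == "__main__":
    seed = ["/", "/people", "/people/alice"]
    print(grow_parallel_paths(SITE, seed))
```

In this toy site, the seed path from the root to one faculty page yields the sibling faculty pages, while `/contact` and `/courses/cs101` are excluded because their links sit at different DOM locations. A real site would need a crawler, an HTML parser, and a more robust notion of path similarity.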