HyLiEn: a hybrid approach to general list extraction on the web

  • Authors:
  • Fabio Fumarola; Tim Weninger; Rick Barber; Donato Malerba; Jiawei Han

  • Affiliations:
  • Università degli Studi di Bari, Bari, Italy; University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA; University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA; Università degli Studi di Bari, Bari, Italy; University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA

  • Venue:
  • Proceedings of the 20th international conference companion on World wide web
  • Year:
  • 2011

Abstract

We consider the problem of automatically extracting general lists from the Web. Existing approaches mostly depend on either the underlying HTML markup or the visual structure of the Web page. We present HyLiEn, an unsupervised, Hybrid approach for automatic List discovery and Extraction on the Web. It relies on general assumptions about the visual rendering of lists and the structural representation of the items they contain. We show that our method significantly outperforms existing methods.
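
To make the hybrid idea concrete, the following is a minimal Python sketch of combining a visual cue with a structural cue. It is not the HyLiEn algorithm, which the abstract does not specify. Everything in it is an assumption made for illustration: the `Box` record, the `find_lists` and `tag_similarity` helpers, the use of exact coordinate alignment as the visual cue, and Jaccard similarity over tag names as the structural cue.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rendered element (hypothetical record): text, position, tag signature."""
    text: str
    x: int
    y: int
    tags: tuple  # e.g. ("li", "a"): the element's tag followed by its child tags


def tag_similarity(a, b):
    """Jaccard similarity of two tag signatures (a stand-in structural measure)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def find_lists(boxes, min_items=2, min_similarity=0.5):
    """Return groups of texts that are visually aligned and structurally similar."""
    # Visual cue (assumed): boxes sharing an x coordinate may form a vertical
    # list; boxes sharing a y coordinate may form a horizontal list.
    candidates = defaultdict(list)
    for box in boxes:
        candidates[("x", box.x)].append(box)
        candidates[("y", box.y)].append(box)

    results = []
    for (axis, _), group in candidates.items():
        # Order items along the direction in which the candidate list runs.
        group.sort(key=lambda b: b.y if axis == "x" else b.x)

        # Structural cue (assumed): greedily cluster aligned boxes whose tag
        # signatures resemble the first member of an existing cluster.
        clusters = []
        for box in group:
            for cluster in clusters:
                if tag_similarity(cluster[0].tags, box.tags) >= min_similarity:
                    cluster.append(box)
                    break
            else:
                clusters.append([box])

        results.extend(
            [b.text for b in cluster]
            for cluster in clusters
            if len(cluster) >= min_items
        )
    return results


if __name__ == "__main__":
    page = [
        Box("Home", 10, 100, ("li", "a")),
        Box("News", 10, 120, ("li", "a")),
        Box("About", 10, 140, ("li", "a")),
        Box("Banner", 10, 20, ("div", "img")),
        Box("Footer", 300, 500, ("p",)),
    ]
    print(find_lists(page))
```

In this toy input the banner is left-aligned with the menu entries, so the visual cue alone would group it with them; the structural check separates it because its tag signature shares nothing with theirs. A realistic system would need tolerance in the alignment test and a richer structural comparison than tag-name overlap.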