Mix-n-Match: building personal libraries from web content

  • Authors:
  • Matthias Geel;Timothy Church;Moira C. Norrie

  • Affiliations:
  • Institute for Information Systems, ETH Zurich, Zurich, Switzerland;Institute for Information Systems, ETH Zurich, Zurich, Switzerland;Institute for Information Systems, ETH Zurich, Zurich, Switzerland

  • Venue:
  • TPDL'12 Proceedings of the Second international conference on Theory and Practice of Digital Libraries
  • Year:
  • 2012

Abstract

We present an approach to web content aggregation that allows information to be harvested from web pages, independent of specific markup languages. It builds on ideas from data warehousing, and we present solutions to the well-known problems of data integration, namely detection of equivalences and data cleaning, adapted to this context. We describe how the content aggregation engine has been realised as an extensible framework in such a way that end-users as well as developers can use the associated tools to create personal libraries of content extracted from the web.