Web data management

  • Authors:
  • Michael J. Cafarella;Alon Y. Halevy

  • Affiliations:
  • University of Michigan, Ann Arbor, MI, USA;Google, Inc., Mountain View, CA, USA

  • Venue:
  • Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Web Data Management (or WDM) refers to a body of work concerned with leveraging the large collections of structured data that can be extracted from the Web. Over the past few years, several research and commercial efforts have explored these collections of data with the goal of improving Web search and developing mechanisms for surfacing different kinds of search answers. This work has leveraged (1) collections of structured data such as HTML tables, lists and forms, (2) recent ontologies and knowledge bases created by crowd-sourcing, such as Wikipedia and its derivatives, DBPedia, YAGO and Freebase, and (3) the collection of text documents from the Web, from which facts could be extracted in a domain-independent fashion. The promise of this line of work is based on the observation that new kinds of results can be obtained by leveraging a huge collection of independently created fragments of data, and typically in ways that are wholly unrelated to the authors' original intent. For example, we might use many database schemas to compute a schema thesaurus. Or we might examine many spreadsheets of scientific data that reveal the aggregate practice of an entire scientific field. As such, WDM is tightly linked to Web-enabled collaboration, even (or especially) if the collaborators are unwitting ones. We will cover the key techniques, principles and insights obtained so far in the area of Web Data Management.