Turning the web into a database: extracting data and structure

  • Authors:
  • Eduard H. Hovy

  • Affiliations:
  • Information Sciences Institute, University of Southern California

  • Venue:
  • NLDB'09 Proceedings of the 14th international conference on Applications of Natural Language to Information Systems
  • Year:
  • 2009

Abstract

People build databases to collect and systematize knowledge and to make it available to users in a consistent and hopefully trustworthy form. But the largest data collection today, the web, is not systematic, consistent, or trustworthy, and the access techniques we use are provably inadequate. Focusing just on text, what would it take to extract information from the web, organize it, and form a database (both instances and metadata) from it? This paper discusses some of the core problems and provides examples of recent research in NLP: automated instance mining, metadata structure harvesting, and inter-concept relation discovery.