Semistructured data and XML

  • Authors:
  • Dan Suciu

  • Affiliations:
  • AT&T Labs, Florham Park, NJ

  • Venue:
  • Information organization and databases
  • Year:
  • 2000

Abstract

XML poses a new set of challenges for semistructured data research. The Extensible Markup Language (XML) is a new recommendation from the World Wide Web Consortium (W3C) that will become a universal data exchange format for the Web. XML shares many features with semistructured data. Moreover, data from virtually any source can easily be converted into XML. This will make it attractive for organizations to "publish" their information sources in XML and thus make them available to other XML applications on the Web. For such applications to reach their full potential, however, we need to build the right tools to process data in this new format and to perform database operations such as data extraction, data integration, data translation, and data storage. Research done so far on semistructured data may offer some solutions, as illustrated by the query language XML-QL. But, as we argue in this paper, XML creates problems that research on semistructured data has not yet addressed (e.g., type inference), has not considered important (e.g., distributed evaluation), or has simply not solved yet (e.g., storage).