Towards inference of more realistic XSDs

  • Authors:
  • Irena Mlýnková; Martin Nečaský

  • Affiliations:
  • Charles University in Prague, Czech Republic; Charles University in Prague, Czech Republic

  • Venue:
  • Proceedings of the 2009 ACM symposium on Applied Computing
  • Year:
  • 2009

Abstract

XML has undoubtedly become a standard for data representation and manipulation. However, most XML documents are still created without a description of their structure, i.e. an XML schema. Hence, in this paper we focus on the problem of automatically inferring an XML schema from a given sample set of XML documents. In contrast to existing works, which aim to infer as concise a schema as possible, we focus on inferring a more realistic result, i.e. a schema that is closer to human-written ones and carries more precise information. For this purpose we extend and combine existing verified techniques (such as ACO heuristics or the MDL principle) with a set of heuristics that exploit the semantics of element and attribute names, thesauri, and statistical analysis of the input data. Using a set of examples, we show and discuss the advantages of our proposal.