Efficient schema extraction from a large collection of XML documents

  • Authors:
  • Guangming Xing; Vijayeandra Parthepan

  • Affiliations:
  • Western Kentucky University, Bowling Green, KY; Western Kentucky University, Bowling Green, KY

  • Venue:
  • Proceedings of the 49th Annual Southeast Regional Conference
  • Year:
  • 2011

Abstract

XML is becoming the standard format for data exchange on the Internet. In this paper, we present a system that is effective in extracting schema information from a large collection of XML documents. Building on Xtract, we propose using the cost of an NFA simulation to compute the Minimum Description Length (MDL). We also study how the frequencies of the sample inputs can be used to improve the effectiveness of schema extraction. Experimental studies on synthesized XML data sets suggest that our approach is an efficient and effective solution for schema inference.
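
The abstract does not spell out the cost function, so the following is only an illustrative sketch of how an NFA simulation could yield an MDL-style data cost, not the authors' actual definition. It assumes a hypothetical encoding in which, at each step, the encoder pays log2 of the number of choices the automaton currently permits (each enabled symbol, plus "stop" when an accepting state is active). The function name `nfa_encoding_cost` and the two toy content models are assumptions made for illustration, and the cost of describing the model itself is left out.

```python
import math

def nfa_encoding_cost(transitions, start, accepting, sequence):
    """Bits needed to encode `sequence` under the NFA, or None if it is rejected.

    transitions: dict mapping state -> {symbol: set of next states}
    (hypothetical representation chosen for this sketch).
    """
    current = {start}
    bits = 0.0
    # A trailing None stands for the "end of sequence" decision.
    for symbol in list(sequence) + [None]:
        # Choices permitted here: every enabled symbol, plus "stop" if accepting.
        choices = {s for state in current for s in transitions.get(state, {})}
        if current & accepting:
            choices.add(None)
        if symbol not in choices:
            return None  # the candidate model does not cover this sample
        bits += math.log2(len(choices))
        if symbol is not None:
            current = {t for state in current
                       for t in transitions.get(state, {}).get(symbol, ())}
    return bits

# Tight candidate model: a followed by any number of b, i.e. (a, b*)
tight = {0: {"a": {1}}, 1: {"b": {1}}}
# General candidate model: any mix of a and b, i.e. (a | b)*
general = {0: {"a": {0}, "b": {0}}}

sample = ["a", "b", "b"]
tight_cost = nfa_encoding_cost(tight, 0, {1}, sample)
general_cost = nfa_encoding_cost(general, 0, {0}, sample)
```

Under this assumed encoding, the tighter model encodes the sample in fewer bits than the general one, which is the kind of trade-off an MDL criterion weighs against the size of the model itself.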