Correlation-based Attribute Outlier Detection in XML

  • Authors:
  • Judice L. Y. Koh; Mong Li Lee; Wynne Hsu; Wee Tiong Ang

  • Affiliations:
  • School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 119260. judice.koh@utoronto.ca; School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 119260. leeml@comp.nus.edu.sg; School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 119260. whsu@comp.nus.edu.sg; Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613. wtang@i2r.a-star.edu.sg

  • Venue:
  • ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
  • Year:
  • 2008

Abstract

Compared with relational data models, the hierarchical structure of semi-structured data such as XML provides semantically meaningful neighbourhoods that benefit data cleaning tasks such as outlier detection. In this paper, we introduce the concept of a correlated subspace, which leverages the hierarchical relationships between XML attributes to provide contextually informative neighbourhoods for attribute outlier detection. We also design two correlation-based attribute outlier metrics for XML, namely the xO-Measure and the xQ-Measure. Experimental results support the effectiveness of our XML outlier detection approach.
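
To make the idea of a hierarchical neighbourhood concrete, the following is a minimal Python sketch, not the paper's method. It does not implement the xO-Measure or the xQ-Measure, which this abstract does not define. Instead it assumes that the child elements of one record element form a neighbourhood, and it flags value pairs that rarely co-occur using a plain conditional frequency. The sample document, the tag names, the function `rare_pairs`, and the threshold are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data: each <protein> element is treated as one
# neighbourhood whose child elements are the attributes being compared.
SAMPLE = """
<db>
  <protein><organism>human</organism><location>nucleus</location></protein>
  <protein><organism>human</organism><location>nucleus</location></protein>
  <protein><organism>human</organism><location>nucleus</location></protein>
  <protein><organism>human</organism><location>chloroplast</location></protein>
  <protein><organism>rice</organism><location>chloroplast</location></protein>
  <protein><organism>rice</organism><location>chloroplast</location></protein>
</db>
"""


def rare_pairs(xml_text, record_tag, attr_a, attr_b, threshold=0.3):
    """Return (a, b, ratio) triples where value b rarely co-occurs with value a.

    ratio is count(a, b) / count(a): a simple conditional frequency used
    here as a stand-in for a correlation-based outlier score (assumption).
    """
    root = ET.fromstring(xml_text)
    pair_counts = Counter()
    a_counts = Counter()
    for record in root.iter(record_tag):
        a = record.findtext(attr_a)
        b = record.findtext(attr_b)
        if a is None or b is None:
            continue  # skip records missing either attribute
        pair_counts[(a, b)] += 1
        a_counts[a] += 1

    flagged = []
    for (a, b), n in pair_counts.items():
        ratio = n / a_counts[a]
        if ratio < threshold:
            flagged.append((a, b, ratio))
    return flagged


outliers = rare_pairs(SAMPLE, "protein", "organism", "location")
print(outliers)
```

In this toy data, the pair (human, chloroplast) appears in one of four human records and falls below the assumed threshold, whereas the same location value is unremarkable for rice. The point of the sketch is only that the parent element scopes which values are compared with each other.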