Detecting Aggregate Incongruities in XML

Authors:
Wynne Hsu;Qiangfeng Peter Lau;Mong Li Lee
Affiliations:
Department of Computer Science, National University of Singapore;Department of Computer Science, National University of Singapore;Department of Computer Science, National University of Singapore
Venue:
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
Year:
2009

Abstract

The problem of identifying deviating patterns in XML repositories has important applications in data cleaning, fraud detection, and stock market analysis. Current methods detect data discrepancies by assessing whether the data conforms to the expected distribution of its immediate neighborhood. This approach may miss interesting deviations that involve aggregated information. For example, the average number of transactions of a particular bank account may be exceptionally high compared to other accounts with similar profiles. Such an incongruity can only be revealed by aggregating the appropriate data and analyzing the aggregated results within the associated neighborhood. This neighborhood is implicitly encapsulated in the XML structure. In addition, the hierarchical nature of the XML structure reflects the different levels of abstraction in the real world. This work presents a framework that detects incongruities in aggregate information. It utilizes the inherent characteristics of the XML structure to systematically aggregate leaf-level data and propagate the aggregated information up the hierarchy. The aggregated information is analyzed using a novel method that first clusters similar data, then assumes a statistical distribution and identifies aggregate incongruities within each cluster. Experimental results indicate that the proposed approach is effective in detecting interesting discrepancies in a real-world bank data set.
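
The aggregate-then-compare idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' framework: the element and attribute names (`bank`, `account`, `txn`, `profile`), the helper functions, and the threshold are all hypothetical. Grouping by a `profile` attribute stands in for the clustering step, a z-score under an assumed normal distribution stands in for the statistical test, and only one level of the hierarchy is aggregated.

```python
import statistics
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_toy_bank(counts_by_profile):
    """Build a hypothetical <bank> tree with one <txn> leaf per transaction."""
    root = ET.Element("bank")
    n = 0
    for profile, counts in counts_by_profile.items():
        for count in counts:
            n += 1
            acc = ET.SubElement(root, "account", id=f"a{n}", profile=profile)
            for _ in range(count):
                ET.SubElement(acc, "txn")
    return root


def aggregate_counts(root, leaf_tag="txn"):
    """Aggregate leaf-level data: count leaf_tag children under each account."""
    return {
        acc.get("id"): (acc.get("profile"), len(acc.findall(leaf_tag)))
        for acc in root.iter("account")
    }


def find_incongruities(aggregates, threshold=2.0):
    """Flag accounts whose aggregate deviates strongly within their group.

    Assumption: grouping by profile replaces clustering, and values in a
    group are treated as roughly normal so a z-score is meaningful.
    """
    groups = defaultdict(list)
    for acc_id, (profile, value) in aggregates.items():
        groups[profile].append((acc_id, value))

    flagged = []
    for members in groups.values():
        values = [value for _, value in members]
        if len(values) < 3:
            continue  # too few points to estimate a spread
        mean = statistics.mean(values)
        stdev = statistics.pstdev(values)
        if stdev == 0:
            continue  # all values identical, nothing deviates
        for acc_id, value in members:
            if abs(value - mean) / stdev > threshold:
                flagged.append(acc_id)
    return flagged


root = build_toy_bank({
    "student": [2, 2, 3, 2, 3, 20],
    "business": [40, 42, 38, 41, 39, 40],
})
print(find_incongruities(aggregate_counts(root)))
```

In this toy data the sixth student account has 20 transactions, which is unremarkable across the whole bank but far from the other student accounts, so it is only flagged once the counts are compared within its own neighborhood.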