Simple fast algorithms for the editing distance between trees and related problems
SIAM Journal on Computing
Discrete-time signal processing (2nd ed.)
Discrete-time signal processing (2nd ed.)
LOF: identifying density-based local outliers
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Outlier detection for high dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Efficient Similarity Search In Sequence Databases
FODO '93 Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms
Efficient Retrieval of Similar Time Sequences Under Time Warping
ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
Kernels for Semi-Structured Data
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
On Similarity Queries for Time-Series Data: Constraint Specification and Implementation
CP '95 Proceedings of the First International Conference on Principles and Practice of Constraint Programming
gSpan: Graph-Based Substructure Pattern Mining
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Information Systems - Special issue on web data integration
Polishing Blemishes: Issues in Data Correction
IEEE Intelligent Systems
Fast Detection of XML Structural Similarity
IEEE Transactions on Knowledge and Data Engineering
DogmatiX tracks down duplicates in XML
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Class noise vs. attribute noise: a quantitative study of their impacts
Artificial Intelligence Review
DOLPHIN: An efficient algorithm for mining distance-based outliers in very large datasets
ACM Transactions on Knowledge Discovery from Data (TKDD)
Correlation-based Attribute Outlier Detection in XML
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Hi-index | 0.00 |
XML (eXtensible Markup Language) became in recent years the new standard for data representation and exchange on the WWW. This has resulted in a great need for data cleaning techniques in order to identify outlying data. In this paper, we present a technique for outlier detection that singles out anomalies with respect to a relevant group of objects. We exploit a suitable encoding of XML documents that are encoded as signals of fixed frequency that can be transformed using Fourier Transforms. Outliers are identified by simply looking at the signal spectra. The results show the effectiveness of our approach.