Learning in the presence of large fluctuations: a study of aggregation and correlation

Authors:
Eric Paquet; Herna Lydia Viktor; Hongyu Guo
Affiliations:
National Research Council, Ottawa, Ontario, Canada, and School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Ontario, Canada; School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Ontario, Canada; National Research Council, Ottawa, Ontario, Canada
Venue:
NFMCP'12 Proceedings of the First international conference on New Frontiers in Mining Complex Patterns
Year:
2012

Abstract

Consider a scenario where one aims to learn models from data characterized by very large fluctuations that are attributable neither to noise nor to outliers. This may be the case, for instance, when predicting the potential future damage of earthquakes or oil spills, or when conducting financial data analysis. It follows that, in such a situation, the standard central limit theorem does not apply, since the associated Gaussian distribution exponentially suppresses large fluctuations. In this paper, we present an analysis of data aggregation and correlation in such scenarios. To this end, we introduce the Lévy, or stable, distribution, which is a generalization of the Gaussian distribution. Our theoretical conclusions are illustrated with various simulations, as well as on a benchmark financial database. We show which specific strategies should be adopted for aggregation, depending on the stability exponent of the Lévy distribution. Firstly, our results indicate that the correlation between two attributes may be underestimated if a Gaussian distribution is erroneously assumed. Secondly, we show that, in the scenario where we aim to learn a set of rules to estimate the level of stability of a stock market, the Lévy distribution produces superior results. Thirdly, we illustrate that, in a multi-relational database mining setting, aggregation using average values may be highly unsuitable.
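
The claim that averaging can be unsuitable for heavy-tailed data can be illustrated with a small simulation. The sketch below is not the authors' experiment. It assumes the standard Cauchy distribution as a convenient example of a stable law (stability exponent 1), and the function names, group sizes, and seed are illustrative choices. It compares how the spread of an aggregate (mean or median) changes as the number of aggregated values grows.

```python
import math
import random
import statistics


def cauchy_sample(rng):
    """Draw one standard Cauchy variate (a stable law with exponent 1)."""
    return math.tan(math.pi * (rng.random() - 0.5))


def gaussian_sample(rng):
    """Draw one standard Gaussian variate (a stable law with exponent 2)."""
    return rng.gauss(0.0, 1.0)


def aggregate_spread(draw, aggregate, group_size, n_groups, rng):
    """Interquartile range of an aggregate taken over many independent groups."""
    values = [
        aggregate([draw(rng) for _ in range(group_size)])
        for _ in range(n_groups)
    ]
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


rng = random.Random(0)  # fixed seed so the run is reproducible
cases = {
    "gaussian mean": (gaussian_sample, statistics.fmean),
    "cauchy mean": (cauchy_sample, statistics.fmean),
    "cauchy median": (cauchy_sample, statistics.median),
}

# Spread of each aggregate for small and large groups.
results = {}
for name, (draw, aggregate) in cases.items():
    for size in (10, 1000):
        results[(name, size)] = aggregate_spread(draw, aggregate, size, 300, rng)
        print(f"{name:14s} group size {size:5d}: IQR = {results[(name, size)]:.3f}")
```

Under these assumptions, the spread of the Gaussian mean should shrink as the group grows, while the spread of the Cauchy mean should stay roughly the same, because the mean of independent standard Cauchy variates is itself standard Cauchy. The median is used here only as one example of a more robust aggregate; the paper's own recommendations depend on the stability exponent.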