Robust tree-based incremental imputation method for data fusion

Authors:
Antonio D'Ambrosio;Massimo Aria;Roberta Siciliano
Affiliations:
Dipartimento di Matematica e Statistica, Università di Napoli Federico II, Italy;Dipartimento di Matematica e Statistica, Università di Napoli Federico II, Italy;Dipartimento di Matematica e Statistica, Università di Napoli Federico II, Italy
Venue:
IDA'07 Proceedings of the 7th international conference on Intelligent data analysis
Year:
2007

Abstract

Data Fusion and Data Grafting are concerned with combining files and information coming from different sources. The problem is not to extract data from a single database, but to merge information collected from different sample surveys. The typical data fusion situation is formed of two data samples: the former consists of a complete data matrix X from a first survey, and the latter, Y, contains a certain number of missing variables. The aim is to complete the matrix Y using the knowledge acquired from X. Thus, the goal is to define the correlation structure that joins the two data matrices to be merged. In this paper, we provide an innovative methodology for Data Fusion based on an incremental imputation algorithm in tree-based models. In addition, we consider robust tree validation by boosting iterations. A relevant advantage of the proposed method is that it works for a mixed data structure including both numerical and categorical variables. As benchmarking methods we consider explicit methods such as standard trees and multiple regression, as well as an implicit method based on principal component analysis. An extensive simulation study shows that the proposed method is more accurate than the other methods.
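
The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes numerical variables only, uses a depth-1 regression tree (a stump) as a stand-in for a full tree, and leaves out the boosting step. The names fit_stump and incremental_impute, the variable names, and the toy data are all hypothetical. The donor sample plays the role of the complete matrix X, and the recipient sample plays the role of Y, which lacks some variables.

```python
from statistics import mean


def fit_stump(rows, target, predictors):
    """Fit a depth-1 regression tree: choose the single split with the lowest squared error."""
    best = None
    for p in predictors:
        values = sorted({r[p] for r in rows})
        # Candidate cut points are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [r[target] for r in rows if r[p] <= cut]
            right = [r[target] for r in rows if r[p] > cut]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, p, cut, ml, mr)
    if best is None:
        # No predictor offers a split: fall back to the overall mean.
        overall = mean(r[target] for r in rows)
        return lambda r: overall
    _, var, cut, ml, mr = best
    return lambda r: ml if r[var] <= cut else mr


def incremental_impute(donor, recipient, common, missing):
    """Impute the missing variables one at a time.

    Each imputed variable is added to the predictor set, so later
    variables can be predicted from earlier imputations as well as
    from the variables common to both samples.
    """
    predictors = list(common)
    completed = [dict(r) for r in recipient]  # do not modify the input
    for target in missing:
        tree = fit_stump(donor, target, predictors)
        for r in completed:
            r[target] = tree(r)
        predictors.append(target)
    return completed


# Hypothetical toy data: X (donor) is complete, Y (recipient) observes only "age".
donor = [
    {"age": 20, "income": 10, "spend": 1},
    {"age": 30, "income": 10, "spend": 1},
    {"age": 40, "income": 30, "spend": 5},
    {"age": 50, "income": 30, "spend": 5},
]
recipient = [{"age": 25}, {"age": 45}]

completed = incremental_impute(donor, recipient, common=["age"], missing=["income", "spend"])
print(completed)
```

In this toy example the younger recipient receives the income and spend values of the younger donor group, and the older recipient those of the older group. Handling categorical variables and making the trees robust through boosting iterations, both of which the abstract mentions, would need a fuller tree implementation than this sketch.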