Constrained linear regression models for symbolic interval-valued variables

Authors:
Eufrásio de A. Lima Neto;Francisco de A. T. de Carvalho
Affiliations:
Departamento de Estatística, Universidade Federal da Paraíba, Cidade Universitária s/n, CEP 58051-900, João Pessoa (PB), Brazil;Centro de Informática, Universidade Federal de Pernambuco, Av. Prof. Luiz Freire, s/n, Cidade Universitária, CEP 50740-540, Recife (PE), Brazil
Venue:
Computational Statistics & Data Analysis
Year:
2010

Citing 10

Symbolic clustering using a new dissimilarity measure. Pattern Recognition
A monothetic clustering method. Pattern Recognition Letters
Clustering of interval data based on city-block distances. Pattern Recognition Letters
Multivalued type proximity measure and concept of mutual similarity value useful for clustering symbolic patterns. Pattern Recognition Letters
Adaptive Hausdorff distances and dynamic clustering of symbolic interval data. Pattern Recognition Letters
Fuzzy c-means clustering methods for symbolic interval data. Pattern Recognition Letters
Centre and Range method for fitting a linear regression model to symbolic interval data. Computational Statistics & Data Analysis
Forecasting models for interval-valued time series. Neurocomputing
I-Scal: Multidimensional scaling of interval dissimilarities. Computational Statistics & Data Analysis
Rapid and brief communication: Multivalued type dissimilarity measure and concept of mutual dissimilarity value for clustering symbolic patterns. Pattern Recognition

Cited 6

Estimation of a flexible simple linear model for interval data based on set arithmetic. Computational Statistics & Data Analysis
Interval arithmetic-based simple linear regression between interval data: Discussion and sensitivity analysis on the choice of the metric. Information Sciences: an International Journal
A resampling approach for interval-valued data regression. Statistical Analysis and Data Mining
Robust regression with application to symbolic interval data. Engineering Applications of Artificial Intelligence
A set arithmetic-based linear regression model for modelling interval-valued responses through real-valued variables. Information Sciences: an International Journal
Interval kernel regression. Neurocomputing

Quantified Score

Hi-index	0.03

Abstract

This paper introduces an approach to fitting a constrained linear regression model to interval-valued data. Each example of the learning set is described by a feature vector in which each feature value is an interval. The new approach fits a constrained linear regression model on the midpoints and ranges of the interval values assumed by the variables in the learning set. The lower and upper boundaries of the interval value of the dependent variable are predicted from its midpoint and range, which are estimated from the fitted linear regression models applied to the midpoint and range of each interval value of the independent variables. This new method shows the importance of range information in prediction performance, as well as the use of inequality constraints to ensure mathematical coherence between the predicted values of the lower ($\hat{y}_{Li}$) and upper ($\hat{y}_{Ui}$) boundaries of the interval, that is, $\hat{y}_{Li} \le \hat{y}_{Ui}$. The authors also propose an expression for the goodness-of-fit measure known as the coefficient of determination. The assessment of the proposed prediction method is based on the estimation of the average behavior of the root-mean-square error and the square of the correlation coefficient in the framework of a Monte Carlo experiment with different data set configurations. Among other aspects, the synthetic data sets take into account the dependence, or lack thereof, between the midpoint and range of the intervals. The bias produced by the use of inequality constraints on the vector of parameters is also examined in terms of the mean-square error of the parameter estimates. Finally, the approaches proposed in this paper are applied to a real data set and their performances are compared.
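
The midpoint-and-range idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which inequality constraints are used or how the constrained fit is solved, so the code below makes its own assumptions: the range is taken as upper minus lower, the midpoint model is fitted by ordinary least squares, and the range model is fitted with non-negative coefficients using SciPy's `nnls`. Because observed ranges are non-negative, non-negative coefficients give a non-negative predicted range and therefore a lower bound that never exceeds the upper bound. The function names `fit_interval_model` and `predict_interval` are hypothetical and are not taken from the paper.

```python
import numpy as np
from scipy.optimize import nnls


def _design(x):
    """Prepend an intercept column to an (n,) or (n, p) feature array."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones(x.shape[0]), x])


def fit_interval_model(x_low, x_up, y_low, y_up):
    """Fit separate linear models to interval midpoints and ranges.

    Assumption (not stated in the abstract): the range model is
    constrained to non-negative coefficients, solved with NNLS.
    """
    x_low, x_up = np.asarray(x_low, float), np.asarray(x_up, float)
    y_low, y_up = np.asarray(y_low, float), np.asarray(y_up, float)

    # Midpoint model: unconstrained ordinary least squares.
    beta_mid, *_ = np.linalg.lstsq(
        _design((x_low + x_up) / 2), (y_low + y_up) / 2, rcond=None
    )
    # Range model: least squares subject to coefficients >= 0.
    beta_rng, _ = nnls(_design(x_up - x_low), y_up - y_low)
    return beta_mid, beta_rng


def predict_interval(beta_mid, beta_rng, x_low, x_up):
    """Rebuild lower and upper bounds from predicted midpoint and range."""
    x_low, x_up = np.asarray(x_low, float), np.asarray(x_up, float)
    mid_hat = _design((x_low + x_up) / 2) @ beta_mid
    rng_hat = _design(x_up - x_low) @ beta_rng  # >= 0 by construction
    return mid_hat - rng_hat / 2, mid_hat + rng_hat / 2


if __name__ == "__main__":
    # Synthetic intervals, for illustration only.
    gen = np.random.default_rng(0)
    x_mid = gen.uniform(0, 10, size=30)
    x_rng = gen.uniform(0.5, 2.0, size=30)
    x_low, x_up = x_mid - x_rng / 2, x_mid + x_rng / 2
    y_mid = 1.0 + 2.0 * x_mid + gen.normal(0, 0.3, size=30)
    y_rng = np.abs(0.5 + 1.5 * x_rng + gen.normal(0, 0.1, size=30))
    y_low, y_up = y_mid - y_rng / 2, y_mid + y_rng / 2

    beta_mid, beta_rng = fit_interval_model(x_low, x_up, y_low, y_up)
    lower_hat, upper_hat = predict_interval(beta_mid, beta_rng, x_low, x_up)
    print("all predicted intervals coherent:", bool(np.all(lower_hat <= upper_hat)))
```

Without the non-negativity constraint, a fitted range model can return a negative range for some inputs, which would place the predicted lower bound above the upper bound. The constraint avoids this, at the price of possible bias in the parameter estimates, which is the trade-off the abstract says is examined through the mean-square error.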