Interval versions of statistical techniques with applications to environmental analysis, bioinformatics, and privacy in statistical databases

Authors:
Vladik Kreinovich;Luc Longpré;Scott A. Starks;Gang Xiang;Jan Beck;Raj Kandathi;Asis Nayak;Scott Ferson;Janos Hajagos
Affiliations:
NASA Pan-American Center for Earth and Environmental Studies (PACES), University of Texas, El Paso, TX;NASA Pan-American Center for Earth and Environmental Studies (PACES), University of Texas, El Paso, TX;NASA Pan-American Center for Earth and Environmental Studies (PACES), University of Texas, El Paso, TX;NASA Pan-American Center for Earth and Environmental Studies (PACES), University of Texas, El Paso, TX;NASA Pan-American Center for Earth and Environmental Studies (PACES), University of Texas, El Paso, TX;NASA Pan-American Center for Earth and Environmental Studies (PACES), University of Texas, El Paso, TX;NASA Pan-American Center for Earth and Environmental Studies (PACES), University of Texas, El Paso, TX;Applied Biomathematics, Setauket, NY;Applied Biomathematics, Setauket, NY and Department of Ecology and Evolution, State University of New York, Stony Brook, NY
Venue:
Journal of Computational and Applied Mathematics - Special issue: Scientific computing, computer arithmetic, and validated numerics (SCAN 2004)
Year:
2007

Abstract

In many areas of science and engineering, it is desirable to estimate statistical characteristics (mean, variance, covariance, etc.) under interval uncertainty. For example, we may want to use the measured values x(t) of a pollution level in a lake at different moments of time to estimate the average pollution level. However, we do not know the exact values x(t): if one of the measurement results is 0, this simply means that the actual (unknown) value of x(t) can be anywhere between 0 and the detection limit (DL). We must, therefore, modify the existing statistical algorithms to process such interval data. Such a modification is also necessary to process data from statistical databases, where, in order to maintain privacy, we only keep interval ranges instead of the actual numeric data (e.g., a salary range instead of the actual salary). Most of the resulting computational problems are NP-hard, which means, crudely speaking, that in general no computationally efficient algorithm can solve all particular cases of the corresponding problem. In this paper, we overview practical situations in which computationally efficient algorithms exist, e.g., situations when measurements are very accurate, or when all the measurements are done with one (or a few) instruments. As a case study, we consider a practical problem from bioinformatics: to discover the genetic difference between cancer cells and healthy cells, we must process the measurement results and find the concentrations c and h of a given gene in cancer cells and in healthy cells, respectively. This is a particular case of a general situation in which, to estimate states or parameters that are not directly accessible by measurements, we must solve a system of equations whose coefficients are only known with interval uncertainty. We show that, in general, this problem is NP-hard, and we describe new efficient algorithms for solving it in practically important situations.
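To make the contrast between easy and hard cases concrete, here is a minimal Python sketch of computing the range of the mean and of the variance when each value x_i is only known to lie in an interval [lo_i, hi_i]. It assumes the population variance V = (1/n) * sum((x_i - E)^2). The range of the mean follows directly from the endpoints, because the mean increases in every x_i. For the lower bound of V, the sketch uses the fact that V is convex, so at its minimum over the box each x_i either equals the common mean or sits at the endpoint closest to it; that common value is located here by bisection, a numerical shortcut chosen for brevity. For the upper bound, it simply enumerates all 2^n endpoint combinations (exact, since a convex function attains its maximum over a box at a vertex, but exponential in n), which illustrates why the general problem is hard. All function names and the sample data, including a reading of 0 treated as [0, DL] with a hypothetical DL = 0.5, are illustrative assumptions and not the paper's algorithms.

```python
from itertools import product
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper) bounds for one measured value


def population_variance(xs: List[float]) -> float:
    """Variance with 1/n normalization for exactly known values."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


def mean_range(data: List[Interval]) -> Interval:
    """The mean increases in every x_i, so its range is given by the endpoint means."""
    n = len(data)
    return (sum(lo for lo, _ in data) / n, sum(hi for _, hi in data) / n)


def variance_lower(data: List[Interval], iters: int = 200) -> float:
    """Smallest possible variance over all x_i in [lo_i, hi_i].

    At the minimum, x_i = clamp(mu, lo_i, hi_i), where mu equals the mean of
    the clamped values. g(mu) = mean(clamped(mu)) - mu is nonincreasing, so
    bisection on [min lo_i, max hi_i] approximates that fixed point.
    """
    def clamped(mu: float) -> List[float]:
        return [min(max(mu, lo), hi) for lo, hi in data]

    a = min(lo for lo, _ in data)
    b = max(hi for _, hi in data)
    for _ in range(iters):
        mid = (a + b) / 2
        xs = clamped(mid)
        if sum(xs) / len(xs) > mid:
            a = mid  # g(mid) > 0: the fixed point lies to the right
        else:
            b = mid  # g(mid) <= 0: the fixed point lies at or to the left
    return population_variance(clamped((a + b) / 2))


def variance_upper_bruteforce(data: List[Interval]) -> float:
    """Largest possible variance: check every vertex of the box (2**n cases)."""
    return max(population_variance(list(v)) for v in product(*data))


# Hypothetical data: a reading of 0 below a detection limit of 0.5,
# one exact value, and one interval-valued measurement.
data = [(0.0, 0.5), (1.0, 1.0), (2.0, 3.0)]
print(mean_range(data))                           # (1.0, 1.5)
print(round(variance_lower(data), 4))             # 0.3889
print(round(variance_upper_bruteforce(data), 4))  # 1.5556
```

The brute-force upper bound is only usable for small n; it is included to show the exponential search that efficient algorithms for special cases (such as narrow or non-overlapping intervals) are designed to avoid.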