On the Use of Conceptual Reconstruction for Mining Massively Incomplete Data Sets

Authors:
Srinivasan Parthasarathy;Charu C. Aggarwal
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2003

Abstract

Incomplete data sets have become almost ubiquitous in a wide variety of application domains. Common examples can be found in climate and image data sets, sensor data sets, and medical data sets. The incompleteness in these data sets may arise from a number of factors: in some cases, certain measurements were simply not available at the time; in others, information was lost due to partial system failure; in still others, users were unwilling to specify attributes because of privacy concerns. When a significant fraction of the entries is missing in all of the attributes, it becomes very difficult to perform any kind of reasonable extrapolation on the original data. For such cases, we introduce the novel idea of conceptual reconstruction, in which we create effective conceptual representations on which data mining algorithms can be directly applied. The attraction of conceptual reconstruction is that it uses the correlation structure of the data to express it in terms of concepts rather than the original dimensions. As a result, the reconstruction procedure estimates only those conceptual aspects of the data that can be mined from the incomplete data set, rather than forcing errors created by extrapolation. We demonstrate the effectiveness of the approach on a variety of real data sets.
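
The abstract does not spell out the reconstruction procedure, so the following is only a minimal sketch of one plausible reading of "using the correlation structure to express the data in terms of concepts." It assumes that missing entries are encoded as NaN, that the covariance between two attributes is estimated from the records where both are observed, and that the "concepts" are the leading eigenvectors of that matrix. The function names `pairwise_covariance` and `conceptual_coordinates`, the parameter `k`, and the rescaling by the fraction of observed attributes are all illustrative assumptions, not the authors' stated method.

```python
import numpy as np


def pairwise_covariance(X):
    """Estimate attribute means and a covariance matrix from observed entries only.

    X is an (n_records, n_attributes) array with NaN marking missing entries.
    Assumes every attribute has at least one observed value.
    """
    observed = ~np.isnan(X)
    means = np.nanmean(X, axis=0)
    # Missing entries contribute zero to every sum below.
    centered = np.where(observed, X - means, 0.0)
    # counts[i, j] = number of records where attributes i and j are both observed.
    counts = observed.T.astype(float) @ observed.astype(float)
    sums = centered.T @ centered
    return sums / np.maximum(counts, 1.0), means


def conceptual_coordinates(X, k):
    """Express each record in terms of k 'concepts' (assumed: top eigenvectors).

    No missing attribute value is filled in; each record is projected using
    only the attributes it actually has.
    """
    cov, means = pairwise_covariance(X)
    eigvals, eigvecs = np.linalg.eigh(cov)            # cov is symmetric
    order = np.argsort(eigvals)[::-1][:k]             # largest eigenvalues first
    concepts = eigvecs[:, order]                      # shape (n_attributes, k)

    observed = ~np.isnan(X)
    centered = np.where(observed, X - means, 0.0)
    # Illustrative correction: a record with few observed attributes has fewer
    # terms in its dot product, so rescale by (total / observed) attributes.
    n_observed = np.maximum(observed.sum(axis=1, keepdims=True), 1)
    scale = X.shape[1] / n_observed
    return (centered @ concepts) * scale
```

The returned array has one row per record and `k` columns and contains no missing values, so a standard clustering or classification routine could be applied to it directly. This is the sense in which the sketch estimates only conceptual aspects of the data rather than extrapolating individual attribute values.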