On the Use of Conceptual Reconstruction for Mining Massively Incomplete Data Sets
IEEE Transactions on Knowledge and Data Engineering
Incomplete data sets have become almost ubiquitous in a wide variety of application domains. Common examples can be found in climate and image data sets, sensor data sets, and medical data sets. The incompleteness may arise from a number of factors: in some cases, certain measurements are simply not available at the time; in others, information is lost due to partial system failure; or users may be unwilling to specify attributes because of privacy concerns. When a significant fraction of the entries are missing in all of the attributes, it becomes very difficult to perform any kind of reasonable extrapolation on the original data. For such cases, we introduce the novel idea of conceptual reconstruction, in which we create effective conceptual representations on which data mining algorithms can be directly applied. The attraction of conceptual reconstruction is that it uses the correlation structure of the data to express it in terms of concepts rather than the original dimensions. As a result, the reconstruction procedure estimates only those conceptual aspects of the data that can be mined from the incomplete data set, rather than forcing errors created by extrapolation. We demonstrate the effectiveness of the approach on a variety of real data sets.
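The abstract does not spell out the reconstruction procedure, so the following is only a minimal sketch of one way the idea could look, under stated assumptions: missing entries are encoded as NaN, the correlation structure is approximated by a covariance matrix estimated per attribute pair from the rows where both attributes are observed, the "concepts" are taken to be its leading eigenvectors, and each record's concept coordinates are fitted by least squares using only its observed attributes. The function names `pairwise_covariance` and `conceptual_coordinates` are hypothetical and chosen for illustration.

```python
import numpy as np


def pairwise_covariance(X):
    """Estimate attribute means and a covariance matrix from incomplete data.

    Assumption: missing entries are NaN. Each covariance entry uses only
    the rows in which both attributes are observed.
    """
    observed = ~np.isnan(X)
    means = np.nanmean(X, axis=0)
    # Center observed values; missing entries contribute zero to the sums.
    centered = np.where(observed, X - means, 0.0)
    # counts[a, b] = number of rows where attributes a and b are both observed.
    counts = observed.T.astype(float) @ observed.astype(float)
    cov = (centered.T @ centered) / np.maximum(counts - 1.0, 1.0)
    return means, cov


def conceptual_coordinates(X, k):
    """Express each record in terms of k concepts (leading eigenvectors).

    No missing attribute value is filled in; only the coordinates along
    the concept directions are estimated, from the observed attributes.
    """
    means, cov = pairwise_covariance(X)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    concepts = eigvecs[:, order]  # shape (d, k)

    coords = np.zeros((X.shape[0], k))
    for i, row in enumerate(X):
        obs = ~np.isnan(row)
        if not obs.any():
            continue  # nothing observed: leave the record at the origin
        # Least-squares fit of the observed attributes to the concept basis.
        solution, *_ = np.linalg.lstsq(
            concepts[obs], row[obs] - means[obs], rcond=None
        )
        coords[i] = solution
    return coords
```

The returned array has one row per record and one column per concept, so a clustering or classification routine could, in principle, be run on it directly in place of the incomplete original attributes.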