Synthesizing Statistical Knowledge from Incomplete Mixed-Mode Data

Authors:
Andrew K. C. Wong; David K. Y. Chiu
Affiliations:
Univ. of Waterloo, Waterloo, Ont., Canada; Univ. of Guelph, Guelph, Ont., Canada
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
1987

Citing 0
Cited 41

Probabilistic document indexing from relevance feedback data

SIGIR '90 Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval
Synthesis of Statistical Knowledge from Time-Dependent Data

IEEE Transactions on Pattern Analysis and Machine Intelligence
Synthesis and Recognition of Sequences

IEEE Transactions on Pattern Analysis and Machine Intelligence
A probabilistic learning approach for document indexing

ACM Transactions on Information Systems (TOIS) - Special issue on research and development in information retrieval
Integration of probabilistic fact and text retrieval

SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Discretisation in Lazy Learning Algorithms

Artificial Intelligence Review - Special issue on lazy learning
An error-based conceptual clustering method for providing approximate query answers

Communications of the ACM - Electronic supplement to the December issue
Discretisation of Continuous Commercial Database Features for a Simulated Annealing Data Mining Algorithm

Applied Intelligence
An Information Theoretic Approach to Rule Induction from Databases

IEEE Transactions on Knowledge and Data Engineering
High-Order Pattern Discovery from Discrete-Valued Data

IEEE Transactions on Knowledge and Data Engineering
Class-Dependent Discretization for Inductive Learning from Continuous and Mixed-Mode Data

IEEE Transactions on Pattern Analysis and Machine Intelligence
A Probabilistic Framework for Vague Queries and Imprecise Information in Databases

VLDB '90 Proceedings of the 16th International Conference on Very Large Data Bases
Quantization of Continuous Input Variables for Binary Classification

IDEAL '00 Proceedings of the Second International Conference on Intelligent Data Engineering and Automated Learning, Data Mining, Financial Engineering, and Intelligent Agents
Discretization of Continuous Attributes on Decision System in Mitochondrial Encephalomyopathies

RSCTC '98 Proceedings of the First International Conference on Rough Sets and Current Trends in Computing
A Comparison of Several Approaches to Missing Attribute Values in Data Mining

RSCTC '00 Revised Papers from the Second International Conference on Rough Sets and Current Trends in Computing
Proportional k-Interval Discretization for Naive-Bayes Classifiers

ECML '01 Proceedings of the 12th European Conference on Machine Learning
Applying rough set theory to multi stage medical diagnosing

Fundamenta Informaticae
Local maximum ozone concentration prediction using soft computing methodologies

Systems Analysis Modelling Simulation
CAIM Discretization Algorithm

IEEE Transactions on Knowledge and Data Engineering
System for the recognition of human faces

IBM Systems Journal
A Discretization Algorithm Based on a Heterogeneity Criterion

IEEE Transactions on Knowledge and Data Engineering
A Fuzzy Approach to Partitioning Continuous Attributes for Classification

IEEE Transactions on Knowledge and Data Engineering
Decision Support Analysis for Software Effort Estimation by Analogy

PROMISE '07 Proceedings of the Third International Workshop on Predictor Models in Software Engineering
A global optimal algorithm for class-dependent discretization of continuous data

Intelligent Data Analysis
A discretization algorithm based on Class-Attribute Contingency Coefficient

Information Sciences: an International Journal
Improved Algorithms for Univariate Discretization of Continuous Features

PKDD 2007 Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases
A study on the use of imputation methods for experimentation with Radial Basis Function Network classifiers handling missing attribute values: The good synergy between RBFNs and EventCovering method

Neural Networks
The use of a Bayesian network for web effort estimation

ICWE'07 Proceedings of the 7th international conference on Web engineering
Pattern discovery for large mixed-mode database

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
A discretization algorithm for uncertain data

DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part II
Review:

The Knowledge Engineering Review
A review and comparison of strategies for handling missing values in separate-and-conquer rule learning

Journal of Intelligent Information Systems
A supervised and multivariate discretization algorithm for rough sets

RSKT'10 Proceedings of the 5th international conference on Rough set and knowledge technology
A quantitative diagnostic method based on bayesian networks in traditional chinese medicine

ICONIP'06 Proceedings of the 13th international conference on Neural information processing - Volume Part III
An effective discretization based on Class-Attribute Coherence Maximization

Pattern Recognition Letters
Extension of the generalization complexity measure to real valued input data sets

ISNN'10 Proceedings of the 7th international conference on Advances in Neural Networks - Volume Part I
Predicting web development effort using a bayesian network

EASE'07 Proceedings of the 11th international conference on Evaluation and Assessment in Software Engineering
Automating the knowledge acquisition process in the construction of medical expert systems

Artificial Intelligence in Medicine
Applying Rough Set Theory to Multi Stage Medical Diagnosing

Fundamenta Informaticae
Classification of Unseen Examples under Uncertainty

Fundamenta Informaticae
An incremental decision tree algorithm based on rough sets and its application in intrusion detection

Artificial Intelligence Review

Quantified Score

Hi-index	0.15

Abstract

The difficulties in analyzing and clustering (synthesizing) multivariate data of the mixed type (discrete and continuous) are largely due to: 1) nonuniform scaling in different coordinates, 2) the lack of order in nominal data, and 3) the lack of a suitable similarity measure. This paper presents a new approach that bypasses these difficulties and can acquire statistical knowledge from incomplete mixed-mode data. The proposed method adopts an event-covering approach, which covers a subset of statistically relevant outcomes in the outcome space of variable pairs. Once the covered event patterns are acquired, subsequent analysis tasks such as probabilistic inference, cluster analysis, and detection of event patterns for each cluster based on the incomplete probability scheme can be performed. There are four phases in our method: 1) the discretization of the continuous components based on a maximum entropy criterion, so that the data can be treated as n-tuples of discrete-valued features; 2) the estimation of the missing values using our newly developed inference procedure; 3) the initial formation of clusters by analyzing the nearest-neighbor distance on subsets of selected samples; and 4) the reclassification of the n-tuples into more reliable clusters based on the detected interdependence relationships. For performance evaluation, experiments have been conducted using both simulated and real-life data.
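
The first phase named in the abstract, discretization under a maximum entropy criterion, can be illustrated with a minimal Python sketch. The abstract does not give the procedure itself, so the sketch rests on a stated assumption: for a fixed number of intervals, the Shannon entropy of the discretized variable is largest when every interval holds about the same number of samples, so equal-frequency cut points are used. The function names (max_entropy_cut_points, discretize, entropy) and the use of None for missing values are hypothetical choices made for illustration, not the authors' implementation.

```python
import bisect
import math
from collections import Counter


def max_entropy_cut_points(values, n_bins):
    """Return n_bins - 1 cut points that give roughly equal counts per interval.

    Assumption: equal-frequency intervals stand in for the paper's
    maximum entropy criterion. Missing values (None) are ignored here.
    """
    observed = sorted(v for v in values if v is not None)
    if n_bins < 2 or len(observed) < n_bins:
        raise ValueError("need n_bins >= 2 and at least n_bins observed values")
    cuts = []
    for k in range(1, n_bins):
        idx = k * len(observed) // n_bins
        # Place the cut midway between the two neighbouring sorted samples.
        cuts.append((observed[idx - 1] + observed[idx]) / 2)
    return cuts


def discretize(values, cuts):
    """Map each value to its interval index; missing values stay None."""
    return [None if v is None else bisect.bisect_right(cuts, v) for v in values]


def entropy(labels):
    """Shannon entropy in bits of the observed (non-missing) labels."""
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Illustrative data only; two entries are missing.
    sample = [0.1, 0.4, None, 2.5, 3.0, 7.2, 9.9, None, 0.2, 5.5, 6.1]
    cuts = max_entropy_cut_points(sample, n_bins=3)
    labels = discretize(sample, cuts)
    print(cuts, labels, entropy(labels))
```

With heavily tied values the intervals can end up unequal, so the entropy may fall short of its upper bound of log2(n_bins). Missing entries are passed through untouched, leaving them for a later estimation step such as the second phase the abstract describes.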