The difficulties in analyzing and clustering (synthesizing) multivariate data of the mixed type (discrete and continuous) are largely due to: 1) nonuniform scaling across coordinates, 2) the lack of order in nominal data, and 3) the lack of a suitable similarity measure. This paper presents a new approach that bypasses these difficulties and can acquire statistical knowledge from incomplete mixed-mode data. The proposed method adopts an event-covering approach, which covers a subset of statistically relevant outcomes in the outcome space of variable-pairs. Once the covered event patterns are acquired, subsequent analysis tasks can be performed based on the incomplete probability scheme: probabilistic inference, cluster analysis, and detection of event patterns for each cluster. There are four phases in our method: 1) the discretization of the continuous components based on a maximum entropy criterion, so that the data can be treated as n-tuples of discrete-valued features; 2) the estimation of the missing values using our newly developed inference procedure; 3) the initial formation of clusters by analyzing the nearest-neighbor distance on subsets of selected samples; and 4) the reclassification of the n-tuples into more reliable clusters based on the detected interdependence relationships. For performance evaluation, experiments have been conducted using both simulated and real-life data.
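As an illustration of phase 1 only, the following is a minimal sketch of one common reading of a maximum entropy criterion, not the paper's own procedure. For a fixed number of intervals, the entropy of a discretized variable is largest when the intervals are equally populated, so the sketch places cut points at equal-frequency positions. The function names, the use of `None` for missing values, and the choice of midpoints between neighbouring order statistics are all assumptions made for this example.

```python
import math
from bisect import bisect_right
from collections import Counter


def max_entropy_cut_points(values, n_bins):
    """Return n_bins - 1 cut points that split the observed values into
    (approximately) equally populated intervals.

    Assumption: missing entries are encoded as None and are ignored here.
    Heavily tied values can make the intervals unequal in practice.
    """
    observed = sorted(v for v in values if v is not None)
    if n_bins < 2 or len(observed) < n_bins:
        raise ValueError("need n_bins >= 2 and at least n_bins observed values")
    cuts = []
    for k in range(1, n_bins):
        idx = k * len(observed) // n_bins
        # Hypothetical choice: cut halfway between neighbouring order statistics.
        cuts.append((observed[idx - 1] + observed[idx]) / 2.0)
    return cuts


def discretize(value, cuts):
    """Map a continuous value to an interval index; missing values stay None."""
    if value is None:
        return None
    return bisect_right(cuts, value)


def entropy(labels):
    """Shannon entropy (in bits) of the observed, non-missing labels."""
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# One continuous component with a skewed range and one missing value.
column = [0.1, 0.2, 0.3, 0.4, 5.0, 6.0, 7.0, 50.0, None]
cuts = max_entropy_cut_points(column, n_bins=4)
labels = [discretize(v, cuts) for v in column]
```

After this step each continuous component is an interval index, so a mixed-mode sample can be handled as an n-tuple of discrete-valued features. The missing entry is left as `None` for a later estimation step. In this small example every interval holds two of the eight observed values, so `entropy(labels)` reaches its upper bound of `log2(4)` bits.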