Data mining and data warehousing are two key technologies that have made significant contributions to knowledge discovery in a variety of domains. More recently, the integrated use of traditional data mining techniques, such as clustering and pattern recognition, with the data warehousing technique of Online Analytical Processing (OLAP) has motivated diverse lines of research aimed at knowledge discovery from complex real-world datasets. A number of such integrated methodologies have been proposed to extract knowledge from datasets, but most of them lack automated and generic methods for schema generation and knowledge extraction. Data analysts therefore mostly have to rely on domain-specific knowledge and cope with technological constraints in order to discover knowledge from high-dimensional datasets. In this paper we present a generic methodology that incorporates semi-automated knowledge extraction methods to provide data-driven assistance towards knowledge discovery. In particular, we provide a method for constructing a binary tree of hierarchical clusters and annotating each node in the tree with significant numeric variables. Additionally, we propose automated methods to rank nominal variables and to generate candidate multidimensional schemas with highly significant dimensions. We have performed three case studies on three real-world datasets taken from the UCI machine learning repository to validate the generality and applicability of the proposed methodology.
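The abstract does not specify the clustering algorithm, the significance test, or the ranking criterion, so the following is only a minimal sketch of the kind of pipeline it describes, built on illustrative assumptions rather than the authors' method. Average-linkage agglomerative clustering over the numeric variables stands in for the hierarchical clustering. A variable is treated as "significant" at a node when the node mean lies more than `threshold` global standard deviations from the global mean (a hypothetical rule). Nominal variables are ranked by information gain about the root's two-way split (also a hypothetical choice). The names `Node`, `build_tree`, `annotate`, `entropy` and `rank_nominal` are illustrative only.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of the binary cluster tree; a leaf covers a single row."""
    members: List[int]  # indices of the rows in this cluster
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    significant: List[str] = field(default_factory=list)  # annotated variables


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_tree(rows: List[List[float]]) -> Node:
    """Average-linkage agglomerative clustering; every merge adds one binary node."""
    clusters = [Node(members=[i]) for i in range(len(rows))]
    while len(clusters) > 1:
        best = None  # (linkage distance, p, q) of the closest pair so far
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                a, b = clusters[p].members, clusters[q].members
                # Average linkage: mean distance over all cross-cluster row pairs.
                d = sum(distance(rows[i], rows[j]) for i in a for j in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        left, right = clusters[p], clusters[q]
        clusters.pop(q)  # q > p, so removing q first keeps index p valid
        clusters.pop(p)
        clusters.append(Node(left.members + right.members, left, right))
    return clusters[0]


def annotate(root: Node, rows: List[List[float]], names: List[str], threshold: float = 0.8) -> None:
    """Flag a variable at a node when the node mean lies more than `threshold`
    global standard deviations from the global mean (hypothetical rule)."""
    n = len(rows)
    means = [sum(r[k] for r in rows) / n for k in range(len(names))]
    sds = [math.sqrt(sum((r[k] - means[k]) ** 2 for r in rows) / n) for k in range(len(names))]

    def visit(node: Node) -> None:
        for k, name in enumerate(names):
            node_mean = sum(rows[i][k] for i in node.members) / len(node.members)
            if sds[k] > 0 and abs(node_mean - means[k]) / sds[k] > threshold:
                node.significant.append(name)
        for child in (node.left, node.right):
            if child is not None:
                visit(child)

    visit(root)


def entropy(labels: List[int]) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def rank_nominal(root: Node, nominal: List[List[str]], names: List[str]) -> List[str]:
    """Rank nominal variables by information gain about the root's left/right split.
    nominal[i][k] is the k-th nominal value of row i (hypothetical layout)."""
    side = {i: 0 for i in root.left.members}
    side.update({i: 1 for i in root.right.members})
    base = entropy([side[i] for i in root.members])
    scores = []
    for k, name in enumerate(names):
        groups = defaultdict(list)  # nominal value -> sides of the rows holding it
        for i in root.members:
            groups[nominal[i][k]].append(side[i])
        cond = sum(len(g) / len(root.members) * entropy(g) for g in groups.values())
        scores.append((base - cond, name))
    return [name for _, name in sorted(scores, reverse=True)]


if __name__ == "__main__":
    rows = [[1.0, 10.0], [1.2, 10.5], [0.9, 9.8], [8.0, 10.1], [8.3, 9.9], [7.9, 10.2]]
    nominal = [["north", "red"], ["north", "blue"], ["north", "red"],
               ["south", "red"], ["south", "blue"], ["south", "red"]]
    tree = build_tree(rows)
    annotate(tree, rows, ["x", "y"])
    print(tree.left.significant, tree.right.significant)  # ['x'] ['x']
    print(rank_nominal(tree, nominal, ["region", "color"]))  # ['region', 'color']
```

Average linkage is used here only because it is short to write in plain Python; the naive pairwise search takes roughly cubic time in the number of rows, so datasets of realistic size would call for an optimized clustering library instead.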