A data mining approach to knowledge discovery from multidimensional cube structures

Authors: Muhammad Usman; Russel Pears; A. C. M. Fong
Affiliation (all authors): Auckland University of Technology, Auckland, New Zealand
Venue: Knowledge-Based Systems
Year: 2013

Abstract

In this research we present a novel methodology for discovering cubes of interest in large multidimensional datasets. Unlike previous research in this area, our approach does not rely on the availability of specialized domain knowledge. Instead, it uses robust data reduction methods such as Principal Component Analysis (PCA) and Multiple Correspondence Analysis (MCA) to identify a small subset of numeric and nominal variables that capture the greatest degree of variation in the data; these variables are then used to generate cubes of interest. We integrated hierarchical clustering with data reduction to gain insight into the dynamics of relationships between variables of interest at different levels of data abstraction. Two case studies conducted on two real-world datasets revealed that the methodology was able to capture regions of interest that were significant from both the application and the statistical perspective.
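The abstract describes the data reduction step only in outline. The following is a minimal sketch, assuming NumPy, of how PCA and MCA could be used to rank numeric and nominal variables by how much variation they capture. The function names `rank_numeric_variables` and `rank_nominal_variables` are hypothetical. So are the parameters `var_threshold` and `n_dims` and both scoring rules: communality on the retained principal components, and inertia-weighted category contributions on the retained MCA dimensions. These are illustrative assumptions, not the authors' published procedure, and the hierarchical clustering step is not shown.

```python
import numpy as np


def rank_numeric_variables(X, names, var_threshold=0.8):
    """Rank numeric variables by communality on the leading principal components.

    Illustrative assumption (hypothetical scoring rule): keep the fewest
    components whose cumulative explained variance reaches var_threshold,
    then score each variable by the sum of its squared loadings on them.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)          # PCA on standardized variables
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(explained, var_threshold)) + 1, len(eigvals))
    loadings = eigvecs[:, :k] * np.sqrt(np.clip(eigvals[:k], 0.0, None))
    communality = (loadings ** 2).sum(axis=1)
    scores = {name: float(v) for name, v in zip(names, communality)}
    ranking = sorted(names, key=scores.get, reverse=True)
    return ranking, scores, k


def rank_nominal_variables(columns, names, n_dims=2):
    """Rank nominal variables by their contribution to the leading MCA dimensions.

    MCA is computed as correspondence analysis of the one-hot indicator
    matrix. Illustrative assumption (hypothetical scoring rule): a variable's
    score is the sum of its categories' contributions, weighted by the
    principal inertia of each retained dimension.
    """
    blocks, owner = [], []
    for q, col in enumerate(columns):
        _, codes = np.unique(np.asarray(col), return_inverse=True)
        codes = codes.ravel()
        onehot = np.eye(codes.max() + 1)[codes]   # n x (number of categories)
        blocks.append(onehot)
        owner.extend([q] * onehot.shape[1])
    Z = np.hstack(blocks)
    P = Z / Z.sum()                               # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)           # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    _, sing, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sing[:n_dims] ** 2                  # principal inertias
    ctr = Vt[:n_dims].T ** 2                      # category contributions per dimension
    weighted = ctr @ inertia / inertia.sum()      # contributions sum to 1 overall
    owner = np.array(owner)
    scores = {name: float(weighted[owner == q].sum()) for q, name in enumerate(names)}
    ranking = sorted(names, key=scores.get, reverse=True)
    return ranking, scores
```

Running PCA on the correlation matrix keeps variables measured in large units from dominating the ranking. The variance threshold and the number of MCA dimensions are tuning choices that would likely need adjusting per dataset. Variables at the top of either ranking would be the natural candidates for cube dimensions.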