Data Mining in Large Databases Using Domain Generalization Graphs

Authors:
Robert J. Hilderman; Howard J. Hamilton; Nick Cercone
Affiliations:
Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2. hilder@cs.uregina.ca
Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2. hamilton@cs.uregina.ca
Department of Computer Science, Faculty of Mathematics, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1. ncercone@math.uwaterloo.ca
Venue:
Journal of Intelligent Information Systems
Year:
1999

Abstract

Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs (DGGs) for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the DGGs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the DGGs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
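The generate-and-test idea can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the authors' Multi-Attribute Generalization algorithm. The relation, the lookup tables (`PROVINCE`, `REGION`, `QUARTER`), the two DGGs, and the helpers `generalize`, `all_summaries`, and `partition` are all hypothetical. Each DGG is reduced to a chain of nodes. The search enumerates every combination of one node per attribute instead of traversing path combinations as the paper does. The round-robin `partition` only illustrates how the work could be divided among processors.

```python
from collections import Counter
from itertools import product

# Hypothetical lookup tables standing in for user-defined concept hierarchies.
PROVINCE = {"Regina": "SK", "Saskatoon": "SK", "Waterloo": "ON", "Toronto": "ON"}
REGION = {"SK": "West", "ON": "East"}
QUARTER = {"Jan": "Q1", "Feb": "Q1", "Jul": "Q3"}

# Hypothetical DGGs, simplified to a mapping from node (domain) name to a
# function that generalizes a base value into that node's domain.
CITY_DGG = {
    "City": lambda c: c,
    "Province": lambda c: PROVINCE[c],
    "Region": lambda c: REGION[PROVINCE[c]],
    "ANY": lambda c: "ANY",
}
MONTH_DGG = {
    "Month": lambda m: m,
    "Quarter": lambda m: QUARTER[m],
    "ANY": lambda m: "ANY",
}


def generalize(rows, dggs, nodes):
    """Replace each value by its concept at the chosen node and merge
    identical generalized tuples, keeping a count per tuple."""
    return Counter(
        tuple(dgg[node](value) for value, dgg, node in zip(row, dggs, nodes))
        for row in rows
    )


def all_summaries(rows, dggs, combos=None):
    """Generate-and-test: produce one summary per node combination."""
    if combos is None:
        combos = list(product(*(list(dgg) for dgg in dggs)))
    return {nodes: generalize(rows, dggs, nodes) for nodes in combos}


def partition(combos, k):
    """Round-robin split of node combinations across k workers."""
    return [combos[i::k] for i in range(k)]


rows = [("Regina", "Jan"), ("Saskatoon", "Feb"), ("Waterloo", "Jan"),
        ("Toronto", "Jul"), ("Regina", "Jul")]
dggs = [CITY_DGG, MONTH_DGG]
combos = list(product(*(list(dgg) for dgg in dggs)))
summaries = all_summaries(rows, dggs)
print(len(summaries))  # 12 (4 city nodes x 3 month nodes)

# Each partition could be handed to a separate processor; merging the
# partial results gives the same summaries as the serial run.
merged = {}
for part in partition(combos, 3):
    merged.update(all_summaries(rows, dggs, part))
```

In this sketch each summary depends only on its own node combination. The partitions can therefore be processed independently, and merging their results reproduces the serial output.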
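The two interestingness measures can also be sketched in Python. The formulations below are assumptions and are not quoted from the paper. A summary with m tuples is scored from its tuple probabilities p_i = count_i / total and compared against the uniform probability q = 1/m. The variance measure is taken as sum((p_i - q)^2) / (m - 1), and relative entropy as sum(p_i * log2(p_i / q)). The function names and the example counts are hypothetical.

```python
import math


def probabilities(counts):
    """Convert the tuple counts of a summary into probabilities."""
    total = sum(counts)
    return [c / total for c in counts]


def variance_interest(counts):
    """Assumed variance measure: sample variance of the tuple probabilities
    around the uniform probability 1/m (0.0 for single-tuple summaries)."""
    m = len(counts)
    if m < 2:
        return 0.0
    q = 1.0 / m
    return sum((p - q) ** 2 for p in probabilities(counts)) / (m - 1)


def relative_entropy_interest(counts):
    """Assumed relative-entropy measure: KL divergence, in bits, of the
    observed tuple distribution from the uniform distribution."""
    q = 1.0 / len(counts)
    return sum(p * math.log2(p / q) for p in probabilities(counts) if p > 0)


# Hypothetical summaries, each given only by the counts of its tuples.
summaries = {"A": [2, 1, 1, 1], "B": [4, 1], "C": [1, 1, 1, 1]}
by_variance = sorted(summaries, key=lambda k: variance_interest(summaries[k]),
                     reverse=True)
by_entropy = sorted(summaries,
                    key=lambda k: relative_entropy_interest(summaries[k]),
                    reverse=True)
print(by_variance)  # ['B', 'A', 'C']
print(by_entropy)   # ['B', 'A', 'C']
```

Both measures assign zero to a perfectly uniform summary and grow as the counts become more skewed. On these toy counts they produce the same ranking, but on real summaries they can disagree.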