Efficient Attribute-Oriented Generalization for Knowledge Discovery from Large Databases

Authors:
Colin L. Carter; Howard J. Hamilton
Affiliations:
-;-
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
1998

Citing 11
Cited 18

Probabilistic decision trees

Machine learning
Knowledge discovery in databases: an overview

AI Magazine
Data structures and algorithm analysis in C++

Data structures and algorithm analysis in C++
Towards efficient induction mechanisms in database systems

Theoretical Computer Science - Special issue on formal methods in databases and software engineering
An effective hash-based algorithm for mining association rules

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Data-Driven Discovery of Quantitative Rules in Relational Databases

IEEE Transactions on Knowledge and Data Engineering
Database Mining: A Performance Perspective

IEEE Transactions on Knowledge and Data Engineering
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Performance evaluation of attribute-oriented algorithms for knowledge discovery from databases

TAI '95 Proceedings of the Seventh International Conference on Tools with Artificial Intelligence
Advances of the DBLearn system for knowledge discovery in large databases

IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2

Data Mining in Large Databases Using Domain Generalization Graphs

Journal of Intelligent Information Systems
A Graph-Based Approach for Discovering Various Types of Association Rules

IEEE Transactions on Knowledge and Data Engineering
SAINTETIQ: a fuzzy set-based approach to database summarization

Fuzzy Sets and Systems - Data bases and approximate reasoning
Applying Objective Interestingness Measures in Data Mining Systems

PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
Data Mining with Calendar Attributes

TSDM '00 Proceedings of the First International Workshop on Temporal, Spatial, and Spatio-Temporal Data Mining-Revised Papers
Heuristic for Ranking the Interestingness of Discovered Knowledge

PAKDD '99 Proceedings of the Third Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining
A Concurrent Approach to the Key-Preserving Attribute-Oriented Induction Method

PAKDD '00 Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications
Evaluation of Interestingness Measures for Ranking Discovered Knowledge

PAKDD '01 Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining
Cooperative learning and virtual reality-based visualization for data mining

Data mining
Measuring the interestingness of discovered knowledge: A principled approach

Intelligent Data Analysis
AOG-ags Algorithms and Applications

ADMA '07 Proceedings of the 3rd international conference on Advanced Data Mining and Applications
Data mining by attribute generalization with fuzzy hierarchies in fuzzy databases

Fuzzy Sets and Systems
Mining negative generalized knowledge from relational databases

Knowledge-Based Systems
An efficient classifier design integrating Rough Set and Dempster-Shafer Theory

International Journal of Artificial Intelligence and Soft Computing
From data to global generalized knowledge

Decision Support Systems
A study on the modified attribute oriented induction algorithm of mining the multi-value attribute data

ACIIDS'12 Proceedings of the 4th Asian conference on Intelligent Information and Database Systems - Volume Part I
Data Mining Approaches for Geo-Spatial Big Data: Uncertainty Issues

International Journal of Organizational and Collective Intelligence
A hybrid heuristic approach for attribute-oriented mining

Decision Support Systems

Abstract

We present GDBR (Generalize DataBase Relation) and FIGR (Fast, Incremental Generalization and Regeneralization), two enhancements of Attribute-Oriented Generalization, a well-known knowledge discovery from databases technique. GDBR and FIGR are both O(n) and, as such, are optimal. GDBR is an on-line algorithm and requires only a small, constant amount of space. FIGR also requires a constant amount of space that is generally reasonable although, under certain circumstances, may grow large. FIGR is incremental, allowing changes to the database to be reflected in the generalization results without rereading input data. FIGR also allows fast regeneralization to both higher and lower levels of generality without rereading input. We compare GDBR and FIGR to two previous algorithms, LCHR and AOI, which are O(n log n) and O(np), respectively, where n is the number of input tuples and p the number of tuples in the generalized relation. Both require O(n) space that, for large input, causes memory problems. We implemented all four algorithms and ran empirical tests, and we found that GDBR and FIGR are faster. In addition, their runtimes increase only linearly as input size increases, while the runtimes of LCHR and AOI increase greatly when input size exceeds memory limitations.
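The general idea behind a single-pass, O(n) generalization can be illustrated with a minimal sketch. This is not the authors' GDBR or FIGR implementation, whose details the abstract does not give. It only assumes that each attribute has a concept hierarchy and that generalized tuples are counted in a hash table as the input is read. The hierarchies, attribute names, and the functions `climb` and `generalize` are hypothetical, chosen for illustration.

```python
from collections import Counter

# Hypothetical concept hierarchies (assumption, not from the paper):
# each dictionary maps a value to its parent concept.
HIERARCHIES = {
    "city": {
        "Regina": "Saskatchewan",
        "Saskatoon": "Saskatchewan",
        "Calgary": "Alberta",
        "Saskatchewan": "Canada",
        "Alberta": "Canada",
    },
    "degree": {
        "BSc": "undergraduate",
        "BA": "undergraduate",
        "MSc": "graduate",
        "PhD": "graduate",
    },
}


def climb(value, hierarchy, levels):
    """Replace a value by its ancestor `levels` steps up, stopping at the top."""
    for _ in range(levels):
        if value not in hierarchy:
            break
        value = hierarchy[value]
    return value


def generalize(rows, attributes, levels):
    """Generalize tuples in one pass and count identical generalized tuples.

    Each input tuple is read once and then discarded, so the work is linear
    in the number of input tuples. The counter holds one entry per distinct
    generalized tuple, so its size depends on the hierarchies rather than on
    the input size.
    """
    counts = Counter()
    for row in rows:
        key = tuple(
            climb(value, HIERARCHIES[attr], levels[attr])
            for attr, value in zip(attributes, row)
        )
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # An iterator stands in for a database cursor that is read only once.
    data = iter([
        ("Regina", "BSc"),
        ("Saskatoon", "BA"),
        ("Calgary", "PhD"),
        ("Regina", "MSc"),
    ])
    summary = generalize(data, ("city", "degree"), {"city": 1, "degree": 1})
    for generalized_tuple, count in sorted(summary.items()):
        print(generalized_tuple, count)
```

In this sketch the input is consumed as a stream, which mirrors the on-line behavior the abstract describes. Regeneralizing to a lower level would require rereading the input, which is the limitation the abstract says FIGR avoids.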