Data Organization and Access for Efficient Data Mining

Venue: ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Year: 1999

Abstract

Efficient data mining is a significant challenge because the space and time required for such processing often grow combinatorially. While previous work has focused on improving the efficiency of the mining algorithms, we consider how the representation, organization, and access of the data may significantly affect performance, especially when I/O costs are also considered. Through a simple analysis and comparison of the counting stage of the Apriori association rules algorithm, we show that a "column-wise" approach to data access is often more efficient than the standard row-wise approach. We also provide the results of empirical simulations to validate our analysis. The key idea in our approach is that counting in the Apriori algorithm with data accessed in a column-wise manner significantly reduces the number of disk accesses required to identify itemsets with a minimum support in the database, primarily by reducing the degree to which data and counters need to be repeatedly brought into memory.
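
To make the row-wise versus column-wise distinction concrete, the sketch below counts the support of candidate 2-itemsets both ways on a toy database. It is an illustration only, not the paper's implementation: the transactions, the function names (`count_row_wise`, `to_columns`, `count_column_wise`), and the use of in-memory transaction-id sets as "columns" are all assumptions, and the sketch does not model the disk I/O costs on which the abstract's argument rests.

```python
from itertools import combinations

# Hypothetical toy database: each row is one transaction (a set of items).
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]


def count_row_wise(rows, candidates):
    """Row-wise counting: visit every transaction and, for each candidate
    itemset it contains, increment that candidate's counter."""
    counts = {cand: 0 for cand in candidates}
    for row in rows:
        for cand in candidates:
            if cand <= row:  # candidate is a subset of the transaction
                counts[cand] += 1
    return counts


def to_columns(rows):
    """Reorganize the data column-wise: one column per item, holding the
    ids of the transactions that contain that item."""
    columns = {}
    for tid, row in enumerate(rows):
        for item in row:
            columns.setdefault(item, set()).add(tid)
    return columns


def count_column_wise(columns, candidates):
    """Column-wise counting: for each candidate, read only the columns of
    its items and intersect them; the support is the intersection size."""
    counts = {}
    for cand in candidates:
        tids = set.intersection(*(columns.get(item, set()) for item in cand))
        counts[cand] = len(tids)
    return counts


# Candidate 2-itemsets over all items (a real Apriori pass would prune these
# using the frequent 1-itemsets first).
items = sorted(set().union(*transactions))
candidates = [frozenset(pair) for pair in combinations(items, 2)]

row_counts = count_row_wise(transactions, candidates)
col_counts = count_column_wise(to_columns(transactions), candidates)

min_support = 2  # assumed absolute support threshold
frequent = sorted(sorted(c) for c, n in col_counts.items() if n >= min_support)
print(frequent)
```

Both functions produce the same counts. The difference is in what each one touches: the row-wise version revisits every transaction and every counter, while the column-wise version reads only the columns belonging to the items of the candidate being counted.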