Efficient mining of data presents a significant challenge due to the combinatorial explosion in the space and time often required for such processing. While previous work has focused on improving the efficiency of the mining algorithms, we consider how the representation, organization, and access of the data may significantly affect performance, especially when I/O costs are also considered. By a simple analysis and comparison of the counting stage of the Apriori association rules algorithm, we show that a "column-wise" approach to data access is often more efficient than the standard row-wise approach. We also provide the results of empirical simulations to validate our analysis. The key idea in our approach is that counting in the Apriori algorithm with data accessed in a column-wise manner significantly reduces the number of disk accesses required to identify itemsets with a minimum support in the database, primarily by reducing the degree to which data and counters need to be repeatedly brought into memory.
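To make the row-wise versus column-wise distinction concrete, the following is a minimal in-memory sketch of the counting stage for candidate 2-itemsets. It is an illustration only, not the implementation analyzed here: the toy transactions, the function names (`count_row_wise`, `to_columns`, `count_column_wise`), and the use of transaction-id sets as "columns" are all assumptions, and the sketch does not model disk I/O.

```python
from itertools import combinations

# Hypothetical toy database: each row is one transaction (a set of items).
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "c", "d"},
    {"a", "b", "c", "d"},
]


def count_row_wise(transactions, candidates):
    """Row-wise counting: scan each transaction, test every candidate against it."""
    counts = {c: 0 for c in candidates}
    for row in transactions:
        for c in candidates:
            if c <= row:  # candidate itemset is contained in this transaction
                counts[c] += 1
    return counts


def to_columns(transactions):
    """Build a column-wise layout: item -> set of transaction ids containing it."""
    columns = {}
    for tid, row in enumerate(transactions):
        for item in row:
            columns.setdefault(item, set()).add(tid)
    return columns


def count_column_wise(columns, candidates):
    """Column-wise counting: touch only the columns of the candidate's items."""
    counts = {}
    for c in candidates:
        tids = set.intersection(*(columns.get(item, set()) for item in c))
        counts[c] = len(tids)
    return counts


# Candidate 2-itemsets over all items (assumed; a full Apriori pass would prune).
items = sorted(set().union(*transactions))
candidates = [frozenset(pair) for pair in combinations(items, 2)]

row_counts = count_row_wise(transactions, candidates)
col_counts = count_column_wise(to_columns(transactions), candidates)

min_support = 2  # assumed absolute support threshold
frequent = sorted(
    tuple(sorted(c)) for c, n in col_counts.items() if n >= min_support
)
```

Both routines produce the same support counts. The difference is in what each candidate touches: the row-wise loop revisits every transaction and every counter, while the column-wise loop reads only the columns for the items in that candidate, which is the access pattern the abstract credits with fewer disk accesses.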