Mining Very Large Databases

Authors:
Venkatesh Ganti;Johannes Gehrke;Raghu Ramakrishnan
Venue:
Computer
Year:
1999

Abstract

Established companies have had decades to accumulate masses of data about their customers, suppliers, products and services, and employees. Data mining, also known as knowledge discovery in databases, gives organizations the tools to sift through these vast data stores to find the trends, patterns, and correlations that can guide strategic decision making. Traditionally, algorithms for data analysis assume that the input data contains relatively few records. Current databases, however, are much too large to be held in main memory. To be efficient, the data-mining techniques applied to very large databases must be highly scalable. An algorithm is said to be scalable if, given a fixed amount of main memory, its runtime increases linearly with the number of records in the input database. Recent work has focused on scaling data-mining algorithms to very large data sets. In this survey, the authors describe a broad range of algorithms that address three classical data-mining problems: market basket analysis, clustering, and classification.
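The scalability definition in the abstract can be made concrete with a small market basket example. The sketch below is not taken from the survey; it assumes a level-wise, Apriori-style counting scheme and uses hypothetical names (`frequent_pairs`, `demo_scan`, `min_support`). It reads the records in two sequential passes and never holds the whole database in memory, so its working memory depends on the number of distinct items rather than on the number of records, and each pass takes time linear in the number of records for baskets of bounded size.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Iterator, List, Tuple

Transaction = List[str]


def frequent_pairs(
    scan: Callable[[], Iterator[Transaction]], min_support: int
) -> Dict[Tuple[str, str], int]:
    """Return item pairs that occur in at least `min_support` baskets.

    `scan` is a hypothetical callable that starts a fresh sequential pass
    over the database and yields one basket at a time.
    """
    # Pass 1: count single items. Memory grows with the number of
    # distinct items, not with the number of records.
    item_counts: Counter = Counter()
    for basket in scan():
        item_counts.update(set(basket))
    frequent = {item for item, n in item_counts.items() if n >= min_support}

    # Pass 2: count only pairs whose members are both frequent, since a
    # pair cannot be frequent unless each of its items is.
    pair_counts: Counter = Counter()
    for basket in scan():
        kept = sorted(set(basket) & frequent)
        pair_counts.update(combinations(kept, 2))

    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


def demo_scan() -> Iterator[Transaction]:
    """Stand-in for a sequential scan over a table stored on disk."""
    yield ["bread", "milk"]
    yield ["bread", "diapers", "beer"]
    yield ["milk", "diapers", "beer"]
    yield ["bread", "milk", "diapers"]
    yield ["bread", "milk", "beer"]


if __name__ == "__main__":
    print(frequent_pairs(demo_scan, min_support=3))
```

In a real setting `demo_scan` would be replaced by a cursor or file reader over the stored table. The point of the design is that only the counters live in main memory, while the records themselves are streamed.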