We developed a PC cluster system consisting of 100 PCs. Each PC is equipped with a 200 MHz Pentium Pro CPU and is connected to the others through an ATM switch. We selected two kinds of data-intensive applications: decision support query processing and data mining, specifically association rule mining.

As a high-speed network technology, ATM has recently become a de facto standard. Although other high-performance network standards are also available, ATM networks are widely used in settings ranging from local area networks to widely distributed environments. One drawback of ATM networks is their high latency relative to their high bandwidth, and this is usually considered a serious flaw when building high-performance massively parallel processors. However, applications such as large-scale database analysis are insensitive to communication latency and require only bandwidth.

Meanwhile, the performance of personal computers is increasing rapidly, and PC prices continue to fall much faster than workstation prices. The 200 MHz Pentium Pro CPU is competitive in integer performance with the processors found in workstations. Although it is still weak at floating-point operations, such operations are used infrequently in database applications.

Thus, by combining PCs and ATM switches, we can construct a large-scale parallel platform easily and inexpensively. In this paper, we examine how such a system can support data warehouse processing, which currently runs on expensive high-end mainframes and/or workstation servers.

In our first experiment, we evaluated the system against commercial parallel systems using the most complex query of the standard TPC-D benchmark on a 100 GB database. Our PC cluster exhibited much higher performance than the systems in current TPC benchmark reports. Second, we parallelized association rule mining and ran large-scale data mining on the PC cluster, and performance scaled with sufficiently high linearity.
We therefore believe that such commodity-based PC clusters will play a very important role in large-scale database processing.
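The abstract does not state which parallel algorithm was used for association rule mining. The following is a minimal sketch, assuming a hash-partitioned variant of Apriori. In this variant, each candidate itemset is owned by exactly one node, chosen by hashing the itemset. Every node enumerates the k-subsets of its local transactions and "sends" each one to its owner, which counts only the candidates it holds. The cluster and its network are simulated inside a single process, and the function names (`owner`, `apriori_gen`, `hash_partitioned_apriori`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def owner(itemset, num_nodes):
    # Hypothetical partitioning rule: the node that owns a candidate itemset.
    # Python randomizes str hashes per process, but the mapping is consistent
    # within one run, which is all this in-process simulation needs.
    return hash(itemset) % num_nodes


def apriori_gen(prev_frequent, k):
    """Join frequent (k-1)-itemsets and prune candidates with an infrequent subset."""
    prev = set(prev_frequent)
    candidates = set()
    for a in prev:
        for b in prev:
            union = tuple(sorted(set(a) | set(b)))
            if len(union) == k and all(s in prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


def hash_partitioned_apriori(transactions, min_support, num_nodes):
    """Return {itemset: support count} for all itemsets with count >= min_support."""
    # Distribute transactions round-robin over the simulated nodes.
    partitions = [transactions[i::num_nodes] for i in range(num_nodes)]

    # Pass 1: each node counts single items locally; the counts are then summed.
    counts = Counter()
    for part in partitions:
        for t in part:
            counts.update((item,) for item in set(t))
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        candidates = apriori_gen(frequent, k)
        if not candidates:
            break
        # Each node holds only the candidates that hash to it.
        owned = [set() for _ in range(num_nodes)]
        for c in candidates:
            owned[owner(c, num_nodes)].add(c)

        inboxes = [Counter() for _ in range(num_nodes)]
        for part in partitions:                    # every node scans its local data
            for t in part:
                for sub in combinations(sorted(set(t)), k):
                    dest = owner(sub, num_nodes)   # simulated message to the owner
                    if sub in owned[dest]:         # owner probes its candidate table
                        inboxes[dest][sub] += 1

        # Each owner applies the support threshold to its own candidates.
        frequent = {}
        for node_counts in inboxes:
            frequent.update({s: c for s, c in node_counts.items() if c >= min_support})
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    for itemset, count in sorted(hash_partitioned_apriori(baskets, 3, 4).items()):
        print(itemset, count)
```

With this partitioning, the candidate set is split across node memories rather than replicated on every node, so the aggregate memory of the cluster can be used for counting. The cost is all-to-all traffic. In a real deployment the per-subset sends would be batched into large messages, so that bandwidth rather than latency dominates. That matches the kind of workload the abstract argues an ATM-connected cluster handles well.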