Parallel processing for stepwise generalisation method on multi-core PC cluster

Authors:
Shinpei Yagi;Keiichi Tamura;Hajime Kitakami
Affiliations:
Graduate School of Information Sciences, Hiroshima City University, 3-4-1, Ozuka-Higashi, Asa-Minami-Ku, Hiroshima, 731-3194, Japan.;Graduate School of Information Sciences, Hiroshima City University, 3-4-1, Ozuka-Higashi, Asa-Minami-Ku, Hiroshima, 731-3194, Japan.;Graduate School of Information Sciences, Hiroshima City University, 3-4-1, Ozuka-Higashi, Asa-Minami-Ku, Hiroshima, 731-3194, Japan
Venue:
International Journal of Knowledge and Web Intelligence
Year:
2012

Abstract

An approximate query, that is, approximate pattern matching over a sequence database, is one of the most important techniques in many areas, including computational biology, text mining, web intelligence and pattern recognition. Such a query returns many similar sub-sequences; in this paper, we refer to a set of such similar sub-sequences as a mismatch cluster. To support users who execute an approximate query on a sequence database to find regularities among the approximate patterns that are similar to the query pattern, we have developed the stepwise generalisation method, which extracts a reduced expression, called a minimum generalised set, from a mismatch cluster. This paper proposes a novel parallelisation model with a hierarchical task pool for the parallel processing of the stepwise generalisation method on a multi-core PC cluster. To manage tasks efficiently on multi-core CPUs, the proposed model uses the hierarchical task pool together with an efficient hierarchical dynamic load balancing technique. We evaluate the proposed method using real protein sequences on an actual multi-core PC cluster. Experimental results confirm that the proposed method performs well both on multi-core CPUs and on a multi-core PC cluster.
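
To make the notion of a mismatch cluster concrete, the following minimal sketch collects the sub-sequences that an approximate query would return. It assumes that "similar" means "differs from the query in at most k positions" (Hamming distance over equal-length windows); the abstract does not fix the similarity measure, so this is an illustrative assumption. The function names `hamming` and `mismatch_cluster` and the toy sequences are hypothetical, and the sketch covers neither the stepwise generalisation method nor the parallelisation model.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mismatch_cluster(sequences, query, max_mismatches):
    """Return the distinct sub-sequences within max_mismatches of query.

    Assumption: similarity is Hamming distance over windows whose length
    equals the query length. The paper may use a different measure.
    """
    m = len(query)
    cluster = set()
    for seq in sequences:
        # Slide a window of the query's length across each sequence.
        for i in range(len(seq) - m + 1):
            window = seq[i:i + m]
            if hamming(window, query) <= max_mismatches:
                cluster.add(window)
    return sorted(cluster)


# Toy data (hypothetical, not taken from the paper's protein data set).
toy_db = ["MKVLAAG", "MKILAAG", "TTKVLSA"]
cluster = mismatch_cluster(toy_db, "KVLA", 1)
print(cluster)
```

On this toy input the cluster holds the exact match and two one-mismatch variants. A set of this kind is what the stepwise generalisation method would take as input in order to derive a reduced expression.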