Effective Management of Hierarchical Storage Using Two Levels of Data Clustering

Authors:
Ratko Orlandic
Affiliations:
-
Venue:
MSS '03 Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSS'03)
Year:
2003

Abstract

When data resides on tertiary storage, clustering is the key to achieving high retrieval performance. However, a straightforward approach to clustering massive amounts of data on this storage requires considerable computational and storage resources that usually exceed the capabilities of even the richest super-computing centers. This paper develops a new approach to hierarchical storage management in Data Grid environments, which calls for two levels of clustering data on tertiary storage. Applying a mix of static and dynamic decisions, this approach achieves the benefits of clustering at reasonable costs. However, an effective realization of the approach in generic Data Grid environments requires advances in the areas of indexing and clustering large scientific data collections on tertiary storage. The paper describes some novel indexing and clustering techniques that can cope well not only with extremely large volumes but also with very high dimensionalities of scientific data. The basic principles of a new clustering technique for large volumes of multi-dimensional data are introduced in the paper for the first time.