VLDB '05 Proceedings of the 31st international conference on Very large data bases
We study the problem of economical representation of subsets of structured sets, that is, sets equipped with a set cover. Given a structured set U, and a language L whose expressions define subsets of U, the problem of Minimum Description Length in L (L-MDL) is: "given a subset V of U, find a shortest string in L that defines V".

We show that the simple set cover is enough to model a number of realistic database structures. We focus on two important families: hierarchical and multidimensional organizations. The former is found in the context of semistructured data such as XML, the latter in the context of statistical and OLAP databases. In the case of general OLAP databases, data organization is a mixture of multidimensionality and hierarchy, which can also be viewed naturally as a structured set. We study the complexity of the L-MDL problem in several settings, and provide an efficient algorithm for the hierarchical case.

Finally, we illustrate the application of the theory to summarization of large result sets, (multi-)query optimization for ROLAP queries, and XML queries.
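
To make the L-MDL problem statement concrete, the following is a minimal sketch for a hierarchical cover. It is not the algorithm from the paper. It assumes a deliberately simple language in which an expression is just a union of hierarchy nodes, so the length of a description is the number of nodes it names. The names `leaves_under`, `shortest_union_description`, and the toy geography hierarchy are hypothetical and chosen only for illustration. Under these assumptions, the shortest description of V consists of the topmost nodes whose leaves all lie in V.

```python
from typing import Dict, List, Set

# Assumption: the hierarchy is a tree given as node -> list of children,
# and the elements of U are the leaves (nodes with no children).
Tree = Dict[str, List[str]]


def leaves_under(tree: Tree, node: str) -> Set[str]:
    """Return the set of leaves (elements of U) covered by `node`."""
    children = tree[node]
    if not children:
        return {node}
    covered: Set[str] = set()
    for child in children:
        covered |= leaves_under(tree, child)
    return covered


def shortest_union_description(tree: Tree, root: str, target: Set[str]) -> List[str]:
    """Return the fewest nodes whose leaf sets union to exactly `target`.

    Assumes `target` is a subset of the leaves under `root` and that the
    only operator in the language is union (an illustrative restriction).
    """
    covered = leaves_under(tree, root)
    if not (covered & target):
        return []          # nothing under this node belongs to V
    if covered <= target:
        return [root]      # the whole subtree lies in V: name it once
    description: List[str] = []
    for child in tree[root]:
        description.extend(shortest_union_description(tree, child, target))
    return description


# Hypothetical toy hierarchy.
tree: Tree = {
    "World": ["Europe", "Asia"],
    "Europe": ["France", "Italy"],
    "Asia": ["Japan", "India", "China"],
    "France": [], "Italy": [], "Japan": [], "India": [], "China": [],
}
description = shortest_union_description(tree, "World", {"France", "Italy", "Japan"})
print(description)  # ['Europe', 'Japan']
```

Here V = {France, Italy, Japan} is described by two names instead of three. A richer language, for example one that also allows set difference, could yield shorter descriptions for other subsets, and this sketch does not attempt to handle that.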