Covering a simple orthogonal polygon with a minimum number of orthogonally convex polygons
SCG '87 Proceedings of the third annual symposium on Computational geometry
Inferring decision trees using the minimum description length principle
Information and Computation
IEEE Transactions on Pattern Analysis and Machine Intelligence
Automatic subspace clustering of high dimensional data for data mining applications
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Bottom-up computation of sparse and Iceberg CUBE
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Iceberg-cube computation with PC clusters
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
R-trees: a dynamic index structure for spatial searching
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
On Optimal Node Splitting for R-trees
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Computing Iceberg Queries Efficiently
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
The X-tree: An Index Structure for High-Dimensional Data
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
MDL learning of unions of simple pattern languages from positive examples
EuroCOLT '95 Proceedings of the Second European Conference on Computational Learning Theory
Context models in the MDL framework
DCC '95 Proceedings of the Conference on Data Compression
Concise descriptions of subsets of structured sets
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Diamond in the rough: finding Hierarchical Heavy Hitters in multi-dimensional data
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Framework and algorithms for trend analysis in massive temporal data sets
Proceedings of the thirteenth ACM international conference on Information and knowledge management
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Efficient and effective explanation of change in hierarchical summaries
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Compressing rectilinear pictures and minimizing access control lists
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Finding hierarchical heavy hitters in data streams
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Finding hierarchical heavy hitters in streaming data
ACM Transactions on Knowledge Discovery from Data (TKDD)
Graph summarization with bounded error
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Succinct summarization of transactional databases: an overlapped hyperrectangle scheme
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Towards Data Mining Without Information on Knowledge Structure
PKDD 2007 Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases
A Bipartite Graph Framework for Summarizing High-Dimensional Binary, Categorical and Numeric Data
SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management
Summarizing transactional databases with overlapped hyperrectangles
Data Mining and Knowledge Discovery
Handling inconsistencies in data warehouses
EDBT'04 Proceedings of the 2004 international conference on Current Trends in Database Technology
The class cover problem with boxes
Computational Geometry: Theory and Applications
Hi-index | 0.00 |
There are many applications in OLAP and data analysis where we identify regions of interest. For example, in OLAP, an analysis query involving aggregate sales performance of various products in different locations and seasons could help identify interesting cells, such as cells of a data cube having an aggregate sales higher than a threshold. While a normal answer to such a quiry merely returns all interesting cells, it may be far more informative to the user if the system return summaries or descriptions of regions formed from the identified cells. The minimum Description Length (MDL) principle is a well-known strategy for finding such region descriptions. In this paper, we propose a generalization of the MDL principle, called GMDL, and show that GMDL leads to fewer regions than MDL, and hence more concise "answers" returned to the user. The key idea is that a region may contain "don't care" cells (up to a global maximum), if these "don't care" cells help to form bigger summary regions, leading to a more concise overall summary. We study the problem of generating minimal region descriptions under the GMDL principle for two different scenarios. In the first, all dimensions of the data space are spatial. In the second scenario, all dimentions are categorial and organized in hierarchies. We propose region finding algorithms for both scenarios and evaluate their run time and compression performance using detailed experimentation. Our results show the effectiveness of the GMDL principle and the proposed algorithms.