Discovering OLAP dimensions in semi-structured data

Authors: Svetlana Mansmann, Nafees Ur Rehman, Andreas Weiler, Marc H. Scholl
Affiliation: University of Konstanz, Konstanz, Germany (all authors)
Venue: Proceedings of the fifteenth international workshop on Data warehousing and OLAP
Year: 2012

Abstract

In standard OLAP technology, cubes are constructed from the input data based on the available data fields and the known relationships between them. Structuring the data into a set of numeric measures distributed along a set of uniformly structured dimensions may be unrealistic for applications dealing with semi-structured data. We propose to extend the capabilities of OLAP via content-driven discovery of measures and dimensional characteristics in the original dataset. The new structural elements are discovered by means of data mining and other techniques and are therefore prone to change as the underlying dataset evolves. In this work we focus on the challenge of generating, maintaining, and querying such discovered elements of the cube. We demonstrate the benefits of our approach by providing OLAP over the public stream of user-generated content of the popular microblogging service Twitter. We were able to enrich the original dataset by discovering dynamic characteristics such as user activity, popularity, and messaging behavior, as well as by classifying messages by topic, impact, origin, method of generation, etc. The application of knowledge discovery techniques coupled with human expertise enables structural enrichment of the original data beyond the scope of existing methods for generating multidimensional models from relational or semi-structured data.
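As a rough illustration of the idea of deriving dimensional characteristics from the content of the data rather than from its schema, the Python sketch below computes two "discovered" dimensions for a list of semi-structured tweet records: a user activity level (derived from each user's message count) and a message type (original, reply, or retweet). It then counts messages along both. This is not the authors' method: the thresholds in `ACTIVITY_BUCKETS`, the `"RT @"` and `"@"` prefix heuristics, the optional `retweeted_status` key check, and all function names are hypothetical choices made only for this example.

```python
from collections import Counter
from typing import Any

# Hypothetical thresholds for the discovered "user activity" characteristic;
# in a real system they might come from data mining or a domain expert.
ACTIVITY_BUCKETS = [(10, "high"), (3, "medium"), (0, "low")]


def activity_class(n_tweets: int) -> str:
    """Map a user's message count to a coarse activity level."""
    for threshold, label in ACTIVITY_BUCKETS:
        if n_tweets >= threshold:
            return label
    return "low"


def message_type(tweet: dict[str, Any]) -> str:
    """Classify how a message was produced, using simple (assumed) heuristics.

    Records are semi-structured: 'retweeted_status' may or may not be present.
    """
    text = tweet.get("text", "")
    if "retweeted_status" in tweet or text.startswith("RT @"):
        return "retweet"
    if text.startswith("@"):
        return "reply"
    return "original"


def build_cube(tweets: list[dict[str, Any]]) -> Counter:
    """Count messages along two discovered dimensions: (activity, message type).

    Activity depends on the whole dataset, so it must be recomputed as the
    dataset grows; a user's activity class can change between runs.
    """
    per_user = Counter(t["user"] for t in tweets)
    cube: Counter = Counter()
    for t in tweets:
        key = (activity_class(per_user[t["user"]]), message_type(t))
        cube[key] += 1
    return cube


if __name__ == "__main__":
    sample = [
        {"user": "a", "text": "Hello OLAP"},
        {"user": "a", "text": "RT @b: cubes!"},
        {"user": "a", "text": "@b thanks"},
        {"user": "b", "text": "cubes!"},
    ]
    for cell, count in sorted(build_cube(sample).items()):
        print(cell, count)
```

Because the activity level is computed from the current contents of the dataset, appending more messages from user `b` moves that user from `low` to `medium`, which mirrors the abstract's point that discovered structural elements change as the underlying data evolves.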