Of cubes, DAGs and hierarchical correlations: a novel conceptual model for analyzing social media data

  • Authors:
  • Umeshwar Dayal; Chetan Gupta; Malu Castellanos; Song Wang; Manolo Garcia-Solaco

  • Affiliations:
  • Hewlett Packard Labs (all authors)

  • Venue:
  • ER'12 Proceedings of the 31st international conference on Conceptual Modeling
  • Year:
  • 2012

Abstract

With the advent of social media there is an ever-increasing amount of unstructured data that can be analyzed to obtain insights. Two prominent examples of such analysis are sentiment analysis and the discovery of correlated concepts. A convenient representation of information in such scenarios is in terms of concepts extracted from the unstructured data, and measures, such as sentiment scores, associated with these concepts. Typically, social media analysis reports these concepts and their associated measures. We argue that much richer insights can be obtained through the use of OLAP-style multidimensional analysis. It is fairly straightforward to see how to add traditional dimension hierarchies such as time and geography, and to analyze the data along these dimensions using traditional OLAP operations such as roll-up; for instance, to answer queries of the form "What was the average sentiment for X in Europe during the past month?" However, it is trickier to answer queries of the form "What was the average sentiment for concepts related to X in Europe during the past month?" We introduce a conceptual modeling framework that extends traditional multidimensional models and OLAP operators to address the new set of requirements for data extracted from social media. In this model, we organize data along both traditional dimensions (we call these metadata dimensions) and concept dimensions, which model relationships among concepts using parent-child hierarchies. Specifically: (i) we allow operations on parent-child hierarchies to be treated in a uniform way as operations on traditional dimension hierarchies; (ii) to model the rich relationships that can exist among concepts, we extend the parent-child hierarchies to be rooted level-DAGs rather than simply trees; and (iii) we introduce new equivalence classes that allow us to reason with "similar" concepts in new ways. We show that our modeling and operator framework supports multidimensional analysis that yields deeper insights into social media data than is possible with existing methods.
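The abstract does not give the operators' definitions, so the following is only an illustrative sketch, not the authors' model. It assumes that "concepts related to X" means X plus every concept reachable below it in a concept level-DAG, and answers the second example query by rolling up sentiment over that set while slicing on two metadata dimensions (region and time). All names (`Fact`, `PARENTS`, `descendants`, `avg_sentiment_related`) and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """One hypothetical fact row: a concept, two metadata dimensions, one measure."""
    concept: str
    region: str
    day: date
    sentiment: float


# Hypothetical concept dimension as a DAG: child -> set of parents.
# "screen" has two parents, which a tree-shaped hierarchy could not express.
PARENTS = {
    "battery life": {"phone X"},
    "screen": {"phone X", "tablet Y"},
    "phone X": {"devices"},
    "tablet Y": {"devices"},
}


def descendants(root, parents):
    """Return root plus every concept reachable below it in the DAG."""
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    seen = {root}
    stack = [root]
    while stack:  # iterative depth-first traversal
        node = stack.pop()
        for c in children[node]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def avg_sentiment_related(facts, root, parents, region, start, end):
    """Average sentiment over concepts related to root, sliced by region and date range.

    Returns None when no fact matches.
    """
    related = descendants(root, parents)
    scores = [
        f.sentiment
        for f in facts
        if f.concept in related and f.region == region and start <= f.day <= end
    ]
    return sum(scores) / len(scores) if scores else None


FACTS = [
    Fact("battery life", "Europe", date(2012, 5, 3), 0.75),
    Fact("screen", "Europe", date(2012, 5, 10), -0.25),
    Fact("phone X", "Europe", date(2012, 5, 20), 0.5),
    Fact("tablet Y", "Europe", date(2012, 5, 21), 1.0),
    Fact("screen", "Asia", date(2012, 5, 11), 0.4),
]

if __name__ == "__main__":
    may_start, may_end = date(2012, 5, 1), date(2012, 5, 31)
    for concept in ("phone X", "tablet Y", "devices"):
        print(concept, avg_sentiment_related(FACTS, concept, PARENTS, "Europe", may_start, may_end))
```

Collecting the related concepts into a set before aggregating means that a concept with several parents (here "screen") contributes once when rolling up to a shared ancestor such as "devices". Avoiding this kind of double counting is one reason roll-up over a DAG needs more care than roll-up over a tree.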