Topic modeling for OLAP on multidimensional text databases: topic cube and its applications

Authors:
Duo Zhang; ChengXiang Zhai; Jiawei Han; Ashok Srivastava; Nikunj Oza
Affiliations:
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA; Intelligent Systems Division, NASA Ames Research Center, Moffett Field, California, USA; Intelligent Systems Division, NASA Ames Research Center, Moffett Field, California, USA
Venue:
Statistical Analysis and Data Mining - Best of SDM'09
Year:
2009

Citing 0
Cited 11

iNextCube: information network-enhanced text cube
Proceedings of the VLDB Endowment

Unsupervised public health event detection for epidemic intelligence
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

Extracting dimensions for OLAP on multidimensional text databases
WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II

Detecting health events on the social web to enable epidemic intelligence
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval

Data-to-model: a mixed initiative approach for rapid ethnographic assessment
Computational & Mathematical Organization Theory

Exploring and analyzing documents with OLAP
Proceedings of the 5th Ph.D. workshop on Information and knowledge

Of cubes, DAGs and hierarchical correlations: a novel conceptual model for analyzing social media data
ER'12 Proceedings of the 31st international conference on Conceptual Modeling

Mining evolutionary multi-branch trees from text streams
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

EventCube: multi-dimensional search and mining of structured and text data
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Social microblogging cube
Proceedings of the sixteenth international workshop on Data warehousing and OLAP

CXT-cube: contextual text cube model and aggregation operator for text OLAP
Proceedings of the sixteenth international workshop on Data warehousing and OLAP

Abstract

As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstructured text data simultaneously. Although online analytical processing (OLAP) techniques have been proven very useful for analyzing and mining structured data, they face challenges in handling text data. On the other hand, probabilistic topic models are among the most effective approaches to latent topic analysis and mining on text data. In this paper, we study a new data model called topic cube to combine OLAP with probabilistic topic modeling and enable OLAP on the dimension of text data in a multidimensional text database. Topic cube extends the traditional data cube to cope with a topic hierarchy and stores probabilistic content measures of text documents learned through a probabilistic topic model. To materialize topic cubes efficiently, we propose two heuristic aggregations to speed up the iterative Expectation-Maximization (EM) algorithm for estimating topic models by leveraging the models learned on component data cells to choose a good starting point for iteration. Experimental results show that these heuristic aggregations are much faster than the baseline method of computing each topic cube from scratch. We also discuss some potential uses of topic cube and show sample experimental results.

Copyright © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2: 378-395, 2009
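
The warm-start idea in the abstract (starting EM for an aggregated cell from models already learned on its component cells) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the two heuristic aggregations, so the size-weighted averaging below is an assumed stand-in, and the names `plsa_em` and `aggregate_start`, the plain PLSA model, the toy counts, and the assumption that topic indices are aligned across child cells are all hypothetical choices made for illustration.

```python
import math
import random


def _normalize(row):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(row)
    return [x / total for x in row]


def plsa_em(docs, n_topics, n_words, init_topics=None,
            max_iter=200, tol=1e-6, seed=0):
    """Fit a plain PLSA model with EM (illustrative, pure Python).

    docs: list of word-count lists, each of length n_words.
    init_topics: optional starting p(w|z) rows; random if omitted.
    Returns (topics, history); history[i] is the log-likelihood of the
    parameters held at the start of iteration i.
    """
    rng = random.Random(seed)
    if init_topics is None:
        topics = [_normalize([rng.random() + 0.01 for _ in range(n_words)])
                  for _ in range(n_topics)]
    else:
        topics = [_normalize(list(row)) for row in init_topics]
    mix = [[1.0 / n_topics] * n_topics for _ in docs]  # p(z|d), uniform start
    history = []
    for _ in range(max_iter):
        # Tiny constant keeps every probability strictly positive.
        new_topics = [[1e-12] * n_words for _ in range(n_topics)]
        new_mix = [[1e-12] * n_topics for _ in docs]
        loglik = 0.0
        for d, counts in enumerate(docs):
            for w, c in enumerate(counts):
                if c == 0:
                    continue
                joint = [mix[d][z] * topics[z][w] for z in range(n_topics)]
                total = sum(joint)
                loglik += c * math.log(total)
                for z in range(n_topics):
                    resp = c * joint[z] / total  # E-step: expected count
                    new_topics[z][w] += resp
                    new_mix[d][z] += resp
        # M-step: renormalize the expected counts.
        topics = [_normalize(row) for row in new_topics]
        mix = [_normalize(row) for row in new_mix]
        history.append(loglik)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return topics, history


def aggregate_start(child_topics, child_sizes):
    """Size-weighted average of the children's p(w|z), topic by topic.

    Assumes topic z means the same thing in every child cell.
    """
    n_topics = len(child_topics[0])
    n_words = len(child_topics[0][0])
    total = float(sum(child_sizes))
    return [[sum(s * t[z][w] for t, s in zip(child_topics, child_sizes)) / total
             for w in range(n_words)]
            for z in range(n_topics)]


# Toy data: two child cells over a 4-word vocabulary (made-up counts).
cell_a = [[5, 4, 0, 0], [0, 1, 6, 5], [4, 5, 1, 0]]
cell_b = [[0, 0, 5, 6], [6, 5, 0, 1], [1, 0, 4, 4]]
parent = cell_a + cell_b  # the aggregated cell holds both children's documents

topics_a, _ = plsa_em(cell_a, 2, 4, seed=1)
topics_b, _ = plsa_em(cell_b, 2, 4, seed=1)
sizes = [sum(map(sum, cell_a)), sum(map(sum, cell_b))]

start = aggregate_start([topics_a, topics_b], sizes)
warm_topics, warm_history = plsa_em(parent, 2, 4, init_topics=start)
cold_topics, cold_history = plsa_em(parent, 2, 4, seed=1)
```

Comparing `len(warm_history)` with `len(cold_history)` gives a rough sense of how many iterations each start needs on this toy data; whether the warm start actually saves work depends on the data and on how well the child topics are aligned, so no particular speedup should be read into the sketch.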