Exploring and analyzing documents with OLAP

Authors:
Grzegorz Drzadzewski; Frank Wm Tompa
Affiliations:
University of Waterloo, Waterloo, ON, Canada; University of Waterloo, Waterloo, ON, Canada
Venue:
Proceedings of the 5th Ph.D. workshop on Information and knowledge
Year:
2012

Abstract

Users faced with a large document collection find it difficult to explore and analyze the information it contains. Tagging has been used to improve the organization of documents in a collection, but it has various limitations. We propose to improve the analysis and exploration of tagged document collections by organizing the documents into clusters and allowing users to perform online analytical processing (OLAP) on the clusters. Supporting OLAP on clusters of documents, however, poses several challenges: representing cluster centroids and document positions inside clusters efficiently, dealing with overlapping clusters, aggregating clusters efficiently and accurately, helping users find representative documents for a cluster, and determining the strength of the relationship between clusters.
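
To make some of the challenges listed in the abstract concrete, the following is a minimal sketch and not the authors' method. It assumes documents are sparse term-weight vectors, that each cluster is summarized by a document count plus a component-wise sum (so a centroid can be recomputed after a roll-up without revisiting the documents), and that cosine similarity serves both for picking a representative document and as a stand-in measure of relationship strength between clusters. All names (`ClusterSummary`, `summarize`, `roll_up`, `representative`) are hypothetical. The roll-up shown is exact only for disjoint clusters; with overlapping clusters, shared documents would be counted twice, which is one of the difficulties the abstract mentions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

# Assumption: a document is a sparse term -> weight mapping.
Vector = Dict[str, float]


@dataclass
class ClusterSummary:
    """Additive summary of a cluster: member count and summed vector."""
    count: int
    total: Vector

    def centroid(self) -> Vector:
        # The centroid is the mean of the member vectors.
        if self.count == 0:
            return {}
        return {term: w / self.count for term, w in self.total.items()}


def summarize(docs: List[Vector]) -> ClusterSummary:
    """Build the additive summary for a list of document vectors."""
    total: Vector = {}
    for doc in docs:
        for term, w in doc.items():
            total[term] = total.get(term, 0.0) + w
    return ClusterSummary(len(docs), total)


def roll_up(a: ClusterSummary, b: ClusterSummary) -> ClusterSummary:
    """Merge two summaries; exact only when the clusters are disjoint."""
    total = dict(a.total)
    for term, w in b.total.items():
        total[term] = total.get(term, 0.0) + w
    return ClusterSummary(a.count + b.count, total)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def representative(docs: List[Vector], summary: ClusterSummary) -> int:
    """Index of the document most similar to the cluster centroid."""
    center = summary.centroid()
    return max(range(len(docs)), key=lambda i: cosine(docs[i], center))
```

Because the summary is additive, `roll_up(summarize(a), summarize(b))` yields the same count and summed vector as `summarize(a + b)` when the two document lists share no members, which is what makes centroid aggregation cheap in that case. Calling `cosine` on two centroids gives one possible, assumed measure of how strongly two clusters are related.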