Exploring and analyzing documents with OLAP

  • Authors:
  • Grzegorz Drzadzewski; Frank Wm Tompa

  • Affiliations:
  • University of Waterloo, Waterloo, ON, Canada; University of Waterloo, Waterloo, ON, Canada

  • Venue:
  • Proceedings of the 5th Ph.D. workshop on Information and knowledge
  • Year:
  • 2012

Abstract

When a document collection is large, users find it difficult to explore and analyze the information it contains. Tagging has been used to improve the organization of documents in a collection, but it has various limitations. We propose to improve the analysis and exploration of tagged document collections by organizing the documents into clusters and allowing users to perform online analytical processing (OLAP) on the clusters. Supporting OLAP on clusters of documents, however, poses several challenges: representing cluster centroids and document positions within clusters efficiently, dealing with overlapping clusters, aggregating clusters efficiently and accurately, helping users find representative documents for a cluster, and determining the strength of the relationship between clusters.
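
To make the listed challenges concrete, the following is a minimal illustrative sketch and not the method proposed in the paper. It assumes each document is a sparse tag-weight vector, that a cluster is summarized by its member ids plus the sum of their vectors, that a representative document is the member closest to the centroid by cosine similarity, and that relationship strength is the Jaccard overlap of memberships. All names (`ClusterSummary`, `roll_up`, `representative`, `overlap_strength`) and the sample data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class ClusterSummary:
    """Hypothetical cluster summary: member ids plus the sum of their tag vectors."""
    doc_ids: set = field(default_factory=set)
    tag_sum: Counter = field(default_factory=Counter)

    def add(self, doc_id, tags):
        # Skip documents already present so overlapping clusters
        # do not count the same document twice after a roll-up.
        if doc_id in self.doc_ids:
            return
        self.doc_ids.add(doc_id)
        self.tag_sum.update(tags)

    def centroid(self):
        # The centroid is the mean tag vector of the member documents.
        n = len(self.doc_ids)
        return {t: total / n for t, total in self.tag_sum.items()} if n else {}


def roll_up(summaries, docs):
    """Aggregate several (possibly overlapping) clusters into one summary."""
    merged = ClusterSummary()
    for summary in summaries:
        for doc_id in summary.doc_ids:
            merged.add(doc_id, docs[doc_id])
    return merged


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def representative(summary, docs):
    """Return the member document closest to the cluster centroid."""
    center = summary.centroid()
    # Sorting the ids makes tie-breaking deterministic.
    return max(sorted(summary.doc_ids), key=lambda d: cosine(docs[d], center))


def overlap_strength(a, b):
    """Jaccard overlap of two clusters' memberships (an assumed strength measure)."""
    union = a.doc_ids | b.doc_ids
    return len(a.doc_ids & b.doc_ids) / len(union) if union else 0.0


# Made-up sample data: three tagged documents in two overlapping clusters.
docs = {
    "d1": {"olap": 2, "cube": 1},
    "d2": {"olap": 1, "cluster": 1},
    "d3": {"cluster": 2, "tag": 1},
}
a, b = ClusterSummary(), ClusterSummary()
for doc_id in ("d1", "d2"):
    a.add(doc_id, docs[doc_id])
for doc_id in ("d2", "d3"):
    b.add(doc_id, docs[doc_id])
merged = roll_up([a, b], docs)
```

The roll-up here revisits member documents so that a document shared by two clusters contributes only once. Adding the two `tag_sum` counters directly would be cheaper but is accurate only when the clusters are disjoint, which illustrates the tension between efficient and accurate aggregation noted in the abstract.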