Exploration of Dimensionality Reduction for Text Visualization

  • Authors:
  • Shiping Huang; Matthew O. Ward; Elke A. Rundensteiner

  • Affiliations:
  • Worcester Polytechnic Institute; Worcester Polytechnic Institute; Worcester Polytechnic Institute

  • Venue:
  • CMV '05 Proceedings of the Coordinated and Multiple Views in Exploratory Visualization
  • Year:
  • 2005



Abstract

In the text document visualization community, statistical analysis tools (e.g., principal component analysis and multidimensional scaling) and neurocomputation models (e.g., self-organizing feature maps) have been widely used for dimensionality reduction. Often the resulting dimensionality is set to two, as this facilitates plotting the results. The validity and effectiveness of these approaches largely depend on the specific data sets used and the semantics of the targeted applications. To date, there has been little evaluation to assess and compare dimensionality reduction methods and dimensionality reduction processes, either numerically or empirically. The focus of this paper is to propose a mechanism for comparing and evaluating the effectiveness of dimensionality reduction techniques in the visual exploration of text document archives. We use multivariate visualization techniques and interactive visual exploration to study three problems: (a) Which dimensionality reduction technique best preserves the interrelationships within a set of text documents? (b) How sensitive are the results to the number of output dimensions? (c) Can we automatically remove redundant or unimportant words from the vectors extracted from the documents while still preserving most of the information, and thus make dimensionality reduction more efficient? To study each problem, we generate supplemental dimensions based on several dimensionality reduction algorithms and the parameters controlling these algorithms. We then visually analyze and explore the characteristics of the reduced dimensional spaces within XmdvTool, a linked, multi-view, multidimensional visual exploration tool. We compare the derived dimensions to features known to be present in the original data. Quantitative measures are also used to assess the quality of results obtained with different numbers of output dimensions.
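As a rough illustration of questions (b) and (c), the sketch below scores how well a reduced space preserves pairwise document distances. The abstract does not say which quantitative measures or word-selection criteria the authors used, so everything here is an assumption: documents become raw term-count vectors, PCA (computed by power iteration so the example needs only the Python standard library) stands in for the reduction techniques, quality is measured by a normalized stress, sqrt(Σ(d − d̂)² / Σd²) over all document pairs, and the word filter simply drops terms whose counts have low variance across documents. The function names, the toy corpus, and the `min_var` threshold are all hypothetical.

```python
import math
import random

def term_vectors(docs):
    """Raw term-count vectors (one row per document) over a sorted vocabulary."""
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    return [[t.count(w) for w in vocab] for t in tokens], vocab

def drop_low_variance_terms(rows, vocab, min_var):
    """Hypothetical word filter: keep only terms whose counts vary enough across documents."""
    n = len(rows)
    keep = []
    for j in range(len(vocab)):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        if sum((x - mean) ** 2 for x in col) / n >= min_var:
            keep.append(j)
    return [[r[j] for j in keep] for r in rows], [vocab[j] for j in keep]

def center(rows):
    """Subtract each column's mean so PCA works on deviations from the centroid."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def principal_axes(rows, k, iters=300, seed=0):
    """Leading k eigenvectors of the covariance of centered rows, via power
    iteration; each new axis is kept orthogonal to earlier ones (Gram-Schmidt)."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            for a in axes:  # remove components along axes already found
                dot = sum(x * y for x, y in zip(w, a))
                w = [x - dot * y for x, y in zip(w, a)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in the remaining directions
                return axes
            v = [x / norm for x in w]
        axes.append(v)
    return axes

def pca(rows, k):
    """Project each document onto the first k principal axes."""
    c = center(rows)
    axes = principal_axes(c, k)
    return [[sum(x * y for x, y in zip(r, a)) for a in axes] for r in c]

def pairwise(rows):
    """Euclidean distances for every unordered pair of rows."""
    n = len(rows)
    return [math.dist(rows[i], rows[j]) for i in range(n) for j in range(i + 1, n)]

def stress(orig, reduced):
    """Normalized stress: 0 when every pairwise distance is preserved exactly."""
    d, dh = pairwise(orig), pairwise(reduced)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d, dh)) / sum(a * a for a in d))

docs = [
    "visual exploration of text documents",
    "text documents and word vectors",
    "principal component analysis of word vectors",
    "self organizing maps for text visualization",
    "multidimensional scaling for visual exploration",
]
X, vocab = term_vectors(docs)
for k in range(1, 5):  # question (b): sensitivity to the number of output dimensions
    print(f"k={k} stress={stress(X, pca(X, k)):.3f}")

# Question (c): drop rarely varying words and see how much distance structure survives.
X_small, kept = drop_low_variance_terms(X, vocab, min_var=0.2)
print(len(vocab), "->", len(kept), "terms; stress after filtering:", round(stress(X, X_small), 3))
```

Because projecting onto more orthonormal axes can only lengthen the projected distances toward the originals, the stress in this sketch never increases with k and reaches zero once k equals the rank of the centered data, which is at most four for five documents. A comparison in the spirit of question (a) would swap in MDS or a self-organizing map and score each on the same corpus.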