Exemplar-based Visualization of Large Document Corpus (InfoVis2009-1115)

Authors:
Yanhua Chen;Lijun Wang;Ming Dong;Jing Hua
Affiliations:
Wayne State University, Detroit, MI;Wayne State University, Detroit, MI;Wayne State University, Detroit, MI;Wayne State University, Detroit, MI
Venue:
IEEE Transactions on Visualization and Computer Graphics
Year:
2009

Citing 0
Cited 7

TIARA: a visual exploratory text analytic system
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

DClusterE: A Framework for Evaluating and Understanding Document Clustering Using Visualization
ACM Transactions on Intelligent Systems and Technology (TIST)

TIARA: Interactive, Topic-Based Visual Text Summarization and Analysis
ACM Transactions on Intelligent Systems and Technology (TIST)

Visual comparison for information visualization
Information Visualization - Special issue on State of the Field and New Research Directions

Piecewise laplacian-based projection for interactive data exploration and organization
EuroVis'11 Proceedings of the 13th Eurographics / IEEE - VGTC conference on Visualization

Multifaceted visual analytics for healthcare applications
IBM Journal of Research and Development

Technical Section: EXOD: A tool for building and exploring a large graph of open datasets
Computers and Graphics

Visualization

Abstract

With the rapid growth of the World Wide Web and electronic information services, text corpora are becoming available online at an incredible rate. By displaying text data in a logical layout (e.g., color graphs), text visualization presents a direct way to observe the documents as well as understand the relationships between them. In this paper, we propose a novel technique, Exemplar-based Visualization (EV), to visualize an extremely large text corpus. Capitalizing on recent advances in matrix approximation and decomposition, EV presents a probabilistic multidimensional projection model in the low-rank text subspace with a sound objective function. The probability of each document's proportion to the topics is obtained through iterative optimization and embedded into a low-dimensional space using parameter embedding. By selecting the representative exemplars, we obtain a compact approximation of the data. This makes the visualization highly efficient and flexible. In addition, the selected exemplars neatly summarize the entire data set and greatly reduce the cognitive overload in the visualization, leading to an easier interpretation of large text corpora. Empirically, we demonstrate the superior performance of EV through extensive experiments performed on publicly available text data sets.
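
The pipeline outlined in the abstract (low-rank factorization, per-document topic proportions, exemplar selection, 2D embedding) can be illustrated with a rough sketch. This is not the authors' EV algorithm: the abstract does not give the objective function or the update rules, so the sketch swaps in plain nonnegative matrix factorization with multiplicative updates, picks as exemplar the document with the highest proportion for each topic, and replaces parameter embedding with a simple barycentric placement around topic anchors on a circle. All function names and the toy matrix are hypothetical.

```python
import numpy as np

def nmf(X, k, iters=300, seed=0):
    """Rank-k factorization X ~ W @ H with nonnegative factors.
    Stand-in for the paper's decomposition (assumption): standard
    multiplicative updates for the squared-error objective."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + 1e-3   # document-by-topic loadings
    H = rng.random((k, m)) + 1e-3   # topic-by-term weights
    eps = 1e-9                      # avoids division by zero
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def topic_proportions(W):
    """Row-normalize loadings so each document's proportions sum to 1."""
    return W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)

def pick_exemplars(P):
    """Hypothetical selection rule: per topic, the index of the document
    with the largest proportion for that topic."""
    return P.argmax(axis=0)

def embed_2d(P):
    """Barycentric placement (NOT parameter embedding): topics sit on the
    unit circle, each document is the proportion-weighted mean of them."""
    k = P.shape[1]
    angles = 2 * np.pi * np.arange(k) / k
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])
    return P @ anchors

# Toy document-term counts: rows are documents, columns are terms.
X = np.array([
    [3, 2, 1, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [1, 2, 3, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 0, 1, 3, 2],
    [0, 0, 0, 2, 1, 3],
], dtype=float)

W, H = nmf(X, k=2)
P = topic_proportions(W)
exemplars = pick_exemplars(P)
coords = embed_2d(P)
print("topic proportions:\n", P.round(3))
print("exemplar document per topic:", exemplars)
print("2D coordinates:\n", coords.round(3))
```

Plotting only the rows of coords indexed by exemplars gives the compact view the abstract describes, while the full coords array gives the complete layout.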