Categorization and analysis of text in computer mediated communication archives using visualization

  • Authors: Ahmed Abbasi; Hsinchun Chen
  • Affiliations: University of Arizona, Tucson, AZ; University of Arizona, Tucson, AZ
  • Venue: Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
  • Year: 2007

Abstract

Digital libraries (DLs) for online discourse contain large amounts of valuable information that is difficult to navigate and analyze. Visualization systems developed to support analysis and navigation of computer-mediated communication (CMC) archives focus primarily on interaction information, with little emphasis on textual content. In this paper we present a system that provides DL exploration services such as visualization, categorization, and analysis for CMC text. The system incorporates an extended feature set comprising stylistic, topical, and sentiment-related features to enable richer content representation. The system also includes the Ink Blot technique, which uses decision tree models and text overlay to visualize CMC messages. Ink Blots can be used for text categorization and analysis across forums, authors, threads, and messages, as well as over time. The system's analysis capabilities were evaluated with a series of examples and a qualitative user study. Empirical categorization experiments comparing the Ink Blot technique against a benchmark support vector machine classifier were also conducted. The results demonstrated the efficacy of the Ink Blot technique for text categorization and highlighted the effectiveness of the extended feature set in improving categorization performance.
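The abstract states only that Ink Blots combine decision tree models with text overlay; it does not give the algorithm. As a rough illustration of that general idea, and not the authors' implementation, the sketch below ranks words by information gain (the splitting criterion used by decision tree learners such as ID3) and then overlays class tags on the selected words in a message. It swaps a full decision tree for single-word splits, and all function names, the bracket markup, and the toy data are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(messages, labels, word):
    """Entropy reduction from splitting messages on presence of `word`."""
    with_word = [lab for m, lab in zip(messages, labels) if word in tokenize(m)]
    without = [lab for m, lab in zip(messages, labels) if word not in tokenize(m)]
    n = len(labels)
    remainder = (len(with_word) / n) * entropy(with_word) \
        + (len(without) / n) * entropy(without)
    return entropy(labels) - remainder


def select_words(messages, labels, top_k=3):
    """Return {word: majority class} for the top_k most informative words.

    Hypothetical stand-in for the features a trained decision tree would
    pick; ties in gain are broken alphabetically.
    """
    vocab = sorted({w for m in messages for w in tokenize(m)})
    scored = []
    for w in vocab:
        gain = information_gain(messages, labels, w)
        if gain > 0:
            classes = Counter(
                lab for m, lab in zip(messages, labels) if w in tokenize(m)
            )
            scored.append((gain, w, classes.most_common(1)[0][0]))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return {w: cls for _, w, cls in scored[:top_k]}


def overlay(text, word_classes):
    """Tag each selected word in `text` as [word|class] (assumed markup)."""
    def mark(match):
        word = match.group(0)
        cls = word_classes.get(word.lower())
        return f"[{word}|{cls}]" if cls else word
    return re.sub(r"[A-Za-z']+", mark, text)


if __name__ == "__main__":
    # Toy data invented for this sketch, not drawn from the paper.
    msgs = ["great product, love it", "terrible service, hate it",
            "love the great support", "hate this terrible update"]
    labs = ["pos", "neg", "pos", "neg"]
    words = select_words(msgs, labs, top_k=4)
    print(overlay("I love it but hate the update", words))
```

On the toy data, the script prints `I [love|pos] it but [hate|neg] the update`. A graphical version could replace the bracket tags with colored regions behind the text, with color indicating class.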