CorpusExplorer: supporting a deeper understanding of linguistic corpora

  • Authors:
  • Andrés Esteban;Roberto Therón

  • Affiliations:
  • University of Salamanca, Spain;University of Salamanca, Spain

  • Venue:
  • SG'11 Proceedings of the 11th international conference on Smart graphics
  • Year:
  • 2011

Visualization

Abstract

Word trees are a common way of representing frequency information obtained by analyzing natural language data. This article explores their uses and possibilities, and describes the development of an application that visualizes the relative frequencies of 2-grams and 3-grams in Google's "English One Million" corpus, using a two-sided word tree together with sparklines to show usage trends over time. It also discusses how the raw data was processed and trimmed to speed up access.
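
As a rough illustration of the ideas named in the abstract, the following sketch shows how relative n-gram frequencies per year (the kind of series a sparkline would plot) and the two sides of a word tree around a center word might be computed. It is not the authors' implementation: the toy counts, the yearly totals, and the function names `relative_series` and `two_sided_tree` are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical toy data: (3-gram, year) -> raw count. Not taken from the real corpus.
COUNTS = {
    (("the", "old", "house"), 1900): 30,
    (("the", "old", "house"), 1950): 10,
    (("an", "old", "man"), 1900): 20,
    (("an", "old", "man"), 1950): 40,
    (("the", "old", "man"), 1950): 50,
}

# Hypothetical total number of n-grams observed in each year.
YEAR_TOTALS = {1900: 1000, 1950: 2000}


def relative_series(counts, year_totals):
    """Map each n-gram to {year: count / total for that year}."""
    series = defaultdict(dict)
    for (ngram, year), n in counts.items():
        series[ngram][year] = n / year_totals[year]
    return dict(series)


def two_sided_tree(counts, center):
    """Aggregate the words that precede and follow `center` in 3-grams.

    Returns (left, right): two dicts mapping a neighboring word to its
    count summed over all years.
    """
    left = defaultdict(int)
    right = defaultdict(int)
    for (ngram, _year), n in counts.items():
        if len(ngram) == 3 and ngram[1] == center:
            left[ngram[0]] += n
            right[ngram[2]] += n
    return dict(left), dict(right)


series = relative_series(COUNTS, YEAR_TOTALS)
left, right = two_sided_tree(COUNTS, "old")
```

Here `left` and `right` hold the branches on either side of the center word, and each entry of `series` is a per-year sequence of relative frequencies that could be drawn as a sparkline next to the corresponding branch.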