Linguistic Ethnography: Identifying Dominant Word Classes in Text

  • Authors:
  • Rada Mihalcea; Stephen Pulman

  • Affiliations:
  • Computer Science Department, University of North Texas, and Computational Linguistics Group, Oxford University; Computational Linguistics Group, Oxford University

  • Venue:
  • CICLing '09: Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2009

Abstract

In this paper, we propose a method for "linguistic ethnography", a general mechanism for characterising texts with respect to the dominance of certain classes of words. Using humour as a case study, we explore the automatic learning of salient word classes, including semantic classes (e.g., person, animal), psycholinguistic classes (e.g., tentative, cause), and affective load (e.g., anger, happiness). We measure the reliability of the derived word classes and their associated dominance scores by showing significant correlation across different corpora.
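
The abstract does not say how a dominance score is computed. As an illustration only, the sketch below assumes one plausible reading: a class's dominance is its token coverage in the target corpus divided by its coverage in a reference corpus, so a value above 1 suggests the class is over-represented in the target. The function names, the tiny word lists, and the two toy "corpora" are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def class_coverage(tokens, class_words):
    """Fraction of tokens that belong to the given word class."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[w] for w in class_words) / len(tokens)


def dominance(target, reference, class_words):
    """Assumed definition: coverage in target divided by coverage in reference."""
    ref_cov = class_coverage(reference, class_words)
    tgt_cov = class_coverage(target, class_words)
    if ref_cov == 0.0:
        # Class never occurs in the reference corpus: the ratio is undefined,
        # so report infinity if the target uses the class at all, else 0.
        return float("inf") if tgt_cov > 0 else 0.0
    return tgt_cov / ref_cov


# Hypothetical word classes and made-up toy corpora, for illustration only.
WORD_CLASSES = {
    "person": {"man", "woman", "doctor"},
    "animal": {"dog", "cat"},
}
humour = "a man and a dog walk into a bar and the doctor laughs".split()
news = "the market fell as the bank cut rates and a man sold his dog".split()

scores = {
    name: dominance(humour, news, words)
    for name, words in WORD_CLASSES.items()
}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

Under this assumed definition, the reliability check mentioned in the abstract would amount to computing such scores against different corpora and testing whether the resulting per-class values correlate.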