Recursive data mining for role identification in electronic communications

  • Authors:
  • Vineet Chaoji; Apirak Hoonlor; Boleslaw K. Szymanski

  • Affiliations:
  • Rensselaer Polytechnic Institute, Troy, NY, USA (all authors). Corresponding author: Boleslaw K. Szymanski, Tel.: +1 518 276 2714, Fax: +1 518 276 4033, E-mail: szymansk@cs.rpi.edu

  • Venue:
  • International Journal of Hybrid Intelligent Systems
  • Year:
  • 2010

Abstract

We present a text mining approach that discovers patterns at varying degrees of abstraction in a hierarchical fashion. The approach allows for a certain degree of approximation in matching patterns, which is necessary to capture non-trivial features in realistic datasets. Because of its hierarchical nature, we call this approach Recursive Data Mining (RDM). We demonstrate a novel application of RDM to role identification in electronic communications. We use a hybrid approach in which the RDM-discovered patterns are used as features to build efficient classifiers. Since we want to recognize a group of authors communicating in a specific role within an Internet community, the challenge is to recognize the possibly different roles of an author within different communication communities. Moreover, each individual exchange in electronic communications is typically short, making standard text mining approaches less efficient than in other applications. An example of such a problem is recognizing roles in a collection of emails from an organization in which middle-level managers communicate with both superiors and subordinates. To validate our approach, we use the Enron dataset, which is such a collection. The results show that a classifier that uses the dominant patterns discovered by RDM performs well in role identification.
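The hierarchical idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes exact matching of contiguous bigrams (the approximate matching mentioned above is omitted), and the function names `mine_level`, `rewrite`, and `recursive_patterns`, the `min_count` threshold, and the `L<level>:` token naming are all hypothetical choices made for illustration. Each level mines frequent patterns, replaces them with a single higher-level token, and passes the rewritten sequence to the next level.

```python
from collections import Counter


def mine_level(seq, n=2, min_count=2):
    """Return contiguous n-grams that occur at least min_count times in seq."""
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    return {gram for gram, c in counts.items() if c >= min_count}


def rewrite(seq, patterns, n, level):
    """Replace each matched n-gram (scanning left to right) with one new token."""
    out, i = [], 0
    while i < len(seq):
        gram = tuple(seq[i:i + n])
        if len(gram) == n and gram in patterns:
            # Hypothetical naming scheme: prefix the token with its level.
            out.append(f"L{level}:" + "_".join(gram))
            i += n
        else:
            out.append(seq[i])
            i += 1
    return out


def recursive_patterns(tokens, max_levels=3, n=2, min_count=2):
    """Mine patterns level by level; each level sees the previous rewrite."""
    levels, seq = [], list(tokens)
    for level in range(1, max_levels + 1):
        patterns = mine_level(seq, n, min_count)
        if not patterns:
            break  # nothing frequent enough at this level of abstraction
        levels.append(patterns)
        seq = rewrite(seq, patterns, n, level)
    return levels, seq
```

Under these assumptions, counts of the higher-level tokens in the final rewritten sequence could serve as features for a separate classifier, which is one way to read the hybrid approach described above.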