Discovery of email communication networks from the Enron corpus with a genetic algorithm using social network analysis

  • Authors:
  • Garnett Wilson;Wolfgang Banzhaf

  • Affiliations:
  • Department of Computer Science, Memorial University of Newfoundland, St. John's, NL, Canada;Department of Computer Science, Memorial University of Newfoundland, St. John's, NL, Canada

  • Venue:
  • CEC'09 Proceedings of the Eleventh conference on Congress on Evolutionary Computation
  • Year:
  • 2009

Abstract

During the legal investigation of Enron Corporation, the U.S. Federal Energy Regulatory Commission (FERC) made public a substantial data set of the company's internal corporate emails. This work presents a genetic algorithm (GA) approach to social network analysis (SNA) using the Enron corpus. Three SNA metrics (degree, density, and proximity prestige) were applied to detect networks with high email activity and networks containing actors who are important with respect to email transactions. Quantitative analysis revealed that density and proximity prestige captured high-activity networks more effectively than degree. Subsequent qualitative analysis indicated that there were trade-offs in the selection of SNA metrics. Examination of the discovered social networks showed that density and proximity prestige isolated networks involving key actors to a greater extent than degree. In particular, density picked out interesting patterns in email volume, while proximity prestige better isolated key actors at Enron. The roles of the actors identified in these networks are also discussed as explanations for their prominence.
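
The abstract names the three metrics but does not give their formulas. As a rough illustration only, the sketch below computes them on a toy directed "who emailed whom" graph using common textbook definitions, which are an assumption here: degree as the number of ties touching an actor, density as the fraction of possible directed ties that are present, and proximity prestige as the share of other actors who can reach a given actor divided by their mean distance to that actor. The actor names, the edge set, and the function names are hypothetical, and this is not the authors' GA or fitness function.

```python
from collections import deque

# Hypothetical toy data: a directed tie (s, r) means "s emailed r at least once".
nodes = {"a", "b", "c", "d"}
edges = {("a", "b"), ("b", "c"), ("a", "c"), ("d", "c")}


def degree(edges, node):
    """Number of distinct ties (incoming plus outgoing) touching node."""
    return sum(1 for s, r in edges if s == node or r == node)


def density(edges, nodes):
    """Fraction of the n*(n-1) possible directed ties that are present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1))


def proximity_prestige(edges, nodes, node):
    """Share of other actors who can reach node, divided by their mean distance."""
    # Index each actor's direct senders so we can walk the ties backwards.
    senders = {v: set() for v in nodes}
    for s, r in edges:
        senders[r].add(s)

    # Breadth-first search over reversed ties gives shortest distances to node.
    dist = {node: 0}
    queue = deque([node])
    while queue:
        v = queue.popleft()
        for u in senders[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)

    reach = len(dist) - 1  # actors that can reach node, excluding node itself
    if reach == 0 or len(nodes) < 2:
        return 0.0
    mean_distance = sum(dist.values()) / reach
    return (reach / (len(nodes) - 1)) / mean_distance


if __name__ == "__main__":
    for actor in sorted(nodes):
        print(actor, degree(edges, actor), proximity_prestige(edges, nodes, actor))
    print("density:", density(edges, nodes))
```

In this toy graph, actor `c` receives mail directly from every other actor and so gets the maximum proximity prestige, while `a` sends mail but receives none and scores zero. This sketch treats ties as unweighted and ignores email volume.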