Structure in the Enron Email Dataset

Authors:
P. S. Keila; D. B. Skillicorn
Affiliations:
School of Computing, Queen's University, Kingston, Canada K7L 3N6 (both authors)
Venue:
Computational & Mathematical Organization Theory
Year:
2005

Abstract

We investigate the structures present in the Enron email dataset using singular value decomposition and semidiscrete decomposition. Using word frequency profiles, we show that messages fall into two distinct groups, whose extrema are characterized by short messages and rare words versus long messages and common words. It is surprising that message length and word-use pattern should be related in this way. We also investigate relationships among individuals based on their patterns of word use in email. We show that word use is correlated with function within the organization, as expected. Lastly, we show that relative changes in individuals' word usage over time can be used to identify key players in major company events.
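
As a rough illustration of the kind of analysis the abstract describes, the sketch below builds a small message-by-word frequency matrix and applies a truncated singular value decomposition with NumPy. The toy messages, the whitespace tokenization, and the choice of k = 2 are assumptions made for this example only; they are not taken from the paper, and the paper's actual preprocessing and its semidiscrete decomposition step are not reproduced here.

```python
import numpy as np

# Toy corpus (hypothetical messages, not taken from the Enron data).
messages = [
    "please review the attached contract",
    "meeting moved to friday",
    "please send the contract to legal for review before the friday meeting",
    "lunch",
]

# Build a message-by-word frequency matrix: one row per message,
# one column per distinct word, entries are raw counts.
vocab = sorted({w for m in messages for w in m.split()})
index = {w: j for j, w in enumerate(vocab)}
A = np.zeros((len(messages), len(vocab)))
for i, m in enumerate(messages):
    for w in m.split():
        A[i, index[w]] += 1

# Thin SVD: A = U @ diag(s) @ Vt, with singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the k leading components (k = 2 is an arbitrary choice here).
k = 2
coords = U[:, :k] * s[:k]   # one k-dimensional point per message
A_k = coords @ Vt[:k, :]    # rank-k approximation of A

for msg, point in zip(messages, coords):
    print(np.round(point, 3), "<-", msg)
```

Each row of `coords` places one message in a low-dimensional space, where messages with similar word-frequency profiles land near each other; plotting these points is one way to look for the kind of grouping the abstract reports.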