In this paper, we present an approach for representing an email archive as a network that captures both the communication among users and the relations among entities extracted from the text of the email messages. We showcase the method on the Enron email corpus, from which we extract various entities and a social network. The extracted named entities (NEs), such as people, email addresses and telephone numbers, are organized in a graph along with the emails in which they were found. Edges in the graph indicate relations between NEs, each representing a co-occurrence in the same email part, paragraph, sentence or composite NE. We study the mathematical properties of the resulting graphs and describe our hands-on experience with processing such structures. The Enron Graph corpus contains a few million nodes and is large enough for experimenting with various graph-querying techniques, e.g. graph traversal or spreading activation. Because of its size, using traditional graph processing libraries can be problematic, since they keep the whole structure in memory. We describe our experience with managing such data and with discovering relations among the extracted entities. This experience may be valuable for practitioners and highlights several research challenges.
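To make the graph structure concrete, the following minimal Python sketch builds a small email/entity graph with sentence-level co-occurrence edges and runs a simplified spreading-activation query over it. It is an illustration under stated assumptions, not the implementation used in the paper: the function names `build_graph` and `spread_activation`, the `decay` and `threshold` parameters, the node labels and the toy input are all hypothetical, and the structure is held entirely in memory, which the abstract notes may not scale to the full corpus.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(emails):
    """Build an undirected graph from already-extracted entities.

    `emails` maps an email id to a list of sentences, where each sentence
    is a list of entity labels (hypothetical input format). Every entity is
    linked to its email, and entities sharing a sentence are linked to
    each other.
    """
    graph = defaultdict(set)
    for email_id, sentences in emails.items():
        for entities in sentences:
            unique = sorted(set(entities))
            for entity in unique:
                graph[email_id].add(entity)
                graph[entity].add(email_id)
            for a, b in combinations(unique, 2):
                graph[a].add(b)
                graph[b].add(a)
    return graph


def spread_activation(graph, seed, decay=0.5, threshold=0.1):
    """Simplified spreading activation (an assumed variant).

    The seed starts with activation 1.0. Each node passes its activation
    multiplied by `decay` to its neighbours, a node keeps the highest value
    it has received, and values below `threshold` are not propagated.
    """
    activation = {seed: 1.0}
    frontier = [seed]
    while frontier:
        next_frontier = []
        for node in frontier:
            passed = activation[node] * decay
            if passed < threshold:
                continue
            for neighbour in graph.get(node, ()):
                if passed > activation.get(neighbour, 0.0):
                    activation[neighbour] = passed
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return activation


# Toy, made-up data: two emails with entities grouped per sentence.
emails = {
    "email:1": [["person:Alice", "phone:555-0100"], ["person:Bob"]],
    "email:2": [["person:Bob", "person:Carol"]],
}
graph = build_graph(emails)
scores = spread_activation(graph, "person:Alice")
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}\t{score}")
```

In this toy example, activation from one person reaches entities in a different email only through a shared entity, which is the kind of relation discovery the abstract refers to. A disk-backed store would replace the in-memory dictionary for a graph of a few million nodes.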