Emails as graph: relation discovery in email archive

Authors:
Michal Laclavík;Štefan Dlugolinský;Martin Šeleng;Marek Ciglan;Ladislav Hluchý
Affiliations:
Institute of Informatics, Slovak Academy of Sciences, Bratislava, Slovakia;Institute of Informatics, Slovak Academy of Sciences, Bratislava, Slovakia;Institute of Informatics, Slovak Academy of Sciences, Bratislava, Slovakia;Institute of Informatics, Slovak Academy of Sciences, Bratislava, Slovakia;Institute of Informatics, Slovak Academy of Sciences, Bratislava, Slovakia
Venue:
Proceedings of the 21st international conference companion on World Wide Web
Year:
2012

Abstract

In this paper, we present an approach for representing an email archive as a network that captures both the communication among users and the relations among entities extracted from the text of the email messages. We demonstrate the method on the Enron email corpus, from which we extract various entities and a social network. The extracted named entities (NEs), such as people, email addresses and telephone numbers, are organized in a graph along with the emails in which they were found. The edges in the graph indicate relations between NEs and represent co-occurrence in the same email part, paragraph, sentence or composite NE. We study the mathematical properties of the resulting graphs and describe our hands-on experience with processing such structures. The Enron Graph corpus contains a few million nodes and is large enough for experimenting with various graph-querying techniques, e.g. graph traversal or spreading activation. Because of its size, using traditional graph-processing libraries can be problematic, as they keep the whole structure in memory. We describe our experience with managing such data and with discovering relations among the extracted entities. This experience may be valuable for practitioners, and it highlights several research challenges.
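
To make the graph representation concrete, the following is a minimal Python sketch of the idea described in the abstract, not the authors' implementation. It assumes two simple regular expressions as stand-ins for the entity extractor, treats sentence-level co-occurrence as the only relation type, and uses a basic spreading-activation loop with a hypothetical `decay` parameter. All function names, patterns and parameters here are illustrative assumptions.

```python
import re
from collections import defaultdict

# Assumed, simplified patterns; a real extractor would cover many more entity types.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def extract_entities(text):
    """Return the set of (type, value) entity pairs found in a piece of text."""
    entities = {("email_address", m.lower()) for m in EMAIL_RE.findall(text)}
    entities |= {("phone", m) for m in PHONE_RE.findall(text)}
    return entities


def build_graph(emails):
    """Build an undirected graph from a dict of message id -> body text.

    Nodes are messages and entities. Each entity is linked to the message
    it occurs in and to every other entity found in the same sentence.
    """
    graph = defaultdict(set)
    for msg_id, body in emails.items():
        msg_node = ("email", msg_id)
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", body):
            found = extract_entities(sentence)
            for ent in found:
                graph[msg_node].add(ent)
                graph[ent].add(msg_node)
                for other in found:
                    if other != ent:
                        graph[ent].add(other)
    return graph


def spread_activation(graph, start, steps=2, decay=0.5):
    """Propagate activation from `start` for a fixed number of steps.

    At each step a node passes `decay` times its energy to its neighbours,
    split evenly. Returns a dict of node -> accumulated activation.
    """
    activation = {start: 1.0}
    frontier = {start: 1.0}
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbours = graph.get(node, ())
            if not neighbours:
                continue
            share = energy * decay / len(neighbours)
            for nb in neighbours:
                next_frontier[nb] += share
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation
```

Nodes that end up with high activation are candidates for being related to the start entity. This sketch holds the whole graph in memory, which is exactly the limitation the abstract points out for a corpus with a few million nodes.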