Emails as graph: relation discovery in email archive

  • Authors:
  • Michal Laclavík;Štefan Dlugolinský;Martin Šeleng;Marek Ciglan;Ladislav Hluchý

  • Affiliations:
  • Institute of Informatics, Slovak Academy of Sciences, Bratislava, Slovakia (all authors)

  • Venue:
  • Proceedings of the 21st international conference companion on World Wide Web
  • Year:
  • 2012

Abstract

In this paper, we present an approach for representing an email archive as a network that captures both the communication among users and the relations among entities extracted from the text of the email messages. We demonstrate the method on the Enron email corpus, from which we extract various entities and a social network. The extracted named entities (NEs), such as people, email addresses and telephone numbers, are organized in a graph along with the emails in which they were found. The edges in the graph indicate relations between NEs and represent co-occurrence in the same email part, paragraph, sentence or composite NE. We study the mathematical properties of the resulting graphs and describe our hands-on experience with processing such structures. The Enron Graph corpus contains a few million nodes and is large enough for experimenting with various graph-querying techniques, e.g. graph traversal or spreading activation. Because of its size, using traditional graph-processing libraries can be problematic, as they keep the whole structure in memory. We describe our experience with managing such data and with discovering relations among the extracted entities. This experience might be valuable for practitioners and highlights several research challenges.
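
To make the graph construction and querying described in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy regex-based extractor for email addresses and telephone numbers, links entities that co-occur in the same sentence (only one of the co-occurrence levels the abstract lists), and runs a simple spreading-activation pass from a seed entity. The names `extract_entities`, `build_graph` and `spread_activation`, the regular expressions, and the `decay` and `steps` parameters are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations

# Toy patterns (assumptions): real NE extraction would be far richer.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def extract_entities(sentence):
    """Return a list of (type, value) pairs found in one sentence."""
    entities = [("email", m) for m in EMAIL_RE.findall(sentence)]
    entities += [("phone", m) for m in PHONE_RE.findall(sentence)]
    return entities


def build_graph(emails):
    """Build an undirected graph as an adjacency map of node -> set of nodes.

    `emails` maps a message id to its body text. Each message becomes a
    node, each extracted entity becomes a node, and edges connect
    (a) an entity to the message it was found in and
    (b) entities that co-occur in the same sentence.
    """
    graph = defaultdict(set)
    for email_id, body in emails.items():
        message_node = ("message", email_id)
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", body):
            entities = set(extract_entities(sentence))
            for entity in entities:
                graph[message_node].add(entity)
                graph[entity].add(message_node)
            for a, b in combinations(entities, 2):
                graph[a].add(b)
                graph[b].add(a)
    return graph


def spread_activation(graph, seed, decay=0.5, steps=2):
    """Propagate activation from `seed` for a fixed number of steps.

    At each step every active node passes `decay` times its energy to its
    neighbours, split evenly among them. Returns node -> accumulated score.
    """
    activation = {seed: 1.0}
    frontier = {seed: 1.0}
    for _ in range(steps):
        incoming = defaultdict(float)
        for node, energy in frontier.items():
            neighbours = graph.get(node, ())
            if not neighbours:
                continue
            share = decay * energy / len(neighbours)
            for neighbour in neighbours:
                incoming[neighbour] += share
        for node, energy in incoming.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = incoming
    return activation
```

Ranking the nodes returned by `spread_activation` by score, and filtering by entity type, gives a rough notion of which entities are most related to the seed. Note that this sketch keeps the whole adjacency map in memory, which is precisely the limitation the abstract points out for a graph with a few million nodes.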