Discovery and regeneration of hidden emails

Authors:
Giuseppe Carenini; Raymond Ng; Xiaodong Zhou; Ed Zwart
Affiliations:
University of British Columbia, Vancouver, Canada (all authors)
Venue:
Proceedings of the 2005 ACM Symposium on Applied Computing
Year:
2005

Abstract

The popularity of email has prompted researchers to look for ways to help users better organize the enormous amount of information stored in their email folders. One challenge that has not been studied extensively in text mining is the reconstruction of hidden emails. A hidden email is an original email that has been quoted in subsequent emails but is not itself present in the folder; it may have been deleted or may never have been received. This paper proposes a method for reconstructing hidden emails from the embedded quotations found in messages further down the thread hierarchy. To do so, we model all the quoted fragments in a precedence graph, from which hidden emails are regenerated as bulletized documents. The bulletized model is our solution for cases in which a total ordering of the fragments is not possible. We give a necessary and sufficient condition for each component of the precedence graph to be captured in a single bulletized email, and we develop heuristics that minimize the number of regenerated emails when the condition is not met. Finally, we present empirical results showing the scalability of our approach.
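
To make the precedence-graph idea concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the authors' algorithm. It assumes each quotation has already been split into an ordered list of fragment strings and that identical strings denote the same fragment. The function names (`build_precedence_graph`, `layered_order`, `render`) are hypothetical. Fragments are grouped into layers with Kahn's topological sort; a layer with more than one fragment is printed as bullets because the graph does not order those fragments relative to each other. This layering is cruder than the bulletized model described in the abstract: it can place two unordered fragments in different layers, and it does not test the paper's necessary and sufficient condition.

```python
from collections import defaultdict


def build_precedence_graph(quotations):
    """Return (nodes, edges); an edge a -> b means fragment a appeared
    immediately before fragment b in at least one quotation."""
    nodes = set()
    edges = defaultdict(set)
    for fragments in quotations:
        nodes.update(fragments)
        for earlier, later in zip(fragments, fragments[1:]):
            edges[earlier].add(later)
    return nodes, edges


def layered_order(nodes, edges):
    """Kahn-style layering: each layer holds the fragments whose
    predecessors have all been emitted. Raises ValueError on a cycle."""
    indegree = {n: 0 for n in nodes}
    for src in edges:
        for dst in edges[src]:
            indegree[dst] += 1
    layer = sorted(n for n in nodes if indegree[n] == 0)
    layers = []
    emitted = 0
    while layer:
        layers.append(layer)
        emitted += len(layer)
        next_layer = []
        for n in layer:
            for m in edges.get(n, ()):
                indegree[m] -= 1
                if indegree[m] == 0:
                    next_layer.append(m)
        layer = sorted(next_layer)
    if emitted != len(nodes):
        raise ValueError("precedence graph contains a cycle")
    return layers


def render(layers):
    """Print single-fragment layers as plain lines and multi-fragment
    layers as bullets (a simplification of a bulletized document)."""
    lines = []
    for layer in layers:
        if len(layer) == 1:
            lines.append(layer[0])
        else:
            lines.extend(f"* {frag}" for frag in layer)
    return "\n".join(lines)


if __name__ == "__main__":
    # Two replies quote overlapping parts of the same missing email.
    quotations = [
        ["Hi all,", "Budget is due Friday.", "Thanks, Ann"],
        ["Hi all,", "Room 204 is booked.", "Thanks, Ann"],
    ]
    nodes, edges = build_precedence_graph(quotations)
    print(render(layered_order(nodes, edges)))
```

In this example the two middle fragments are never quoted together, so the graph does not say which came first and the sketch prints them as two bullets between the greeting and the sign-off. When every layer contains exactly one fragment, the topological order is unique and the fragments have a total ordering.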