Threading electronic mail: a preliminary study
Information Processing and Management: an International Journal - Special issue: methods and tools for the automatic construction of hypertext
Algorithms for association rule mining — a general survey and comparison
ACM SIGKDD Explorations Newsletter
The Alternating Decision Tree Learning Algorithm
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
An Algorithm for In-Core Frequent Itemset Mining on Streaming Data
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Contextual search and name disambiguation in email using graphs
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Automatically tagging email by leveraging other users' folders
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Hi-index | 0.00 |
Viewing email messages as parts of a sequence or a thread is a convenient way to quickly understand their context. Current threading techniques rely on purely syntactic methods, matching sender information, subject line, and reply/forward prefixes. As such, they are mostly limited to personal conversations. In contrast, machine-generated email, which amount, as per our experiments, to more than 60% of the overall email traffic, requires a different kind of threading that should reflect how a sequence of emails is caused by a few related user actions. For example, purchasing goods from an online store will result in a receipt or a confirmation message, which may be followed, possibly after a few days, by a shipment notification message from an express shipping service. In today's mail systems, they will not be a part of the same thread, while we believe they should. In this paper, we focus on this type of threading that we coin "causal threading". We demonstrate that, by analyzing recurring patterns over hundreds of millions of mail users, we can infer a causality relation between these two individual messages. In addition, by observing multiple causal relations over common messages, we can generate "causal threads" over a sequence of messages. The four key stages of our approach consist of: (1) identifying messages that are instances of the same email type or "template" (generated by the same machine process on the sender side) (2) building a causal graph, in which nodes correspond to email templates and edges indicate potential causal relations (3) learning a causal relation prediction function, and (4) automatically "threading" the incoming email stream. We present detailed experimental results obtained by analyzing the inboxes of 12.5 million Yahoo! Mail users, who voluntarily opted-in for such research. Supervised editorial judgments show that we can identify more than 70% (recall rate) of all "causal threads" at a precision level of 90%. In addition, for a search scenario we show that we achieve a precision close to 80% at 90% recall. We believe that supporting causal threads in email clients opens new grounds for improving both email search and browsing experiences.