Clustering e-mails for the Swedish social insurance agency - what part of the e-mail thread gives the best quality?

Authors:
Hercules Dalianis;Magnus Rosell;Eriks Sneiders
Affiliations:
Department of Computer and Systems Science, Stockholm University, Kista, Sweden;Department of Computer and Systems Science, Stockholm University, Kista, Sweden and KTH CSC, Stockholm, Sweden;Department of Computer and Systems Science, Stockholm University, Kista, Sweden
Venue:
IceTAL'10 Proceedings of the 7th international conference on Advances in natural language processing
Year:
2010

Abstract

We need to analyse a large number of e-mails sent by citizens to the customer services department of a governmental organisation based in Sweden. To carry out this analysis, we clustered the e-mails with the aim of enabling automatic e-mail answering. One issue that arose was whether the clustering should use the whole e-mail, including the thread, or just the original query. In this paper we describe this investigation. Our results show that the query and the answer should be used, but not necessarily the whole e-mail thread. The original question clearly contains more useful information than the answer alone, although a combination of the two is even better. Using the full e-mail thread does not degrade the result.