Comparing manual text patterns and machine learning for classification of e-mails for automatic answering by a government agency

Authors:
Hercules Dalianis;Jonas Sjöbergh;Eriks Sneiders
Affiliations:
Department of Computer and Systems Sciences, DSV, Stockholm University, Kista, Sweden;KTH CSC, Stockholm, Sweden;Department of Computer and Systems Sciences, DSV, Stockholm University, Kista, Sweden
Venue:
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
Year:
2011

Abstract

E-mails to government institutions and large companies may contain a large proportion of queries that can be answered in a uniform way. We analysed and manually annotated 4,404 e-mails from citizens to the Swedish Social Insurance Agency, and compared two methods for detecting answerable e-mails: manually created text patterns (rule-based) and machine learning-based methods. The text pattern-based method gave much higher precision, 89 percent, than the machine learning-based methods, which gave only 63 percent. Recall was slightly higher for the machine learning-based methods (66 percent) than for the text patterns (47 percent). We also found that 23 percent of the total e-mail flow was processed by the automatic e-mail answering system.
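
To put the precision and recall trade-off on a single scale, the reported figures can be combined into an F1 score (the harmonic mean of precision and recall). The abstract does not report F1; the sketch below derives it from the four percentages quoted above, and the names `f1_score`, `reported`, and `f1_by_method` are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract, written as fractions.
# The F1 values computed from them are derived here, not reported by the authors.
reported = {
    "text patterns": (0.89, 0.47),
    "machine learning": (0.63, 0.66),
}

f1_by_method = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, f1 in f1_by_method.items():
    print(f"{name}: F1 = {f1:.2f}")
# text patterns: F1 = 0.62
# machine learning: F1 = 0.64
```

On this derived measure the two approaches come out close, so the choice between them depends on whether a wrong automatic answer (low precision) or a missed answerable e-mail (low recall) is the costlier error.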