This paper describes the largest-scale annotation project involving the Enron email corpus to date. Human annotators classified over 12,500 emails into the categories "Business" and "Personal", and then sub-categorised them by type within these categories. The paper quantifies how well humans perform on this task, measured by inter-annotator agreement, and presents the problems encountered in separating these two types of language. Finally, it presents preliminary results from using a machine to perform the same classification task.
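The abstract does not say which agreement statistic was used. As an illustration only, the sketch below computes Cohen's kappa, one common chance-corrected measure of agreement between two annotators. The function name `cohen_kappa` and the toy label lists are assumptions made for this example and are not taken from the paper or from the Enron annotations.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper for illustration; the paper's own agreement
    measure is not specified in the abstract.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    # If both annotators used one identical label throughout, chance
    # agreement is 1 and kappa is undefined; report 1.0 by convention here.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Toy data (invented for this sketch): eight emails, two annotators.
annotator_a = ["Business", "Business", "Personal", "Business",
               "Personal", "Personal", "Business", "Business"]
annotator_b = ["Business", "Personal", "Personal", "Business",
               "Personal", "Business", "Business", "Business"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

On the toy data the two annotators agree on 6 of 8 emails (0.75), but chance alone would give about 0.53 agreement, so kappa is noticeably lower than the raw agreement rate. That gap is why chance-corrected measures are usually preferred over raw percentage agreement for a two-category task such as "Business" versus "Personal".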