Sentence-Level Attachment Prediction

Authors:
M-Dyaa Albakour;Udo Kruschwitz;Simon Lucas
Affiliations:
School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester, UK;School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester, UK;School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester, UK
Venue:
IRFC'10 Proceedings of the First international Information Retrieval Facility conference on Advances in Multidisciplinary Retrieval
Year:
2010



Abstract

Attachment prediction is the task of automatically identifying email messages that should contain an attachment. It addresses the common problem of sending an email but forgetting to include the relevant attachment. A common Information Retrieval (IR) approach to analyzing documents such as emails is to treat the entire document as a bag of words. Here we propose a finer-grained analysis: we aim to identify individual sentences within an email that refer to an attachment. If we detect any such sentence, we predict that the email should have an attachment. Using part of the Enron corpus for evaluation, we find that our finer-grained approach outperforms previously reported document-level attachment prediction in similar evaluation settings. A second contribution of this paper is another successful example of the ‘wisdom of the crowd’ in collecting the annotations needed to train the attachment prediction algorithm: the aggregated non-expert judgements collected on Amazon’s Mechanical Turk can be used as a substitute for much more costly expert judgements.
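The decision rule described in the abstract (flag an email if any single sentence refers to an attachment) can be illustrated with a minimal Python sketch. The abstract does not specify the sentence classifier, its features, or how the Mechanical Turk judgements were aggregated, so everything named here is an assumption made for illustration: the cue-word pattern `_CUE` is a hypothetical stand-in for a trained classifier, `split_sentences` is a naive splitter, and `majority_vote` is one plausible way to combine several non-expert labels for a sentence.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List

# Hypothetical cue words standing in for a trained sentence classifier.
# The paper's actual features and learner are not given in the abstract.
_CUE = re.compile(r"\b(attach(ed|ment|ments|ing)?|enclosed)\b", re.IGNORECASE)


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def refers_to_attachment(sentence: str) -> bool:
    """Placeholder sentence-level classifier based on cue words only."""
    return _CUE.search(sentence) is not None


def predict_attachment(
    email_text: str,
    sentence_classifier: Callable[[str], bool] = refers_to_attachment,
) -> bool:
    """Predict an attachment if at least one sentence is classified positive."""
    return any(sentence_classifier(s) for s in split_sentences(email_text))


def majority_vote(judgements: Iterable[bool]) -> bool:
    """Aggregate several annotator labels for one sentence (assumed scheme).

    Ties are resolved as negative.
    """
    counts = Counter(judgements)
    return counts[True] > counts[False]


if __name__ == "__main__":
    email = "Hi Bob. Please find the attached report. Thanks!"
    print(predict_attachment(email))
    print(majority_vote([True, True, False]))
```

Passing a different `sentence_classifier` callable swaps in a learned model without changing the document-level rule, which is the only part taken directly from the abstract.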