Linguistic redundancy in Twitter

Authors:
Fabio Massimo Zanzotto; Marco Pennacchiotti; Kostas Tsioutsiouliklis
Affiliations:
University of Rome "Tor Vergata", Rome, Italy; Yahoo! Labs, Sunnyvale, CA; Yahoo! Labs, Sunnyvale, CA
Venue:
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2011

Abstract

In the last few years, the research community's interest in micro-blogs and social media services such as Twitter has been growing exponentially. Yet so far little attention has been paid to a key characteristic of micro-blogs: the high level of information redundancy. The aim of this paper is to approach this problem systematically by providing an operational definition of redundancy. We cast redundancy in the framework of Textual Entailment Recognition. We also provide quantitative evidence of the pervasiveness of redundancy in Twitter, and describe a dataset of redundancy-annotated tweets. Finally, we present a general-purpose system for identifying redundant tweets. An extensive quantitative evaluation shows that our system successfully solves the redundancy detection task, with statistically significant improvements over baseline systems.