Streaming analysis of discourse participants

Authors:
Benjamin Van Durme
Affiliations:
Johns Hopkins University
Venue:
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Year:
2012



Abstract

Inferring attributes of discourse participants has been treated as a batch-processing task: data such as all tweets from a given author are gathered in bulk, processed, analyzed for a particular feature, then reported as a result of academic interest. Given the sources and scale of material used in these efforts, along with potential use cases of such analytic tools, discourse analysis should be reconsidered as a streaming challenge. We show that under certain common formulations, the batch-processing analytic framework can be decomposed into a sequential series of updates, using as an example the task of gender classification. Once in a streaming framework, and motivated by large data sets generated by social media services, we present novel results in approximate counting, showing its applicability to space-efficient streaming classification.
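
The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes one common formulation, a linear model over token counts, in which the score for an author is a sum of per-token weights and can therefore be accumulated one message at a time. The weights, the names `batch_score` and `StreamingScorer`, and the whitespace tokenization are all hypothetical choices for illustration and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical feature weights, for illustration only (not from the paper).
WEIGHTS = {"awesome": 0.5, "game": -0.25, "love": 0.75}


def message_score(message):
    """Sum the weights of the whitespace-separated tokens in one message."""
    return sum(WEIGHTS.get(token, 0.0) for token in message.split())


def batch_score(messages):
    """Batch view: score all of an author's messages gathered in bulk."""
    return sum(message_score(message) for message in messages)


class StreamingScorer:
    """Streaming view: keep one running score per author."""

    def __init__(self):
        self.scores = defaultdict(float)

    def update(self, author, message):
        # Each arriving message only adds its own contribution to the total.
        self.scores[author] += message_score(message)
        return self.scores[author]


messages = ["love this game", "awesome game"]
scorer = StreamingScorer()
for message in messages:
    running = scorer.update("author_1", message)

# The final running score matches the batch score over the same messages.
print(running == batch_score(messages))
```

Under this assumption only one number per author needs to be stored, which is where approximate counting could further reduce the space required; the sketch above does not implement that part.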