Inferring attributes of discourse participants has been treated as a batch-processing task: data such as all tweets from a given author are gathered in bulk, processed, analyzed for a particular feature, then reported as a result of academic interest. Given the sources and scale of material used in these efforts, along with potential use cases of such analytic tools, discourse analysis should be reconsidered as a streaming challenge. We show that under certain common formulations, the batch-processing analytic framework can be decomposed into a sequential series of updates, using as an example the task of gender classification. Working within this streaming framework, and motivated by the large data sets generated by social media services, we present novel results in approximate counting and show its applicability to space-efficient streaming classification.
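The two ideas in the abstract can be illustrated with a minimal Python sketch. It assumes a linear classifier over bag-of-words features, which is one common formulation where a batch score decomposes into per-token updates; the abstract does not say which formulation the authors use. The weights and the names `WEIGHTS`, `batch_score`, `StreamingScore`, and `MorrisCounter` are hypothetical. The counter is the classic Morris-style approximate counter, shown only as background; it is not the novel approximate counting result the abstract refers to.

```python
import random

# Hypothetical feature weights for a linear classifier (illustrative only,
# not taken from the paper).
WEIGHTS = {"lol": 0.8, "dude": -0.6, "omg": 0.5, "bro": -0.7}


def batch_score(tokens):
    """Batch view: gather every token first, then score once."""
    return sum(WEIGHTS.get(t, 0.0) for t in tokens)


class StreamingScore:
    """Streaming view: keep only a running score and update it per token."""

    def __init__(self):
        self.score = 0.0

    def update(self, token):
        # Unknown tokens contribute nothing to the score.
        self.score += WEIGHTS.get(token, 0.0)
        return self.score


class MorrisCounter:
    """Morris-style approximate counter: stores an exponent, not the count."""

    def __init__(self, rng=None):
        self.exponent = 0
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Bump the exponent with probability 2**-exponent, so the register
        # grows roughly with the logarithm of the number of events seen.
        if self.rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1

    def estimate(self):
        # Standard estimator of the number of events seen so far.
        return 2 ** self.exponent - 1


if __name__ == "__main__":
    tokens = ["lol", "dude", "omg", "xyz", "bro", "lol"]
    stream = StreamingScore()
    for tok in tokens:
        stream.update(tok)
    print(stream.score, batch_score(tokens))

    counter = MorrisCounter(random.Random(0))
    for _ in range(10000):
        counter.increment()
    print(counter.exponent, counter.estimate())
```

Under these assumptions the streaming score matches the batch score after the last token, while the counter holds only a small exponent in place of an exact tally, which is the kind of space saving that motivates approximate counting in a streaming classifier.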