Given the high volume of content being generated online, automated techniques are needed to separate documents belonging to novel topics from the background discussion in a way that is robust and scales with the size of the document set. We present a solution to this challenge based on sparse coding, in which a stream of documents (each modeled as an m-dimensional vector y) is used to learn a dictionary matrix A of dimension m × k such that each document can be approximately represented as a linear combination of a few columns of A. If a new document cannot be represented with low error as a sparse linear combination of these columns, this is a strong indicator that the document is novel. We scale this approach to millions of documents by parallelizing sparse coding and dictionary learning, and by using the alternating direction method to solve the resulting optimization problems. We run our experiments on high-performance computing clusters with differing architectures and evaluate our approach on news streams and streaming data from Twitter®. Based on this analysis, we share insights into distributed optimization and machine architecture that can inform the design of exascale systems supporting data analytics.
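The novelty test described above can be sketched on a single machine as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a squared-error (lasso) objective for the sparse code, uses the ℓ2 reconstruction error as the novelty score, and compares it against a user-chosen threshold. The names `sparse_code`, `novelty_score`, `is_novel`, and `dictionary_step`, and the parameters `lam`, `rho`, `iters`, and `threshold`, are hypothetical. Each sparse code is computed with the alternating direction method of multipliers (ADMM) by splitting the code into two copies, x and z. Because the abstract does not give the dictionary update rule, a plain projected-gradient step is swapped in for it, and none of the parallelization is shown.

```python
import numpy as np


def soft_threshold(v, tau):
    """Elementwise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def sparse_code(A, y, lam=0.1, rho=1.0, iters=200):
    """Approximately solve min_x 0.5*||y - A x||_2^2 + lam*||x||_1 with ADMM.

    Assumption: a squared-error (lasso) objective; the source does not state its loss.
    Splitting x = z yields three alternating updates per iteration.
    """
    k = A.shape[1]
    M = A.T @ A + rho * np.eye(k)   # system matrix for the x-update (same every iteration)
    Aty = A.T @ y
    z = np.zeros(k)
    u = np.zeros(k)                 # scaled dual variable for the constraint x = z
    for _ in range(iters):
        x = np.linalg.solve(M, Aty + rho * (z - u))   # least-squares step
        z = soft_threshold(x + u, lam / rho)          # sparsity-inducing step
        u = u + x - z                                 # dual update on x = z
    return z


def novelty_score(A, y, lam=0.1):
    """L2 reconstruction error of y from its sparse code over dictionary A."""
    x = sparse_code(A, y, lam)
    return float(np.linalg.norm(y - A @ x))


def is_novel(A, y, threshold, lam=0.1):
    """Flag y as novel if its reconstruction error exceeds a (hypothetical) threshold."""
    return novelty_score(A, y, lam) > threshold


def dictionary_step(A, Y, X):
    """One projected-gradient step on 0.5*||Y - A X||_F^2, keeping each column in the unit ball.

    Y is m x n (documents as columns); X is k x n (their sparse codes).
    This update rule is a stand-in; the abstract does not specify one.
    """
    L = np.linalg.norm(X @ X.T, 2)   # Lipschitz constant of the gradient with respect to A
    if L == 0.0:
        return A                     # all codes are zero: the gradient is constant, skip the step
    A = A - ((A @ X - Y) @ X.T) / L
    norms = np.maximum(np.linalg.norm(A, axis=0), 1.0)
    return A / norms                 # rescale only columns with ||a_j||_2 > 1


if __name__ == "__main__":
    A = np.eye(6)[:, :3]                          # toy dictionary with three "topic" directions
    y_known = np.array([2.0, 0, 0, 0, 0, 0])      # lies in the span of A
    y_new = np.array([0, 0, 0, 1.0, 1.0, 1.0])    # orthogonal to every column of A
    print(is_novel(A, y_known, threshold=0.5))    # False (error is about 0.1)
    print(is_novel(A, y_new, threshold=0.5))      # True (error is about 1.73)
```

The x-update reuses the same k × k matrix on every iteration, so a practical version would factor it once (for example with a Cholesky factorization) instead of calling `np.linalg.solve` repeatedly. Alternating calls to `sparse_code` over a batch of documents with calls to `dictionary_step` gives a simple, sequential form of dictionary learning.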