A multiple cause mixture model for unsupervised learning
Neural Computation
Matrix computations (3rd ed.)
Latent semantic indexing: a probabilistic analysis
PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Distributional clustering of words for text classification
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Making large-scale support vector machine learning practical
Advances in kernel methods
Unsupervised learning by probabilistic latent semantic analysis
Machine Learning
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Introduction to Modern Information Retrieval
Text Categorization with Support Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
FOCS '02 Proceedings of the 43rd Symposium on Foundations of Computer Science
RANDOM '02 Proceedings of the 6th International Workshop on Randomization and Approximation Techniques
On the use of the singular value decomposition for text retrieval
Computational information retrieval
Enhanced word clustering for hierarchical text classification
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
The Journal of Machine Learning Research
Correlation Clustering: maximizing agreements via semidefinite programming
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Using mixture models for collaborative filtering
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Feature selection, L1 vs. L2 regularization, and rotational invariance
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Spectral Analysis of Random Graphs with Skewed Degree Distributions
FOCS '04 Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science
Probabilistic latent semantic analysis
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Hierarchical mixture models: a probabilistic analysis
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Using mixture models for collaborative filtering
Journal of Computer and System Sciences
Unsupervised Text Learning Based on Context Mixture Model with Dirichlet Prior
Advanced Web and Network Technologies, and Applications
Which clustering do you want? inducing your ideal clustering with minimal feedback
Journal of Artificial Intelligence Research
Applying machine learning in accounting research
Expert Systems with Applications: An International Journal
Towards the taxonomy-oriented categorization of yellow pages queries
ACM Transactions on Internet Technology (TOIT)
We propose a new algorithm for dimensionality reduction and unsupervised text classification. We model the corpus as generated by a mixture model and use a novel L1-norm-based approach introduced by Kleinberg and Sandler [19]. We show that our algorithm performs extremely well on large datasets, with peak accuracy approaching that of supervised learning based on Support Vector Machines (SVMs) with large training sets. The method is based on the same idea that underlies Latent Semantic Indexing (LSI): we find a good low-dimensional subspace of the feature space and project all documents into it. However, our projection minimizes a different error, and unlike LSI we build a basis that in many cases corresponds to the actual topics. We present the results of testing our algorithm on the abstracts of arXiv, an electronic repository of scientific papers, and on the 20 Newsgroups dataset, a small snapshot of 20 specific newsgroups.
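For orientation, the LSI-style projection that the abstract uses as its point of comparison can be sketched as below. This is only the standard SVD-based baseline, which minimizes squared (Frobenius) reconstruction error. It is not the L1-norm-based algorithm proposed in the paper, whose details are not given in this abstract. The function name `lsi_project` and the toy term-document matrix are hypothetical and chosen purely for illustration.

```python
import numpy as np


def lsi_project(term_doc, k):
    """Project documents (columns of term_doc) onto the top-k left singular vectors.

    Illustrative LSI baseline only; not the paper's L1-based method.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    basis = U[:, :k]              # orthonormal basis of the subspace (terms x k)
    coords = basis.T @ term_doc   # low-dimensional document coordinates (k x docs)
    return basis, coords


# Hypothetical toy corpus: 4 terms (rows) x 4 documents (columns),
# with two groups of documents that use disjoint vocabularies.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])

basis, coords = lsi_project(A, k=2)
approx = basis @ coords            # best rank-2 approximation in Frobenius norm
err = np.linalg.norm(A - approx)   # Frobenius norm of the residual
print(coords.shape, err)
```

The SVD basis vectors are orthogonal directions of maximal variance and need not resemble individual topics, which is the contrast the abstract draws with a basis that in many cases corresponds to the actual topics.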