Outlier detection is an important process for text document collections, but as a collection grows, detection becomes computationally expensive. Random projection has been shown to provide a good, fast approximation of sparse data, such as document vectors, for outlier detection. Random samples of the Fourier and cosine spectra have been shown to provide good approximations of sparse data when performing document clustering. In this article, we investigate the utility of these random Fourier and cosine spectral projections for document outlier detection. We show that, for outlier detection, random samples of the Fourier spectrum provide better accuracy and require less storage than random projection. We also show that random samples of the cosine spectrum provide accuracy and computation time similar to random projection, but require much less storage.
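
To make the storage contrast concrete, the following is a minimal sketch and not the article's implementation. It assumes NumPy, a small synthetic sparse matrix standing in for document vectors, a dense Gaussian matrix for random projection, and a simple distance-to-nearest-neighbours outlier score; the abstract does not say which detector or scaling the authors use. All function names are hypothetical.

```python
import numpy as np

def random_projection(X, k, rng):
    """Project rows of X onto k random Gaussian directions.
    The dense d x k matrix R must be stored to project new documents."""
    d = X.shape[1]
    R = rng.standard_normal((d, k)) / np.sqrt(k)
    return X @ R

def fourier_samples(X, k, rng):
    """Keep k randomly chosen coefficients of each row's unitary DFT.
    Only the k sampled frequency indices need to be stored."""
    d = X.shape[1]
    idx = rng.choice(d, size=k, replace=False)
    spectrum = np.fft.fft(X, axis=1, norm="ortho")
    # Rescale so squared distances are preserved in expectation (assumed scaling).
    return np.sqrt(d / k) * spectrum[:, idx]

def knn_outlier_scores(Z, n_neighbors=3):
    """Score each row by the distance to its n_neighbors-th nearest neighbour
    (an assumed, simple distance-based outlier score)."""
    diff = Z[:, None, :] - Z[None, :, :]
    dist = np.sqrt((np.abs(diff) ** 2).sum(axis=2))  # abs handles complex values
    dist.sort(axis=1)  # column 0 is each row's zero distance to itself
    return dist[:, n_neighbors]

# Hypothetical data: 60 sparse "documents" over 256 terms, about 5% non-zero.
rng = np.random.default_rng(0)
X = rng.random((60, 256)) * (rng.random((60, 256)) < 0.05)

scores_rp = knn_outlier_scores(random_projection(X, 32, rng))
scores_ft = knn_outlier_scores(fourier_samples(X, 32, rng))

# Indices of the five highest-scoring documents under each reduction.
print(np.argsort(scores_rp)[-5:])
print(np.argsort(scores_ft)[-5:])
```

In this sketch, random projection keeps 256 x 32 real values for its matrix, while the Fourier variant keeps only 32 integer indices. A cosine-spectrum variant would follow the same pattern with a discrete cosine transform in place of the FFT.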