Clustering High Dimensional Data Using SVM

Authors:
Tsau Young Lin; Tam Ngo
Affiliations:
Department of Computer Science, San José State University, San Jose, CA 95192, USA (both authors)
Venue:
RSFDGrC '07: Proceedings of the 11th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing
Year:
2009

Abstract

The Web contains a massive number of documents, so many that classifying them manually has become impossible. The goal of this project is to find a new method for clustering documents that comes as close as possible to human classification while also reducing the size of the document data. The project combines Latent Semantic Indexing (LSI), computed with Singular Value Decomposition (SVD), and Support Vector Machine (SVM) classification. SVD decomposes the data and truncates it to reduce its size, and the reduced data is then clustered into categories. The clustered data from the SVD step is used to train an SVM, which then classifies new documents by prediction. The results show that combining SVD and SVM reduces the data size and classifies documents reasonably well compared to human classification.
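The pipeline in the abstract (truncate with SVD, cluster the reduced documents, train an SVM on the cluster labels, predict categories for new documents) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify term weighting, the clustering algorithm, the SVM kernel or library, or the rank k. This sketch therefore assumes raw term counts and k-means with farthest-point initialisation. It also assumes a linear SVM trained by full-batch subgradient descent on the regularised hinge loss (Pegasos-style 1/(lam*t) steps), combined one-vs-rest. New documents are mapped into the reduced space with the standard LSI fold-in, d_hat = inv(S_k) U_k^T q. The function names (lsi_reduce, fold_in, kmeans, train_linear_svm, train_ovr, predict) and the toy matrix are hypothetical, and only NumPy is used.

```python
import numpy as np


def lsi_reduce(A, k):
    """Rank-k truncated SVD of a term-by-document matrix A (terms x docs).

    Returns U_k, the top-k singular values s_k, and V_k, whose rows are the
    k-dimensional document coordinates, so that A ~= U_k @ diag(s_k) @ V_k.T.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k].T


def fold_in(q, U_k, s_k):
    """Standard LSI fold-in: map a new term vector q to inv(S_k) @ U_k.T @ q."""
    return (U_k.T @ q) / s_k


def kmeans(X, n_clusters, iters=50):
    """Plain k-means (assumed clustering step, not specified in the abstract)
    with deterministic farthest-point initialisation."""
    C = [X[0]]
    for _ in range(1, n_clusters):
        # squared distance from each point to its nearest chosen centroid
        d = np.min([((X - c) ** 2).sum(axis=1) for c in C], axis=0)
        C.append(X[np.argmax(d)])
    C = np.array(C)
    for _ in range(iters):
        dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to its cluster mean (keep it if the cluster is empty)
        C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(n_clusters)])
    return labels


def train_linear_svm(X, y, lam=0.01, epochs=1000):
    """Assumed linear SVM for labels y in {+1, -1}: full-batch subgradient
    descent on lam/2 * ||w||^2 + mean(max(0, 1 - y * (X_aug @ w))) with step
    1/(lam*t). A constant feature stands in for the bias, so it is regularised too."""
    Xa = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xa.shape[1])
    for t in range(1, epochs + 1):
        viol = y * (Xa @ w) < 1  # points violating the margin (y * score < 1)
        grad = lam * w - (y[viol][:, None] * Xa[viol]).sum(axis=0) / len(Xa)
        w -= grad / (lam * t)
    return w


def train_ovr(X, labels, **kw):
    """One-vs-rest: one binary SVM per cluster label."""
    classes = np.unique(labels)
    models = [train_linear_svm(X, np.where(labels == c, 1.0, -1.0), **kw)
              for c in classes]
    return classes, models


def predict(classes, models, x):
    """Assign x to the class whose SVM gives the largest decision value."""
    xa = np.append(x, 1.0)
    return int(classes[np.argmax([xa @ w for w in models])])


# Hypothetical toy term-by-document matrix (rows = terms, columns = documents):
# documents 0-2 use only terms 0-2, documents 3-5 use only terms 3-5.
A = np.array([[2, 1, 1, 0, 0, 0],
              [1, 2, 1, 0, 0, 0],
              [1, 1, 2, 0, 0, 0],
              [0, 0, 0, 2, 1, 1],
              [0, 0, 0, 1, 2, 1],
              [0, 0, 0, 1, 1, 2]], dtype=float)

U_k, s_k, doc_coords = lsi_reduce(A, k=2)          # reduce 6 terms to 2 dimensions
labels = kmeans(doc_coords, n_clusters=2)          # cluster the reduced documents
classes, models = train_ovr(doc_coords, labels)    # train SVMs on the cluster labels

new_doc = np.array([1, 1, 0, 0, 0, 0], dtype=float)  # uses only terms 0 and 1
print(labels, predict(classes, models, fold_in(new_doc, U_k, s_k)))
```

On this toy matrix, documents 0-2 and 3-5 land in different clusters, and new_doc is assigned to the cluster of documents 0-2. Because the SVM learns from cluster labels rather than human labels, its predictions can only be as good as the clustering. That is why the comparison against human classification matters. A realistic corpus would call for a sparse matrix and a library SVM in place of this loop.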