Latent semantic indexing is an optimal special case of multidimensional scaling
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
A corpus-based approach to comparative evaluation of statistical term association measures
Journal of the American Society for Information Science and Technology
Learning from Data: Concepts, Theory, and Methods
Modern Information Retrieval
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
A New Sammon Algorithm for Sparse Data Visualization
ICPR '04 Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Volume 1
Artificial neural networks for feature extraction and multivariate data projection
IEEE Transactions on Neural Networks
Dimension reduction techniques are important preprocessing algorithms for high-dimensional applications: they reduce noise while preserving the main structure of the dataset. They have been successfully applied to a wide variety of problems, particularly in text mining. However, the algorithms proposed in the literature often suffer from low discriminant power owing to their unsupervised nature and to the 'curse of dimensionality'. Fortunately, several search engines, such as Yahoo, provide a manually created classification of a subset of documents that can be exploited to overcome this problem. In this paper we propose a semi-supervised version of a PCA-like algorithm for textual data analysis. The new method reduces the dimensionality of the term space by taking advantage of this document classification. The proposed algorithm has been evaluated on a text mining problem, where it outperforms well-known unsupervised techniques.
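The abstract does not give the paper's exact formulation, but the general idea of a semi-supervised, PCA-like projection can be sketched as follows: mix the unsupervised total covariance (plain PCA) with a between-class scatter computed from the labeled subset of documents, then project onto the leading eigenvectors of the combined matrix. The function name, the mixing weight `alpha`, and the use of `-1` to mark unlabeled documents are assumptions made for this illustration, not details from the paper.

```python
import numpy as np

def semi_supervised_pca(X, y, alpha=0.5, n_components=2):
    """Project X onto directions mixing unsupervised variance with
    label-driven between-class scatter (illustrative sketch only).

    X : (n_samples, n_features) document-term matrix
    y : class labels; -1 marks unlabeled documents (assumption)
    alpha : weight on the supervised term (alpha=0 gives plain PCA)
    """
    Xc = X - X.mean(axis=0)
    # Unsupervised part: total covariance, as in ordinary PCA.
    S_total = Xc.T @ Xc / len(X)
    # Supervised part: between-class scatter from labeled docs only.
    S_between = np.zeros_like(S_total)
    labeled = y != -1
    mu = X[labeled].mean(axis=0)
    for c in np.unique(y[labeled]):
        X_cls = X[labeled & (y == c)]
        d = (X_cls.mean(axis=0) - mu)[:, None]
        S_between += len(X_cls) * (d @ d.T)
    # Convex combination of the two scatter matrices.
    S = (1 - alpha) * S_total + alpha * S_between
    # S is symmetric: take the eigenvectors of the largest eigenvalues.
    vals, vecs = np.linalg.eigh(S)
    W = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return Xc @ W, W
```

With `alpha=0` this reduces to standard PCA on the centered data; increasing `alpha` pulls the projection toward directions that separate the manually classified documents, which is the kind of discriminant power the abstract says purely unsupervised methods lack.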