Document indexing using dimension reduction has been widely studied in recent years. Applying these methods in large distributed systems can be inefficient because of the computational, storage, and communication costs they require. In this paper, we propose DLPR, a distributed locality preserving dimension reduction algorithm that projects a large distributed data set into a lower dimensional space. Partitioning methods are applied to divide the data set into several clusters. The system nodes communicate through virtual groups to project the clusters to the target space, either independently or in conjunction with each other. The reduction transforms themselves are computed using Locality Preserving Indexing, a method that has received little study in distributed environments. Experimental results demonstrate that DLPR preserves the local structure of the data set while reducing computing and storage costs.
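The partition-then-project idea can be illustrated with a minimal single-machine sketch. This is not the DLPR algorithm itself: the abstract does not specify the partitioning method, the virtual-group communication, or the details of the transform, so the sketch makes several assumptions. Plain k-means stands in for the partitioning step, a basic locality preserving projection with 0/1 nearest-neighbor weights stands in for Locality Preserving Indexing, and each cluster is projected independently with no communication between nodes. All function names and parameters (`partition`, `locality_preserving_projection`, `reduce_by_cluster`, `n_neighbors`, `reg`) are hypothetical.

```python
import numpy as np


def partition(X, k, iters=20, seed=0):
    """Plain k-means; an assumed stand-in for the unspecified partitioning method."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def locality_preserving_projection(X, dim, n_neighbors=5, reg=1e-6):
    """Return a (features x dim) matrix that keeps neighboring rows of X close."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Symmetric k-nearest-neighbor graph with 0/1 weights (an assumed weighting).
    nn = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), nn.shape[1]), nn.ravel()] = 1.0
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(X.shape[1])  # regularized to stay positive definite
    # Reduce the generalized problem A a = lam * B a to a standard symmetric one.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    _, vecs = np.linalg.eigh((M + M.T) / 2)
    # eigh sorts eigenvalues in ascending order; the smallest preserve locality best.
    return C_inv.T @ vecs[:, :dim]


def reduce_by_cluster(X, k, dim):
    """Partition X into k clusters and project each cluster on its own."""
    labels = partition(X, k)
    Y = np.zeros((len(X), dim))
    for j in np.unique(labels):
        idx = labels == j
        P = locality_preserving_projection(X[idx], dim)
        Y[idx] = X[idx] @ P
    return labels, Y


# Toy data: two well-separated groups of 6-dimensional points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 6)), rng.normal(8, 1, (30, 6))])
labels, Y = reduce_by_cluster(X, k=2, dim=2)
print(Y.shape)
```

Because each cluster gets its own transform, the per-cluster eigenproblems involve only that cluster's points, which is where a partitioned scheme could save computation and storage relative to one global decomposition. The sketch leaves out how clusters would be projected in conjunction with each other.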