Scientific computing has made successful use of GPU-enabled computers, often relying on distributed nodes to overcome the limitations of device memory. So far, only a handful of text mining applications benefit from such infrastructure. The initial steps of text mining are typically data-intensive, and the ease of deploying algorithms is an important factor in developing advanced applications. We therefore introduce a flexible, distributed, MapReduce-based text mining workflow that performs I/O-bound operations on CPUs with industry-standard tools and then runs compute-bound operations on GPUs, using kernels optimized for coalesced memory access and effective use of shared memory. We tested our algorithms extensively on a cluster of eight nodes, each with two NVIDIA Tesla M2050 GPUs attached, and achieved considerable speedups for random projection and self-organizing maps.
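To illustrate why random projection suits this CPU/GPU split, here is a minimal sketch of generic Gaussian random projection, not the authors' implementation. Each sparse document vector x (term id to weight) is multiplied by a fixed random matrix R whose entries are drawn from N(0, 1/k), so the expected squared norm of xR equals that of x. The function names, the dense Gaussian variant, the target dimension k, and the seed are all assumptions made for illustration. Because every document is projected independently, the per-document step is the kind of work a MapReduce map phase, or a GPU kernel, could parallelize.

```python
import math
import random


def make_projection(vocab_size, k, seed=0):
    """Hypothetical helper: dense random matrix R (vocab_size x k),
    entries drawn from N(0, 1/k) so squared norms are preserved in expectation."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(vocab_size)]


def project_document(doc, R):
    """Map-style step (assumed): project one sparse document {term_id: weight}
    to k dimensions, computing y = x R using only the nonzero terms of x."""
    k = len(R[0])
    y = [0.0] * k
    for term_id, weight in doc.items():
        row = R[term_id]
        for j in range(k):
            y[j] += weight * row[j]
    return y


def project_corpus(docs, R):
    """Documents are independent, so this loop is embarrassingly parallel."""
    return [project_document(doc, R) for doc in docs]


if __name__ == "__main__":
    R = make_projection(vocab_size=1000, k=64, seed=1)
    docs = [{3: 2.0, 17: 1.0}, {3: 1.0, 999: 4.0}]
    Y = project_corpus(docs, R)
    print(len(Y), len(Y[0]))
```

Running the script prints `2 64`: two documents, each reduced to 64 dimensions. In a real pipeline R would be generated once from a shared seed, so every mapper (or GPU) projects against the same matrix without shipping it over the network.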