Efficient dispersal of information for security, load balancing, and fault tolerance
Journal of the ACM (JACM)
Scans as Primitive Parallel Operations
IEEE Transactions on Computers
A bridging model for parallel computation
Communications of the ACM
High-performance sorting on networks of workstations
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Cluster-based scalable network services
Proceedings of the sixteenth ACM symposium on Operating systems principles
Cluster I/O with River: making the fast case common
Proceedings of the sixth workshop on I/O in parallel and distributed systems
Journal of the ACM (JACM)
Systematic Efficient Parallelization of Scan and Other List Homomorphisms
Euro-Par '96 Proceedings of the Second International Euro-Par Conference on Parallel Processing-Volume II
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Distributed computing in practice: the Condor experience
Concurrency and Computation: Practice & Experience - Grid Performance
Diamond: A Storage Architecture for Early Discard in Interactive Search
FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies
Explicit control in a batch-aware distributed file system
NSDI'04 Proceedings of the 1st Symposium on Networked Systems Design and Implementation - Volume 1
NetSolve/D: A Massively Parallel Grid Execution System for Scalable Data Intensive Collaboration
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
Value-maximizing deadline scheduling and its application to animation rendering
Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures
SimFusion: measuring similarity using unified relationship matrix
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Queue - Multiprocessors
Software and the Concurrency Revolution
Queue - Multiprocessors
Global-view abstractions for user-defined reductions and scans
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Computer
Finding near-duplicate web pages: a large-scale evaluation of algorithms
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Probabilistic accuracy bounds for fault-tolerant computations that discard tasks
Proceedings of the 20th annual international conference on Supercomputing
Evolving a language in and for the real world: C++ 1991-2006
Proceedings of the third ACM SIGPLAN conference on History of programming languages
Interpreting the data: Parallel analysis with Sawzall
Scientific Programming - Dynamic Grids and Worldwide Computing
Detecting near-duplicates for web crawling
Proceedings of the 16th international conference on World Wide Web
Google news personalization: scalable online collaborative filtering
Proceedings of the 16th international conference on World Wide Web
JouleSort: a balanced energy-efficiency benchmark
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Manticore: a heterogeneous parallel language
Proceedings of the 2007 workshop on Declarative aspects of multicore programming
Power provisioning for a warehouse-sized computer
Proceedings of the 34th annual international symposium on Computer architecture
Automatic inversion generates divide-and-conquer parallel programs
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Fixing the embarrassing slowness of OpenDHT on PlanetLab
WORLDS'05 Proceedings of the 2nd conference on Real, Large Distributed Systems - Volume 2
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Towards adaptive, scalable, and reliable resource provisioning for wsrf-compliant applications
Proceedings of the 16th international symposium on High performance distributed computing
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Enabling scalability and performance in a large scale CMP environment
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Scalability of the Nutch search engine
Proceedings of the 21st annual international conference on Supercomputing
MRPSO: MapReduce particle swarm optimization
Proceedings of the 9th annual conference on Genetic and evolutionary computation
Efficient document retrieval in main memory
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Corroborate and learn facts from the web
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Parallel test generation and execution with Korat
Proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
Autonomic operations in cooperative stream processing systems
HotAC II Hot Topics in Autonomic Computing
Google's MapReduce programming model — Revisited
Science of Computer Programming
Status report: the manticore project
ML '07 Proceedings of the 2007 Workshop on ML
Sinfonia: a new paradigm for building scalable distributed systems
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Using early phase termination to eliminate load imbalances at barrier synchronization points
Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications
Confessions of a used programming language salesman
Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications
Stasis: flexible transactional storage
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
The Chubby lock service for loosely-coupled distributed systems
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
Efficient search ranking in social networks
Proceedings of the sixteenth ACM conference on information and knowledge management
The ghost in the browser: analysis of web-based malware
HotBots'07 Proceedings of the First Workshop on Hot Topics in Understanding Botnets
Community systems research at Yahoo!
ACM SIGMOD Record
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Automatic alignment of large-scale aerial rasters to road-maps
Proceedings of the 15th annual ACM international symposium on Advances in geographic information systems
FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Merge: a programming model for heterogeneous multi-core systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
On distributing symmetric streaming computations
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Tight lower bounds for selection in randomly ordered streams
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Cluster computing for web-scale data processing
Proceedings of the 39th SIGCSE technical symposium on Computer science education
EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Scalable security for petascale parallel file systems
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Web science: an interdisciplinary approach to understanding the web
Communications of the ACM - Web science
Bigtable: A Distributed Storage System for Structured Data
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 5th conference on Computing frontiers
Contextual advertising by combining relevance with click feedback
Proceedings of the 17th international conference on World Wide Web
Video suggestion and discovery for youtube: taking random walks through the view graph
Proceedings of the 17th international conference on World Wide Web
Developing a concurrent service orchestration engine in ccr
Proceedings of the 1st international workshop on Multicore software engineering
Runtime software adaptation: framework, approaches, and styles
Companion of the 30th international conference on Software engineering
Data management projects at Google
ACM SIGMOD Record
The design methodology of Phoenix cluster system software stack
CHINA HPC '07 Proceedings of the 2007 Asian technology information program's (ATIP's) 3rd workshop on High performance computing in China: solution approaches to impediments for high performance computing
Skippy: a new snapshot indexing method for time travel in the storage manager
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Efficient bulk insertion into a distributed ordered table
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
OLTP through the looking glass, and what we found there
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Improved approximations for multiprocessor scheduling under uncertainty
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Viewpoint: Envisioning the future of computing research
Communications of the ACM - Designing games with a purpose
Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Semantic-based distributed i/o with the paramedic framework
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
DataLab: transactional data-parallel computing on an active storage cloud
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
Accelerating large-scale data exploration through data diffusion
DADC '08 Proceedings of the 2008 international workshop on Data-aware distributed computing
Proceedings of the first international conference on Networks for grid applications
San Fermín: aggregating large data sets using a binomial swap forest
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
D3S: debugging deployed distributed systems
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
Ghost turns zombie: exploring the life cycle of web-based malware
LEET'08 Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats
ACM SIGACT News
Fast support vector machine training and classification on graphics processors
Proceedings of the 25th international conference on Machine learning
Fully distributed EM for very large datasets
Proceedings of the 25th international conference on Machine learning
Algorithms and data structures for external memory
Foundations and Trends® in Theoretical Computer Science
Servo: a programming model for many-core computing
ACM SIGARCH Computer Architecture News
Cut-and-stitch: efficient parallel learning of linear dynamical systems on smps
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Context-aware query suggestion by mining click-through and session data
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Data mining using high performance data clouds: experimental studies using sector and sphere
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
A scalable, commodity data center network architecture
Proceedings of the ACM SIGCOMM 2008 conference on Data communication
Dcell: a scalable and fault-tolerant network structure for data centers
Proceedings of the ACM SIGCOMM 2008 conference on Data communication
Answering what-if deployment and configuration questions with wise
Proceedings of the ACM SIGCOMM 2008 conference on Data communication
Automatic optimization of parallel dataflow programs
ATC'08 USENIX 2008 Annual Technical Conference
Free factories: unified infrastructure for data intensive web services
ATC'08 USENIX 2008 Annual Technical Conference
Implicitly-threaded parallelism in Manticore
Proceedings of the 13th ACM SIGPLAN international conference on Functional programming
Toward loosely coupled programming on petascale systems
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Materialized community ground models for large-scale earthquake simulation
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
A scalable parallel framework for analyzing terascale molecular dynamics simulation trajectories
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Compute and storage clouds using wide area high performance networks
Future Generation Computer Systems
Large-Scale Parallel Collaborative Filtering for the Netflix Prize
AAIM '08 Proceedings of the 4th international conference on Algorithmic Aspects in Information and Management
Towards Large Scale Semantic Annotation Built on MapReduce Architecture
ICCS '08 Proceedings of the 8th international conference on Computational Science, Part III
Comparative Studies Simplified in GPFlow
ICCS '08 Proceedings of the 8th international conference on Computational Science, Part III
User Defined Partitioning - Group Data Based on Computation Model
DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
Towards the Design of a Scalable Email Archiving and Discovery Solution
ADBIS '08 Proceedings of the 12th East European conference on Advances in Databases and Information Systems
Clustera: an integrated computation and data management system
Proceedings of the VLDB Endowment
Simrank++: query rewriting through link analysis of the click graph
Proceedings of the VLDB Endowment
Scheduling shared scans of large data files
Proceedings of the VLDB Endowment
Pfp: parallel fp-growth for query recommendation
Proceedings of the 2008 ACM conference on Recommender systems
SCOPE: easy and efficient parallel processing of massive data sets
Proceedings of the VLDB Endowment
PNUTS: Yahoo!'s hosted data serving platform
Proceedings of the VLDB Endowment
Ad-hoc data processing in the cloud
Proceedings of the VLDB Endowment
Large-scale collaborative analysis and extraction of web data
Proceedings of the VLDB Endowment
Web-scale named entity recognition
Proceedings of the 17th ACM conference on Information and knowledge management
Data weaving: scaling up the state-of-the-art in data clustering
Proceedings of the 17th ACM conference on Information and knowledge management
Sindice.com: a document-oriented lookup index for open linked data
International Journal of Metadata, Semantics and Ontologies
GRIMS: a scalable management and storage system for massive remote sensing images
Proceedings of the 3rd international conference on Scalable information systems
Distributed, large-scale latent semantic analysis by index interpolation
Proceedings of the 3rd international conference on Scalable information systems
Disk aware discord discovery: finding unusual time series in terabyte sized datasets
Knowledge and Information Systems
Criteria to Compare Cloud Computing with Current Database Technology
IWSM/Metrikon/Mensura '08 Proceedings of the International Conferences on Software Process and Product Measurement
WorldTravel: A Testbed for Service-Oriented Applications
ICSOC '08 Proceedings of the 6th International Conference on Service-Oriented Computing
Thread Safety through Partitions and Effect Agreements
Languages and Compilers for Parallel Computing
gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments
Languages and Compilers for Parallel Computing
The Metadata Triumvirate: Social Annotations, Anchor Texts and Search Queries
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
SS'08 Proceedings of the 17th USENIX Security Symposium
Bellwether analysis: Searching for cost-effective query-defined predictors in large databases
ACM Transactions on Knowledge Discovery from Data (TKDD)
Serialization sets: a dynamic dependence-based parallel execution model
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
PCM '08 Proceedings of the 9th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing
Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Bringing big systems to small schools: distributed systems for undergraduates
Proceedings of the 40th ACM technical symposium on Computer science education
Seattle: a platform for educational cloud computing
Proceedings of the 40th ACM technical symposium on Computer science education
Data integration flows for business intelligence
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Teaching large scale data processing: the five-week course and two years' experiences
SCE '08 Proceedings of the 1st ACM Summit on Computing Education in China
SnowFlock: rapid virtual machine cloning for cloud computing
Proceedings of the 4th ACM European conference on Computer systems
SCAN-Lite: enterprise-wide analysis on the cheap
Proceedings of the 4th ACM European conference on Computer systems
Adding the easy button to the cloud with SnowFlock and MPI
Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing
Smart Miner: a new framework for mining large scale web usage data
Proceedings of the 18th international conference on World wide web
Proceedings of the 18th international conference on World wide web
Adaptive workload allocation in query processing in autonomous heterogeneous environments
Distributed and Parallel Databases
Future Generation Computer Systems
Map-reduce programming model and hadoop distributed file system for use in undergraduate curriculum
Journal of Computing Sciences in Colleges
Supporting MapReduce on large-scale asymmetric multi-core clusters
ACM SIGOPS Operating Systems Review
Traverse: Simplified Indexing on Large Map-Reduce-Merge Clusters
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
TRUSTER: TRajectory Data Processing on ClUSTERs
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
Dynamic Resources Management of Virtual Appliances on a Computational Cluster
Euro-Par 2008 Workshops - Parallel Processing
A Parallel Algorithm for Finding Related Pages in the Web by Using Segmented Link Structures
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
A translation system for enabling data mining applications on GPUs
Proceedings of the 23rd international conference on Supercomputing
Harnessing parallelism in multicore clusters with the all-pairs and wavefront abstractions
Proceedings of the 18th ACM international symposium on High performance distributed computing
CLOUDLET: towards mapreduce implementation on virtual machines
Proceedings of the 18th ACM international symposium on High performance distributed computing
The quest for scalable support of data-intensive workloads in distributed systems
Proceedings of the 18th ACM international symposium on High performance distributed computing
A distributed architecture for data mining and integration
Proceedings of the second international workshop on Data-aware distributed computing
Abstract storage: moving file format-specific abstractions into petabyte-scale storage systems
Proceedings of the second international workshop on Data-aware distributed computing
Large-scale deep unsupervised learning using graphics processors
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Tashi: location-aware cluster management
ACDC '09 Proceedings of the 1st workshop on Automated control for datacenters and clouds
Crystal-growth-inspired algorithms for computational grids
BADS '09 Proceedings of the 2009 workshop on Bio-inspired algorithms for distributed systems
Fastest parallel molecular algorithms for the elliptic curve discrete logarithm problem over GF(2^n)
BADS '09 Proceedings of the 2009 workshop on Bio-inspired algorithms for distributed systems
MapReduce optimization using regulated dynamic prioritization
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Toward a cloud computing research agenda
ACM SIGACT News
Open-source grid technologies for web-scale computing
ACM SIGACT News
BBM: bayesian browsing model from petabyte-scale data
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Social influence analysis in large-scale networks
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
DOULION: counting triangles in massive graphs with a coin
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Mining rich session context to improve web search
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
OLAP on search logs: an infrastructure supporting data-driven applications in search engines
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Pairwise document similarity in large collections with MapReduce
HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Zeno: eventually consistent Byzantine-fault tolerance
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
RPC chains: efficient client-server communication in geodistributed systems
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
BotGraph: large scale spamming botnet detection
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
A comparison of approaches to large-scale data analysis
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Generating example data for dataflow programs
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
E = MC3: managing uncertain enterprise data in a cluster-computing environment
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Distributed data-parallel computing using a high-level programming language
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Experiences on Processing Spatial Data with MapReduce
SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management
What's inside the Cloud? An architectural map of the Cloud landscape
CLOUD '09 Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing
Engineering the cloud from software modules
CLOUD '09 Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing
Investigation of the accuracy of search engine hit counts
Journal of Information Science
Data-intensive computing for competent genetic algorithms: a pilot study using meandre
Proceedings of the 11th Annual conference on Genetic and evolutionary computation
Large scale data mining using genetics-based machine learning
Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers
Brute force and indexed approaches to pairwise document similarity comparisons with MapReduce
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
On single-pass indexing with MapReduce
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
PLDA: Parallel Latent Dirichlet Allocation for Large-Scale Applications
AAIM '09 Proceedings of the 5th International Conference on Algorithmic Aspects in Information and Management
Cesar-FD: An Effective Stateful Fault Detection Mechanism in Drug Discovery Grid
CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
The Eucalyptus Open-Source Cloud-Computing System
CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
Performance Issues in Parallelizing Data-Intensive Applications on a Multi-core Cluster
CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
PortLand: a scalable fault-tolerant layer 2 data center network fabric
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
BCube: a high performance, server-centric network architecture for modular data centers
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
ROAR: increasing the flexibility and performance of distributed search
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
Safe and effective fine-grained TCP retransmissions for datacenter communication
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
D3N: programming distributed computation in pocket switched networks
Proceedings of the 1st ACM workshop on Networking, systems, and applications for mobile handhelds
Why should we integrate services, servers, and networking in a data center?
Proceedings of the 1st ACM workshop on Research on enterprise networking
Query interactions in database workloads
Proceedings of the Second International Workshop on Testing Database Systems
AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Mining in a mobile environment
Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data
Efficient Clustering of Web-Derived Data Sets
MLDM '09 Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
An Approach to Web-Scale Named-Entity Disambiguation
MLDM '09 Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
A Data Parallel Algorithm for XML DOM Parsing
XSym '09 Proceedings of the 6th International XML Database Symposium on Database and XML Technologies
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
MapReduce Programming Model for .NET-Based Cloud Computing
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Implementing Parallel Google Map-Reduce in Eden
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
A Vision for Next Generation Query Processors and an Associated Research Agenda
Globe '09 Proceedings of the 2nd International Conference on Data Management in Grid and Peer-to-Peer Systems
Journal of Computing Sciences in Colleges
Data-intensive text processing with MapReduce
NAACL-Tutorials '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts
Extending AspectJ for separating regions
GPCE '09 Proceedings of the eighth international conference on Generative programming and component engineering
SETQA-NLP '09 Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing
Web page clustering using heuristic search in the web graph
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Fast, easy, and cheap: construction of statistical machine translation models with MapReduce
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Exploring large-data issues in the curriculum: a case study with MapReduce
TeachCL '08 Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics
Sinfonia: A new paradigm for building scalable distributed systems
ACM Transactions on Computer Systems (TOCS)
MapReduce and parallel DBMSs: friends or foes?
Communications of the ACM - Amir Pnueli: Ahead of His Time
MapReduce: a flexible data processing tool
Communications of the ACM - Amir Pnueli: Ahead of His Time
FAWN: a fast array of wimpy nodes
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
The multikernel: a new OS architecture for scalable multicore systems
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Distributed aggregation for data-parallel computing: interfaces and implementations
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Quincy: fair scheduling for distributed computing clusters
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Combinatorial Framework for Similarity Search
SISAP '09 Proceedings of the 2009 Second International Workshop on Similarity Search and Applications
Grace: safe multithreaded programming for C/C++
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Proceedings of the First Asia-Pacific Symposium on Internetware
Living with the Law: Can Automation give us Moore with Less?
ASE '08 Proceedings of the 2008 23rd IEEE/ACM International Conference on Automated Software Engineering
The nature of data center traffic: measurements & analysis
Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference
Composing and executing parallel data-flow graphs with shell pipes
Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science
Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science
SPIDER: a system for scalable, parallel / distributed evaluation of large-scale RDF data
Proceedings of the 18th ACM conference on Information and knowledge management
Lessons learned from a year's worth of benchmarks of large data clouds
Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers
Nephele: efficient parallel data processing in the cloud
Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers
Highly scalable genome assembly on campus grids
Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers
Query processing of massive trajectory data based on mapreduce
Proceedings of the first international workshop on Cloud data management
Leveraging a scalable row store to build a distributed text index
Proceedings of the first international workshop on Cloud data management
Implementation of an Orchestration Language as a Haskell Domain Specific Language
Electronic Notes in Theoretical Computer Science (ENTCS)
What is analytic infrastructure and why should you care?
ACM SIGKDD Explorations Newsletter
Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
Mercury: a reflective middleware for automatic parallelization of Bags-of-Tasks
Proceedings of the 8th International Workshop on Adaptive and Reflective MIddleware
MDCube: a high performance network structure for modular data center interconnection
Proceedings of the 5th international conference on Emerging networking experiments and technologies
Detecting network neutrality violations with causal inference
Proceedings of the 5th international conference on Emerging networking experiments and technologies
Marvin: Distributed reasoning over large-scale Semantic Web data
Web Semantics: Science, Services and Agents on the World Wide Web
Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human
Arabic cross-document coreference detection
ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Proceedings of the VLDB Endowment
Building a high-level dataflow system on top of Map-Reduce: the Pig experience
Proceedings of the VLDB Endowment
PLANET: massively parallel learning of tree ensembles with MapReduce
Proceedings of the VLDB Endowment
MAD skills: new analysis practices for big data
Proceedings of the VLDB Endowment
How best to build web-scale data managers?
Proceedings of the VLDB Endowment
Adaptively parallelizing distributed range queries
Proceedings of the VLDB Endowment
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads
Proceedings of the VLDB Endowment
A distributed pool architecture for genetic algorithms
CEC'09 Proceedings of the Eleventh conference on Congress on Evolutionary Computation
Phrase clustering for discriminative learning
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2
The Quest for Parallel Reasoning on the Semantic Web
AMT '09 Proceedings of the 5th International Conference on Active Media Technology
Scalable Distributed Reasoning Using MapReduce
ISWC '09 Proceedings of the 8th International Semantic Web Conference
RAPID: Enabling Scalable Ad-Hoc Analytics on the Semantic Web
ISWC '09 Proceedings of the 8th International Semantic Web Conference
DisTec: Towards a Distributed System for Telecom Computing
CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing
Cloud Computing Boosts Business Intelligence of Telecommunication Industry
CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing
Distributed Structured Database System HugeTable
CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing
An Efficient Cloud Computing-Based Architecture for Freight System Application in China Railway
CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing
Evaluating MapReduce on Virtual Machines: The Hadoop Case
CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing
Parallel K-Means Clustering Based on MapReduce
CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing
A Data Distribution Aware Task Scheduling Strategy for MapReduce System
CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing
A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming
APLAS '09 Proceedings of the 7th Asian Symposium on Programming Languages and Systems
The infinite HMM for unsupervised PoS tagging
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
Optimizing relational algebra operations using generic equivalence discriminators and lazy products
Proceedings of the 2010 ACM SIGPLAN workshop on Partial evaluation and program manipulation
Ranking and semi-supervised classification on large scale graphs using map-reduce
TextGraphs-4 Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing
Ganesha: blackBox diagnosis of MapReduce systems
ACM SIGMETRICS Performance Evaluation Review
DiskReduce: RAID for data-intensive scalable computing
Proceedings of the 4th Annual Workshop on Petascale Data Storage
Case studies in storage access by loosely coupled petascale applications
Proceedings of the 4th Annual Workshop on Petascale Data Storage
The case for RAMClouds: scalable high-performance storage entirely in DRAM
ACM SIGOPS Operating Systems Review
Learning URL patterns for webpage de-duplication
Proceedings of the third ACM international conference on Web search and data mining
On compressing the textual web
Proceedings of the third ACM international conference on Web search and data mining
FPMR: MapReduce framework on FPGA
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
Empirical evaluation of latency-sensitive application performance in the cloud
MMSys '10 Proceedings of the first annual ACM SIGMM conference on Multimedia systems
Learning based opportunistic admission control algorithm for MapReduce as a service
Proceedings of the 3rd India software engineering conference
A breadth-first course in multicore and manycore programming
Proceedings of the 41st ACM technical symposium on Computer science education
Extraction of user profile based on the hadoop framework
WiCOM'09 Proceedings of the 5th International Conference on Wireless communications, networking and mobile computing
DEDUCE: at the intersection of MapReduce and stream processing
Proceedings of the 13th International Conference on Extending Database Technology
An experimental study of time-constrained aggregate queries
Proceedings of the 13th International Conference on Extending Database Technology
Xbase: cloud-enabled information appliance for healthcare
Proceedings of the 13th International Conference on Extending Database Technology
FPGAs: a new point in the database design space
Proceedings of the 13th International Conference on Extending Database Technology
LazyBase: freshness vs. performance in information management
ACM SIGOPS Operating Systems Review
An energy case for hybrid datacenters
ACM SIGOPS Operating Systems Review
Mining dependency in distributed systems through unstructured logs analysis
ACM SIGOPS Operating Systems Review
ACM Transactions on Information Systems (TOIS)
How to improve XML web services performance?
Proceedings of the International Conference and Workshop on Emerging Trends in Technology
A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes
Data Mining and Knowledge Discovery
Heuristics for multi-round divisible loads scheduling with limited memory
Parallel Computing
RunTest: assuring integrity of dataflow processing in cloud computing infrastructures
ASIACCS '10 Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security
Boom analytics: exploring data-centric, declarative programming for the cloud
Proceedings of the 5th European conference on Computer systems
HadoopToSQL: a mapReduce query optimizer
Proceedings of the 5th European conference on Computer systems
Design and use of htalib: a library for hierarchically tiled arrays
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Test case generation for the task tree type of architecture
Information and Software Technology
Scalable techniques for document identifier assignment in inverted indexes
Proceedings of the 19th international conference on World wide web
Distributed nonnegative matrix factorization for web-scale dyadic data analysis on mapreduce
Proceedings of the 19th international conference on World wide web
Building taxonomy of web search intents for name entity queries
Proceedings of the 19th international conference on World wide web
Access: news and blog analysis for the social sciences
Proceedings of the 19th international conference on World wide web
Decoupling storage and computation in Hadoop with SuperDataNodes
ACM SIGOPS Operating Systems Review
Harnessing input redundancy in a MapReduce framework
Proceedings of the 2010 ACM Symposium on Applied Computing
Semi-join computation on distributed file systems using map-reduce-merge model
Proceedings of the 2010 ACM Symposium on Applied Computing
A typed calculus for querying distributed XML documents
TGC'06 Proceedings of the 2nd international conference on Trustworthy global computing
Estimating clustering indexes in data streams
ESA'07 Proceedings of the 15th annual European conference on Algorithms
Towards scalable architectures for clickstream data warehousing
DNIS'07 Proceedings of the 5th international conference on Databases in networked information systems
Towards scalable RDF graph analytics on MapReduce
Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud
SPARQL basic graph pattern processing with iterative MapReduce
Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud
Semplore: an IR approach to scalable hybrid query of semantic web data
ISWC'07/ASWC'07 Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference
Empowering automatic semantic annotation in grid
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Service combinators for farming virtual machines
COORDINATION'08 Proceedings of the 10th international conference on Coordination models and languages
Semantic sitemaps: efficient and flexible access to datasets on the semantic web
ESWC'08 Proceedings of the 5th European semantic web conference on The semantic web: research and applications
Energy-efficient cluster computing with FAWN: workloads and implications
Proceedings of the 1st International Conference on Energy-Efficient Computing and Networking
New challenges of parallel job scheduling
JSSPP'07 Proceedings of the 13th international conference on Job scheduling strategies for parallel processing
A RESTful messaging system for asynchronous distributed processing
Proceedings of the First International Workshop on RESTful Design
FlumeJava: easy, efficient data-parallel pipelines
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Stateful bulk processing for incremental analytics
Proceedings of the 1st ACM symposium on Cloud computing
Comet: batched stream processing for data intensive distributed computing
Proceedings of the 1st ACM symposium on Cloud computing
Skew-resistant parallel processing of feature-extracting scientific user-defined functions
Proceedings of the 1st ACM symposium on Cloud computing
Fluxo: a system for internet service programming by non-expert developers
Proceedings of the 1st ACM symposium on Cloud computing
Nephele/PACTs: a programming model and execution framework for web-scale analytical processing
Proceedings of the 1st ACM symposium on Cloud computing
Towards automatic optimization of MapReduce programs
Proceedings of the 1st ACM symposium on Cloud computing
Making cloud intermediate data fault-tolerant
Proceedings of the 1st ACM symposium on Cloud computing
Robust and flexible power-proportional storage
Proceedings of the 1st ACM symposium on Cloud computing
Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Automatic contention detection and amelioration for data-intensive operations
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
ParaTimer: a progress indicator for MapReduce DAGs
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Indexing multi-dimensional data in a cloud system
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Overview of sciDB: large scale array storage, processing and analysis
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
A comparison of join algorithms for log processing in MapReduce
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Ricardo: integrating R and Hadoop
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Data warehousing and analytics infrastructure at facebook
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Continuous analytics over discontinuous streams
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Online aggregation and continuous query support in MapReduce
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
MapDupReducer: detecting near duplicates over massive datasets
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Large graph processing in the cloud
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
KAdvice: inferring synchronization patterns from an existing codebase
Proceedings of the 2nd International Workshop on Recommendation Systems for Software Engineering
Predictable time-sharing for DryadLINQ cluster
Proceedings of the 7th international conference on Autonomic computing
Proceedings of the 24th ACM International Conference on Supercomputing
Interaction-based programming towards translucent clouds: position paper
APLWACA '10 Proceedings of the 2010 Workshop on Analysis and Programming Languages for Web Applications and Cloud Applications
Assigning tasks for efficiency in Hadoop: extended abstract
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Middleware'09 Proceedings of the ACM/IFIP/USENIX 10th international conference on Middleware
Web-scale computer vision using MapReduce for multimedia data mining
Proceedings of the Tenth International Workshop on Multimedia Data Mining
APHID: An architecture for private, high-performance integrated data mining
Future Generation Computer Systems
Parallel programming framework for large batch transaction processing on scale-out systems
Proceedings of the 3rd Annual Haifa Experimental Systems Conference
ASSET queries: a declarative alternative to MapReduce
ACM SIGMOD Record
A simple framework to generate parallel application for geospatial processing
Proceedings of the 1st International Conference and Exhibition on Computing for Geospatial Research & Application
On distributing symmetric streaming computations
ACM Transactions on Algorithms (TALG)
Flood: elastic streaming MapReduce
Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems
Design patterns for efficient graph algorithms in MapReduce
Proceedings of the Eighth Workshop on Mining and Learning with Graphs
Fast parallelization of differential evolution algorithm using MapReduce
Proceedings of the 12th annual conference on Genetic and evolutionary computation
Toward a cost-effective cloud storage service
ICACT'10 Proceedings of the 12th international conference on Advanced communication technology
A general approach to data-intensive computing using the Meandre component-based framework
Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science
The impact of virtualization on network performance of amazon EC2 data center
INFOCOM'10 Proceedings of the 29th conference on Information communications
Extremely large-scale sensing applications for planetary WSNs
Proceedings of the 2nd ACM International Workshop on Hot Topics in Planet-scale Measurement
Temporal click model for sponsored search
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Efficient partial-duplicate detection based on sequence matching
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Adaptive system anomaly prediction for large-scale hosting infrastructures
Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing
Brief announcement: modelling MapReduce for optimal execution in the cloud
Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing
Malstone: towards a benchmark for analytics on large data clouds
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
VDB-MR: MapReduce-based distributed data integration using virtual database
Future Generation Computer Systems
Suspending, migrating and resuming HPC virtual clusters
Future Generation Computer Systems
Misco: a MapReduce framework for mobile systems
Proceedings of the 3rd International Conference on PErvasive Technologies Related to Assistive Environments
Middleware support for many-task computing
Cluster Computing
Designing Accelerator-Based Distributed Systems for High Performance
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
WORKEM: Representing and Emulating Distributed Scientific Workflow Execution State
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Service Oriented Approach to High Performance Scientific Computing
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
A Capabilities-Aware Programming Model for Asymmetric High-End Systems
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
File-Access Characteristics of Data-Intensive Workflow Applications
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
SciCloud: Scientific Computing on the Cloud
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
A Map-Reduce System with an Alternate API for Multi-core Environments
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
An Analysis of Traces from a Production MapReduce Cluster
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Cell broadband engine processor performance optimization: tracing tools implementation and use
IBM Journal of Research and Development
MapReduce for the cell broadband engine architecture
IBM Journal of Research and Development
Workload and network-optimized computing systems
IBM Journal of Research and Development
Generic and automatic address configuration for data center networks
Proceedings of the ACM SIGCOMM 2010 conference
Symbiotic routing in future data centers
Proceedings of the ACM SIGCOMM 2010 conference
c-Through: part-time optics in data centers
Proceedings of the ACM SIGCOMM 2010 conference
Topology-aware resource allocation for data-intensive workloads
Proceedings of the first ACM asia-pacific workshop on systems
Energy-aware routing in data center network
Proceedings of the first ACM SIGCOMM workshop on Green networking
Proceedings of the second ACM SIGCOMM workshop on Networking, systems, and applications on mobile handhelds
MRAP: a novel MapReduce-based framework to support HPC analytics applications with access patterns
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
DiscFinder: a data-intensive scalable cluster finder for astrophysics
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
An overview of the Open Science Data Cloud
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Weaver: integrating distributed computing abstractions into scientific workflows using Python
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
ROARS: a scalable repository for data intensive scientific computing
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Parallel processing of data from very large-scale wireless sensor networks
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Massive Semantic Web data compression with MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Very large pattern databases for heuristic search
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Pydoop: a Python MapReduce and HDFS API for Hadoop
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Pairwise Element Computation with MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Multi-GPU volume rendering using MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
MR-scope: a real-time tracing tool for MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Concordia: a Google for malware
Proceedings of the Sixth Annual Workshop on Cyber Security and Information Intelligence Research
MapCG: writing parallel program portable between CPU and GPU
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
AM++: a generalized active message framework
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Panache: a parallel file system cache for global file access
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Refactoring human roles solves systems problems
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
In search of an API for scalable file systems: under the table or above it?
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
Cloud analytics: do we really need to reinvent the storage stack?
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
Mochi: visual log-analysis based tools for debugging hadoop
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
A common substrate for cluster computing
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
DryadInc: reusing work in large-scale computations
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
Towards optimizing hadoop provisioning in the cloud
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
HotDep'08 Proceedings of the Fourth conference on Hot topics in system dependability
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
On availability of intermediate data in cloud computations
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
FLUXO: a simple service compiler
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
Volley: automated data placement for geo-distributed cloud services
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Experiences with CoralCDN: a five-year operational view
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Hedera: dynamic flow scheduling for data center networks
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
TAPP'10 Proceedings of the 2nd conference on Theory and practice of provenance
Stout: an adaptive interface to scalable cloud storage
USENIXATC'10 Proceedings of the 2010 USENIX conference on USENIX annual technical conference
The utility coprocessor: massively parallel computation from the coffee shop
USENIXATC'10 Proceedings of the 2010 USENIX conference on USENIX annual technical conference
Scalable I/O - a well-architected way to do scalable, secure and virtualized I/O
WIOV'08 Proceedings of the First conference on I/O virtualization
SALSA: analyzing logs as state machines
WASL'08 Proceedings of the First USENIX conference on Analysis of system logs
Bayesian Browsing Model: Exact Inference of Document Relevance from Petabyte-Scale Data
ACM Transactions on Knowledge Discovery from Data (TKDD)
Distributed training strategies for the structured perceptron
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Variational inference for adaptor grammars
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Online generation of locality sensitive hash signatures
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
ACLDemos '10 Proceedings of the ACL 2010 System Demonstrations
Manimal: relational optimization for data-intensive programs
Proceedings of the 13th International Workshop on the Web and Databases
Reliable data-center scale computations
Proceedings of the 4th International Workshop on Large Scale Distributed Systems and Middleware
Parallel bulk insertion for large-scale analytics applications
Proceedings of the 4th International Workshop on Large Scale Distributed Systems and Middleware
Evaluating point-based POMDP solvers on multicore machines
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics - Special issue on gait analysis
A technology to expose a cluster as a service in a cloud
AusPDC '10 Proceedings of the Eighth Australasian Symposium on Parallel and Distributed Computing - Volume 107
Distributing frequency-dependent data stream computations
CATS '09 Proceedings of the Fifteenth Australasian Symposium on Computing: The Australasian Theory - Volume 94
Task superscalar: using processors as functional units
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
See spot run: using spot instances for mapreduce workflows
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Scripting the cloud with skywriting
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Towards energy proportional cloud for data processing frameworks
SustainIT'10 Proceedings of the First USENIX conference on Sustainable information technology
Lessons from implementing the biCGStab method with SkeTo library
Proceedings of the fourth international workshop on High-level parallel programming and applications
Generic multiset programming for language-integrated querying
Proceedings of the 6th ACM SIGPLAN workshop on Generic programming
The YouTube video recommendation system
Proceedings of the fourth ACM conference on Recommender systems
International Journal of Ad Hoc and Ubiquitous Computing
A time-aware type system for data-race protection and guaranteed initialization
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Task types for pervasive atomicity
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Language virtualization for heterogeneous parallel computing
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Towards personal high-performance geospatial computing (HPC-G): perspectives and a case study
Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems
Distributed asynchronous online learning for natural language processing
CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning
Improving gender classification of blog authors
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Combining optimizations in automated low power design
Proceedings of the Conference on Design, Automation and Test in Europe
Adaptive query execution for data management in the cloud
CloudDB '10 Proceedings of the second international workshop on Cloud data management
Towards a data-centric view of cloud security
CloudDB '10 Proceedings of the second international workshop on Cloud data management
Efficient data consolidation in grid networks and performance analysis
Future Generation Computer Systems
A model of computation for MapReduce
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Large scale parallel document mining for machine translation
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Large-scale music tag recommendation with explicit multiple attributes
Proceedings of the international conference on Multimedia
Compression, indexing, and retrieval for massive string data
CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Implementation of a stream-based IP flow record query language
AIMS'10 Proceedings of the Mechanisms for autonomous management of networks and services, and 4th international conference on Autonomous infrastructure, management and security
SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management
Scalable clustering algorithm for N-body simulations in a shared-nothing cluster
SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management
A study of transcoding on cloud environments for video content delivery
Proceedings of the 2010 ACM multimedia workshop on Mobile cloud media computing
Proceedings of the ACM SIGSPATIAL International Workshop on GeoStreaming
Selecting representative IP addresses for internet topology studies
IMC '10 Proceedings of the 10th ACM SIGCOMM conference on Internet measurement
Scafida: a scale-free network inspired data center architecture
ACM SIGCOMM Computer Communication Review
Distributed SLCA-based XML keyword search by map-reduce
DASFAA'10 Proceedings of the 15th international conference on Database systems for advanced applications
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
Parallel collision detection algorithm based on OBB tree and mapreduce
Edutainment'10 Proceedings of the Entertainment for education, and 5th international conference on E-learning and games
Proceedings of the FSE/SDP workshop on Future of software engineering research
Large-scale multimodal mining for healthcare with mapreduce
Proceedings of the 1st ACM International Health Informatics Symposium
XML structural similarity search using mapreduce
WAIM'10 Proceedings of the 11th international conference on Web-age information management
Comparing Hadoop and Fat-Btree based access method for small file I/O applications
WAIM'10 Proceedings of the 11th international conference on Web-age information management
JAWS: Job-Aware Workload Scheduling for the Exploration of Turbulence Simulations
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Elastic Cloud Caches for Accelerating Service-Oriented Computations
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Scalable repositories for virtual clusters
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Coniunge et impera: multiple-graph mining for query-log analysis
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part I
MapReduce for information retrieval evaluation: "let's quickly test this on 12 TB of data"
CLEF'10 Proceedings of the 2010 international conference on Multilingual and multimodal information access evaluation: cross-language evaluation forum
Evaluating IPv6 adoption in the internet
PAM'10 Proceedings of the 11th international conference on Passive and active measurement
High-performance Computing in China: Research and Applications
International Journal of High Performance Computing Applications
A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers
Software—Practice & Experience - Focus on Selected PhD Literature Reviews in the Practical Aspects of Software Technology
On the feasibility of dynamic rescheduling on the Intel Distributed Computing Platform
Proceedings of the 11th International Middleware Conference Industrial track
Private searching on MapReduce
TrustBus'10 Proceedings of the 7th international conference on Trust, privacy and security in digital business
VL2: a scalable and flexible data center network
Communications of the ACM
A capabilities-aware framework for using computational accelerators in data-intensive computing
Journal of Parallel and Distributed Computing
Energy management for MapReduce clusters
Proceedings of the VLDB Endowment
HaLoop: efficient iterative data processing on large clusters
Proceedings of the VLDB Endowment
Dremel: interactive analysis of web-scale datasets
Proceedings of the VLDB Endowment
Runtime measurements in the cloud: observing, analyzing, and reducing variance
Proceedings of the VLDB Endowment
The performance of MapReduce: an in-depth study
Proceedings of the VLDB Endowment
MRShare: sharing across multiple queries in MapReduce
Proceedings of the VLDB Endowment
Hadoop++: making a yellow elephant run like a cheetah (without it even noticing)
Proceedings of the VLDB Endowment
Behavioral simulations in MapReduce
Proceedings of the VLDB Endowment
DataGarage: warehousing massive performance data on commodity servers
Proceedings of the VLDB Endowment
Cheetah: a high performance, custom data warehouse on top of MapReduce
Proceedings of the VLDB Endowment
Massively parallel data analysis with PACTs on Nephele
Proceedings of the VLDB Endowment
ICTCP: Incast Congestion Control for TCP in data center networks
Proceedings of the 6th International Conference
SecondNet: a data center network virtualization architecture with bandwidth guarantees
Proceedings of the 6th International Conference
Frontiers of Computer Science in China
Macroscopic characterisations of Web accessibility
The New Review of Hypermedia and Multimedia - Web Accessibility
Knuckles: bringing the database to the data
International Journal of Computational Science and Engineering
Integrating MapReduce and RDBMSs
Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research
Web data processing on the cloud
Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research
Large-scale incremental processing using distributed transactions and notifications
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Piccolo: building fast, distributed programs with partitioned tables
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
SEATTLE: A Scalable Ethernet Architecture for Large Enterprises
ACM Transactions on Computer Systems (TOCS)
SnowFlock: Virtual Machine Cloning as a First-Class Cloud Primitive
ACM Transactions on Computer Systems (TOCS)
Topology-aware resource allocation for data-intensive workloads
ACM SIGCOMM Computer Communication Review
On the expressiveness and trade-offs of large scale tuple stores
OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems: Part II
The case for object databases in cloud data management
ICOODB'10 Proceedings of the Third international conference on Objects and databases
A multi-core software API for embedded MPSoC environments
MTPP'10 Proceedings of the Second Russia-Taiwan conference on Methods and tools of parallel programming multicomputers
Dynamic proportional share scheduling in Hadoop
JSSPP'10 Proceedings of the 15th international conference on Job scheduling strategies for parallel processing
An efficient distributed subgraph mining algorithm in extreme large graphs
AICI'10 Proceedings of the 2010 international conference on Artificial intelligence and computational intelligence: Part I
Efficient distributed test architectures for large-scale systems
ICTSS'10 Proceedings of the 22nd IFIP WG 6.1 international conference on Testing software and systems
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Parallel implementation of classification algorithms based on MapReduce
RSKT'10 Proceedings of the 5th international conference on Rough set and knowledge technology
CADRA: context aware data retrieval architecture
International Journal of Advanced Intelligence Paradigms
Scheduling divisible MapReduce computations
Journal of Parallel and Distributed Computing
Scalable Speculative Parallelization on Commodity Clusters
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Identifying topical authorities in microblogs
Proceedings of the fourth ACM international conference on Web search and data mining
Efficient indexing of repeated n-grams
Proceedings of the fourth ACM international conference on Web search and data mining
Batch query processing for web search engines
Proceedings of the fourth ACM international conference on Web search and data mining
Multidimensional mining of large-scale search logs: a topic-concept cube approach
Proceedings of the fourth ACM international conference on Web search and data mining
Learning website hierarchies for keyword enrichment in contextual advertising
Proceedings of the fourth ACM international conference on Web search and data mining
Automatic image semantic interpretation using social action and tagging data
Multimedia Tools and Applications
Parallel programming for multimedia applications
Multimedia Tools and Applications
Programming in Manticore, a heterogenous parallel functional language
CEFP'09 Proceedings of the Third summer school conference on Central European functional programming school
Recovery tasks: an automated approach to failure recovery
RV'10 Proceedings of the First international conference on Runtime verification
Signal/collect: graph algorithms for the (semantic) web
ISWC'10 Proceedings of the 9th international semantic web conference on The semantic web - Volume Part I
Programming Support Innovations for Emerging Distributed Applications
Lifeline-based global load balancing
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Challenges and opportunities for efficient computing with FAWN
ACM SIGOPS Operating Systems Review
Future Generation Computer Systems
Parallelized K-Means clustering algorithm for self aware mobile ad-hoc networks
Proceedings of the 2011 International Conference on Communication, Computing & Security
Utilization of map-reduce for parallelization of resource scheduling using MPI: PRS
Proceedings of the 2011 International Conference on Communication, Computing & Security
CPLDP: an efficient large dataset processing system built on cloud platform
ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications - Volume Part II
Map-reduce extensions and recursive queries
Proceedings of the 14th International Conference on Extending Database Technology
Proceedings of the 14th International Conference on Extending Database Technology
Big data and cloud computing: current state and future opportunities
Proceedings of the 14th International Conference on Extending Database Technology
RanKloud: a scalable ranked query processing framework on hadoop
Proceedings of the 14th International Conference on Extending Database Technology
Covariance in Unsupervised Learning of Probabilistic Grammars
The Journal of Machine Learning Research
Dremel: interactive analysis of web-scale datasets
Communications of the ACM
WebMapReduce: an accessible and adaptable tool for teaching map-reduce computing
Proceedings of the 42nd ACM technical symposium on Computer science education
Mechanisms that separate algorithms from implementations for parallel patterns
Proceedings of the 2010 Workshop on Parallel Programming Patterns
Energy-delay based provisioning for large datacenters: an energy-efficient and cost optimal approach
Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
Scalable and cost-effective interconnection of data-center servers using dual server ports
IEEE/ACM Transactions on Networking (TON)
Parallel skyline computation on multicore architectures
Information Systems
PLDA+: Parallel latent dirichlet allocation with data placement and pipeline processing
ACM Transactions on Intelligent Systems and Technology (TIST)
RecLab: a system for eCommerce recommender research with real data, context and feedback
Proceedings of the 2011 Workshop on Context-awareness in Retrieval and Recommendation
A stochastic learning-to-rank algorithm and its application to contextual advertising
Proceedings of the 20th international conference on World wide web
Counting triangles and the curse of the last reducer
Proceedings of the 20th international conference on World wide web
Identifying breakpoints in public opinion
Proceedings of the First Workshop on Social Media Analytics
Communications of the ACM
Scarlett: coping with skewed content popularity in mapreduce clusters
Proceedings of the sixth conference on Computer systems
CPRS: A cloud-based program recommendation system for digital TV platforms
Future Generation Computer Systems
ASTERIX: towards a scalable, semistructured data platform for evolving-world models
Distributed and Parallel Databases
Strategies for preparing computer science students for the multicore world
Proceedings of the 2010 ITiCSE working group reports
Topology switching for data center networks
Hot-ICE'11 Proceedings of the 11th USENIX conference on Hot topics in management of internet, cloud, and enterprise networks and services
TritonSort: a balanced large-scale sorting system
Proceedings of the 8th USENIX conference on Networked systems design and implementation
CIEL: a universal execution engine for distributed data-flow computing
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Mesos: a platform for fine-grained resource sharing in the data center
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Implicitly threaded parallelism in manticore
Journal of Functional Programming
An overview of business intelligence technology
Communications of the ACM
Automatic optimization for MapReduce programs
Proceedings of the VLDB Endowment
Proceedings of the First Workshop on Building Analysis Datasets and Gathering Experience Returns for Security
A load-aware scheduler for MapReduce framework in heterogeneous cloud environments
Proceedings of the 2011 ACM Symposium on Applied Computing
Towards improved load balancing for data intensive distributed computing
Proceedings of the 2011 ACM Symposium on Applied Computing
Cloud application logging for forensics
Proceedings of the 2011 ACM Symposium on Applied Computing
A fast approach for parallel deduplication on multicore processors
Proceedings of the 2011 ACM Symposium on Applied Computing
A MapReduce workflow system for architecting scientific data intensive applications
Proceedings of the 2nd International Workshop on Software Engineering for Cloud Computing
An application architecture to facilitate multi-site clinical trial collaboration in the cloud
Proceedings of the 2nd International Workshop on Software Engineering for Cloud Computing
ASDF: an automated, online framework for diagnosing performance problems
Architecting dependable systems VII
Aspects of data-intensive cloud computing
From active data management to event-based systems and more
A hadoop-based packet trace processing tool
TMA'11 Proceedings of the Third international conference on Traffic monitoring and analysis
Distributed and fault-tolerant execution framework for transaction processing
Proceedings of the 4th Annual International Conference on Systems and Storage
sMapReduce: a programming pattern for wireless sensor networks
Proceedings of the 2nd Workshop on Software Engineering for Sensor Network Applications
Knowledge-Based Systems
Ripple: A publish/subscribe service for multidata item updates propagation in the cloud
Journal of Network and Computer Applications
Parallel evaluation of conjunctive queries
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Theory of data stream computing: where to go
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Efficient parallel skyline processing using hyperplane projections
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
A latency and fault-tolerance optimizer for online parallel query plans
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Schedule optimization for data processing flows on the cloud
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Processing theta-joins using MapReduce
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Llama: leveraging columnar storage for scalable join processing in the MapReduce framework
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Fast personalized PageRank on MapReduce
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
A platform for scalable one-pass analytics using MapReduce
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Nova: continuous Pig/Hadoop workflows
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
A batch of PNUTS: experiences connecting cloud batch and serving systems
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Emerging trends in the enterprise data analytics: connecting Hadoop and DB2 warehouse
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Efficient processing of data warehousing queries in a split execution environment
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
RAFT at work: speeding-up mapreduce applications under task and node failures
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Filtering: a method for solving graph problems in MapReduce
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
On a local protocol for concurrent file transfers
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Semantically enriched event based model for web usage mining
WISE'10 Proceedings of the 11th international conference on Web information systems engineering
Structuring the unstructured middle with chunk computing
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Disk-locality in datacenter computing considered irrelevant
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Optimizing data partitioning for data-parallel computing
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Non-deterministic parallelism considered useful
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Garbage collection auto-tuning for Java mapreduce on multi-cores
Proceedings of the international symposium on Memory management
The tao of parallelism in algorithms
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Steno: automatic optimization of declarative queries
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
An automatic parallelization framework for algebraic computation systems
Proceedings of the 36th international symposium on Symbolic and algebraic computation
Clause-iteration with MapReduce to scalably query datagraphs in the SHARD graph-store
Proceedings of the fourth international workshop on Data-intensive distributed computing
Making a case for distributed file systems at Exascale
Proceedings of the third international workshop on Large-scale system and application performance
Adaptive data-driven service integrity attestation for multi-tenant cloud systems
Proceedings of the Nineteenth International Workshop on Quality of Service
OLIC: online information compression for scalable hosting infrastructure monitoring
Proceedings of the Nineteenth International Workshop on Quality of Service
Otus: resource attribution in data-intensive clusters
Proceedings of the second international workshop on MapReduce and its applications
Phoenix++: modular MapReduce for shared-memory systems
Proceedings of the second international workshop on MapReduce and its applications
Exploring MapReduce efficiency with highly-distributed data
Proceedings of the second international workshop on MapReduce and its applications
Tall and skinny QR factorizations in MapReduce architectures
Proceedings of the second international workshop on MapReduce and its applications
Rapid parallel genome indexing with MapReduce
Proceedings of the second international workshop on MapReduce and its applications
Full-text indexing for optimizing selection operations in large-scale data analytics
Proceedings of the second international workshop on MapReduce and its applications
MapReducing a genomic sequencing workflow
Proceedings of the second international workshop on MapReduce and its applications
The case for being lazy: how to leverage lazy evaluation in MapReduce
Proceedings of the 2nd international workshop on Scientific cloud computing
Magellan: experiences from a science cloud
Proceedings of the 2nd international workshop on Scientific cloud computing
Neptune: a domain specific language for deploying hpc software on cloud platforms
Proceedings of the 2nd international workshop on Scientific cloud computing
Just in time: adding value to the IO pipelines of high performance applications with JITStaging
Proceedings of the 20th international symposium on High performance distributed computing
InContext: simple parallelism for distributed applications
Proceedings of the 20th international symposium on High performance distributed computing
Towards efficient subgraph search in cloud computing environments
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications
Adapting skyline computation to the MapReduce framework: algorithms and experiments
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications
Optimized data placement for column-oriented data store in the distributed environment
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications
Batch text similarity search with MapReduce
APWeb'11 Proceedings of the 13th Asia-Pacific web conference on Web technologies and applications
APWeb'11 Proceedings of the 13th Asia-Pacific web conference on Web technologies and applications
LinearDB: a relational approach to make data warehouse scale like MapReduce
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications: Part II
Towards quantitative analysis of data intensive computing: a case study of Hadoop
Proceedings of the 8th ACM international conference on Autonomic computing
Gatekeeper: supporting bandwidth guarantees for multi-tenant datacenter networks
WIOV'11 Proceedings of the 3rd conference on I/O virtualization
Large scale data mining using genetics-based machine learning
Proceedings of the 13th annual conference companion on Genetic and evolutionary computation
HiTune: dataflow-based performance analysis for big data cloud
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
In-situ MapReduce for log processing
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
G2: a graph processing system for diagnosing distributed systems
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
TidyFS: a simple and small distributed file system
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Italian for beginners: the next steps for SLO-based management
HotStorage'11 Proceedings of the 3rd USENIX conference on Hot topics in storage and file systems
SpamWatcher: a streaming social network analytic on the IBM wire-speed processor
Proceedings of the 5th ACM international conference on Distributed event-based systems
Scheduling for real-time mobile MapReduce systems
Proceedings of the 5th ACM international conference on Distributed event-based systems
A large scale distributed syntactic, semantic and lexical language model for machine translation
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Large-scale cross-document coreference using distributed inference and hierarchical models
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Fine-grained class label markup of search queries
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
CoHadoop: flexible data placement and its exploitation in Hadoop
Proceedings of the VLDB Endowment
Mavuno: a scalable and effective Hadoop-based paraphrase acquisition system
Proceedings of the Third Workshop on Large Scale Data Mining: Theory and Applications
Proceedings of the Ninth International Workshop on Dynamic Analysis
Cost optimized provisioning of elastic resources for application workflows
Future Generation Computer Systems
Regularized latent semantic indexing
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
No free lunch: brute force vs. locality-sensitive hashing for cross-lingual pairwise similarity
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Parallel learning to rank for information retrieval
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Parallelizing a convergent approximate inference method
Canadian AI'11 Proceedings of the 24th Canadian conference on Advances in artificial intelligence
Better never than late: meeting deadlines in datacenter networks
Proceedings of the ACM SIGCOMM 2011 conference
Managing data transfers in computer clusters with orchestra
Proceedings of the ACM SIGCOMM 2011 conference
Proceedings of the 15th International Software Product Line Conference, Volume 2
Spatial hardware implementation for sparse graph algorithms in GraphStep
ACM Transactions on Autonomous and Adaptive Systems (TAAS)
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
Semi-supervised ranking on very large graphs with rich metadata
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Fast clustering using MapReduce
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Clustering very large multi-dimensional datasets with MapReduce
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
GBASE: a scalable and general graph management system
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Temporal multi-hierarchy smoothing for estimating rates of rare events
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
A correlation-aware data placement strategy for key-value stores
Proceedings of the 11th IFIP WG 6.1 international conference on Distributed applications and interoperable systems
Building a web-based parallel corpus and filtering out machine-translated text
BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
New ideas track: testing mapreduce-style programs
Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering
A multi-agent simulation framework on small Hadoop cluster
Engineering Applications of Artificial Intelligence
CloudVista: visual cluster exploration for extreme scale data in the cloud
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
Energy proportionality and performance in data parallel computing clusters
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
Scalable and automated workflow in mining large-scale severe-storm simulations
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
Building a faceted browser in CouchDB using views on views and erlang metaprogramming
WFLP'11 Proceedings of the 20th international conference on Functional and constraint logic programming
Scalable OWL 2 reasoning for linked data
RW'11 Proceedings of the 7th international conference on Reasoning web: semantic technologies for the web of data
Towards systematic parallel programming over mapreduce
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Accelerating code on multi-cores with fastflow
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
ETLMR: a highly scalable dimensional ETL framework based on mapreduce
DaWaK'11 Proceedings of the 13th international conference on Data warehousing and knowledge discovery
Tagged mapreduce: efficiently computing multi-analytics using mapreduce
DaWaK'11 Proceedings of the 13th international conference on Data warehousing and knowledge discovery
An innovative approach to the development of E-government search services
EGOVIS'11 Proceedings of the Second international conference on Electronic government and the information systems perspective
N-party BAR Transfer: motivation, definition, and challenges
Proceedings of the 3rd International Workshop on Theoretical Aspects of Dynamic Distributed Systems
Disco: a computing platform for large-scale data analytics
Proceedings of the 10th ACM SIGPLAN workshop on Erlang
Balanced trees inhabiting functional parallel programming
Proceedings of the 16th ACM SIGPLAN international conference on Functional programming
Principles of distributed data management in 2020?
DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part I
An efficient quad-tree based index structure for cloud data management
WAIM'11 Proceedings of the 12th international conference on Web-age information management
Mining Concept Sequences from Large-Scale Search Logs for Context-Aware Query Suggestion
ACM Transactions on Intelligent Systems and Technology (TIST)
IDEAL'11 Proceedings of the 12th international conference on Intelligent data engineering and automated learning
Proceedings of the 2nd ACM Symposium on Cloud Computing
Incoop: MapReduce for incremental computations
Proceedings of the 2nd ACM Symposium on Cloud Computing
YCSB++: benchmarking and performance debugging advanced features in scalable table stores
Proceedings of the 2nd ACM Symposium on Cloud Computing
CoScan: cooperative scan sharing in the cloud
Proceedings of the 2nd ACM Symposium on Cloud Computing
PrIter: a distributed framework for prioritized iterative computations
Proceedings of the 2nd ACM Symposium on Cloud Computing
Making time-stepped applications tick in the cloud
Proceedings of the 2nd ACM Symposium on Cloud Computing
Small cache, big effect: provable load balancing for randomly partitioned cluster services
Proceedings of the 2nd ACM Symposium on Cloud Computing
Opportunistic flooding to improve TCP transmit performance in virtualized clouds
Proceedings of the 2nd ACM Symposium on Cloud Computing
Automatic management of partitioned, replicated search services
Proceedings of the 2nd ACM Symposium on Cloud Computing
Proceedings of the 2nd ACM Symposium on Cloud Computing
Indexing finite language representation of population genotypes
WABI'11 Proceedings of the 11th international conference on Algorithms in bioinformatics
Automatic physical database tuning middleware for web-based applications
ADBIS'11 Proceedings of the 15th international conference on Advances in databases and information systems
Cross-layer flow and congestion control for datacenter networks
Proceedings of the 3rd Workshop on Data Center - Converged and Virtual Ethernet Switching
Detecting failures in distributed systems with the Falcon spy network
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Extreme enumeration on GPU and in clouds: how many dollars you need to break SVP challenges
CHES'11 Proceedings of the 13th international conference on Cryptographic hardware and embedded systems
Private cloud computing techniques for inter-processing bioinformatics tools
ICHIT'11 Proceedings of the 5th international conference on Convergence and hybrid information technology
Improved sampling for triangle counting with MapReduce
ICHIT'11 Proceedings of the 5th international conference on Convergence and hybrid information technology
A distributed processing method for design patent retrieval system
WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part I
BitShred: feature hashing malware for scalable triage and semantic analysis
Proceedings of the 18th ACM conference on Computer and communications security
Morsa: a scalable approach for persisting and accessing large models
Proceedings of the 14th international conference on Model driven engineering languages and systems
Elastic phoenix: malleable mapreduce for shared-memory systems
NPC'11 Proceedings of the 8th IFIP international conference on Network and parallel computing
A way of key management in cloud storage based on trusted computing
NPC'11 Proceedings of the 8th IFIP international conference on Network and parallel computing
Large scale fuzzy pD* reasoning using mapreduce
ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part I
QueryPIE: backward reasoning for OWL horst over very large knowledge bases
ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part I
Zhishi.me: weaving chinese linking open data
ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part II
Simplified parallel domain traversal
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Hadoop acceleration through network levitated merge
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Purlieus: locality-aware resource allocation for MapReduce in a cloud
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
SciHadoop: array-based query processing in Hadoop
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
On the duality of data-intensive file system design: reconciling HDFS and PVFS
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Legal document clustering with built-in topic segmentation
Proceedings of the 20th ACM international conference on Information and knowledge management
Semi-indexing semi-structured data in tiny space
Proceedings of the 20th ACM international conference on Information and knowledge management
Continuous data stream query in the cloud
Proceedings of the 20th ACM international conference on Information and knowledge management
Block-based load balancing for entity resolution with MapReduce
Proceedings of the 20th ACM international conference on Information and knowledge management
Learning-based entity resolution with MapReduce
Proceedings of the third international workshop on Cloud data management
Incremental recomputations in MapReduce
Proceedings of the third international workshop on Cloud data management
The panel of experts cloud pattern
Proceedings of the third international workshop on Cloud data management
Proceedings of the ACM 14th international workshop on Data Warehousing and OLAP
Easy and effective parallel programmable ETL
Proceedings of the ACM 14th international workshop on Data Warehousing and OLAP
Scalable manipulation of archival web graphs
Proceedings of the 9th workshop on Large-scale and distributed informational retrieval
An implementation framework of mapreduce email social network analysis
Proceedings of the 6th ACM workshop on Wireless multimedia networking and computing
Performance evaluation of MapReduce using full virtualisation on a departmental cloud
International Journal of Applied Mathematics and Computer Science - SPECIAL SECTION: Efficient Resource Management for Grid-Enabled Applications
Programming micro-aerial vehicle swarms with karma
Proceedings of the 9th ACM Conference on Embedded Networked Sensor Systems
Cloud computing: programming model and information exchange mechanism
Proceedings of the 2011 International Conference on Innovative Computing and Cloud Computing
Processing of multimedia data using the P2G framework
MM '11 Proceedings of the 19th ACM international conference on Multimedia
Introducing scalable quantum approaches in language representation
QI'11 Proceedings of the 5th international conference on Quantum interaction
Scalable queries for large datasets using cloud computing: a case study
Proceedings of the 15th Symposium on International Database Engineering & Applications
Query optimization using column statistics in hive
Proceedings of the 15th Symposium on International Database Engineering & Applications
A parallel ACO algorithm to select terms to categorise longer documents
International Journal of Computational Science and Engineering
Searching and browsing Linked Data with SWSE: The Semantic Web Search Engine
Web Semantics: Science, Services and Agents on the World Wide Web
Building wavelet histograms on large data in MapReduce
Proceedings of the VLDB Endowment
Using the Gfarm File System as a POSIX Compatible Storage Platform for Hadoop MapReduce Applications
GRID '11 Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing
Evaluating the suitability of mapreduce for surface temperature analysis codes
Proceedings of the second international workshop on Data intensive computing in the clouds
Efficient processing of RDF graph pattern matching on MapReduce platforms
Proceedings of the second international workshop on Data intensive computing in the clouds
Dynamic split model of resource utilization in MapReduce
Proceedings of the second international workshop on Data intensive computing in the clouds
Design patterns for scientific applications in DryadLINQ CTP
Proceedings of the second international workshop on Data intensive computing in the clouds
Generic multiset programming with discrimination-based joins and symbolic Cartesian products
Higher-Order and Symbolic Computation
Enhancing application robustness in cloud data centers
Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research
Jeocrowd: collaborative searching of user-generated point datasets
Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
Parallel data processing with MapReduce: a survey
ACM SIGMOD Record
Cloudscape: language support to coordinate and control distributed applications in the cloud
Proceedings of the compilation of the co-located workshops on DSM'11, TMC'11, AGERE!'11, AOOPES'11, NEAT'11, & VMIL'11
Cloud computing and mapreduce for reliability and scalability of ubiquitous learning systems
Proceedings of the compilation of the co-located workshops on DSM'11, TMC'11, AGERE!'11, AOOPES'11, NEAT'11, & VMIL'11
Capturing topology in graph pattern matching
Proceedings of the VLDB Endowment
Relational approach for shortest path discovery over large graphs
Proceedings of the VLDB Endowment
Convergence in language design: a case of lightning striking four times in the same place
FLOPS'06 Proceedings of the 8th international conference on Functional and Logic Programming
Mining paraphrases from self-anchored web sentence fragments
PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases
Database-centric programming for wide-area sensor systems
DCOSS'05 Proceedings of the First IEEE international conference on Distributed Computing in Sensor Systems
ClickRank: Learning Session-Context Models to Enrich Web Search Ranking
ACM Transactions on the Web (TWEB)
An enhanced ACO algorithm to select features for text categorization and its parallelization
Expert Systems with Applications: An International Journal
Of hammers and nails: an empirical comparison of three paradigms for processing large graphs
Proceedings of the fifth ACM international conference on Web search and data mining
Multi-pass sorted neighborhood blocking with MapReduce
Computer Science - Research and Development
New algorithms for join and grouping operations
Computer Science - Research and Development
A framework for utilising usage trends in the crawling and indexing process of search engines
International Journal of Knowledge and Web Intelligence
Out-of-core parallel frontier search with mapreduce
HPCS'09 Proceedings of the 23rd international conference on High Performance Computing Systems and Applications
Case study of scientific data processing on a cloud using hadoop
HPCS'09 Proceedings of the 23rd international conference on High Performance Computing Systems and Applications
Scalable splitting of massive data streams
DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part II
Distribution rules for array database queries
DEXA'05 Proceedings of the 16th international conference on Database and Expert Systems Applications
Riding the elephant: managing ensembles with hadoop
Proceedings of the 2011 ACM international workshop on Many task computing on grids and supercomputers
MATE-EC2: a middleware for processing data with AWS
Proceedings of the 2011 ACM international workshop on Many task computing on grids and supercomputers
Joshua 3.0: syntax-based machine translation with the Thrax grammar extractor
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Colorful triangle counting and a MapReduce implementation
Information Processing Letters
Densest subgraph in streaming and MapReduce
Proceedings of the VLDB Endowment
Shared work list: hacking amorphous data parallelism in UPC
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
WebPIE: A Web-scale Parallel Inference Engine using MapReduce
Web Semantics: Science, Services and Agents on the World Wide Web
Variable-Sized map and locality-aware reduce on public-resource grids
GPC'10 Proceedings of the 5th international conference on Advances in Grid and Pervasive Computing
CPRS: a cloud-based program recommendation system for digital TV platforms
GPC'10 Proceedings of the 5th international conference on Advances in Grid and Pervasive Computing
Distributed island-based query answering for expressive ontologies
GPC'10 Proceedings of the 5th international conference on Advances in Grid and Pervasive Computing
DPSP: distributed progressive sequential pattern mining on the cloud
PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
OpenMP-style parallelism in data-centered multicore computing with R
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Aligning needles in a haystack: paraphrase acquisition across the web
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
GLADE: a scalable framework for efficient analytics
ACM SIGOPS Operating Systems Review
Interactive Dynamics for Visual Analysis
Queue - Microprocessors
Clearing the clouds: a study of emerging scale-out workloads on modern hardware
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Iterative optimization for the data center
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
DVM: towards a datacenter-scale virtual machine
VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
Scalable and parallel reasoning in the semantic web
ESWC'10 Proceedings of the 7th international conference on The Semantic Web: research and Applications - Volume Part II
Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A stratified view of programming language parallelism for undergraduate CS education
Proceedings of the 43rd ACM technical symposium on Computer Science Education
Automated Mapping of the MapReduce Pattern onto Parallel Computing Platforms
Journal of Signal Processing Systems
A study on workload imbalance issues in data intensive distributed computing
DNIS'10 Proceedings of the 6th international conference on Databases in Networked Information Systems
ReStore: reusing results of MapReduce jobs
Proceedings of the VLDB Endowment
GreenHadoop: leveraging green energy in data-processing frameworks
Proceedings of the 7th ACM european conference on Computer Systems
LazyBase: trading freshness for performance in a scalable database
Proceedings of the 7th ACM european conference on Computer Systems
MadLINQ: large-scale distributed matrix computation for the cloud
Proceedings of the 7th ACM european conference on Computer Systems
Practical TDMA for datacenter ethernet
Proceedings of the 7th ACM european conference on Computer Systems
Nobody ever got fired for using Hadoop on a cluster
Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing
Agent based cloud storage system
AIC'10/BEBI'10 Proceedings of the 10th WSEAS international conference on applied informatics and communications, and 3rd WSEAS international conference on Biomedical electronics and biomedical informatics
Cutting MapReduce cost with spot market
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
The HybrEx model for confidentiality and privacy in cloud computing
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
Large-scale incremental data processing with change propagation
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
TransMR: data-centric programming beyond data parallelism
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
HiTune: dataflow-based performance analysis for big data cloud
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
In-situ MapReduce for log processing
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
Chapter 14: building search computing applications
Search Computing
A universal calculus for stream processing languages
ESOP'10 Proceedings of the 19th European conference on Programming Languages and Systems
ICONIP'11 Proceedings of the 18th international conference on Neural Information Processing - Volume Part II
The utility problem for lazy learners - towards a non-eager approach
ICCBR'10 Proceedings of the 18th international conference on Case-Based Reasoning Research and Development
OWL reasoning with WebPIE: calculating the closure of 100 billion triples
ESWC'10 Proceedings of the 7th international conference on The Semantic Web: research and Applications - Volume Part I
JSAI-isAI'10 Proceedings of the 2010 international conference on New Frontiers in Artificial Intelligence
PerfXplain: debugging MapReduce job performance
Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment
A fast algorithm for constructing inverted files on heterogeneous platforms
Journal of Parallel and Distributed Computing
A parallel method for computing rough set approximations
Information Sciences: an International Journal
DAC: generic and automatic address configuration for data center networks
IEEE/ACM Transactions on Networking (TON)
Apriori-based frequent itemset mining algorithms on MapReduce
Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication
The search for energy-efficient building blocks for the data center
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
RDFPath: path query processing on large RDF graphs with mapreduce
ESWC'11 Proceedings of the 8th international conference on The Semantic Web
High performance computing techniques for scaling image analysis workflows
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2
Optimal trust mining and computing on keyed mapreduce
ESSoS'12 Proceedings of the 4th international conference on Engineering Secure Software and Systems
Mr. LDA: a flexible large scale topic modeling package using variational inference in MapReduce
Proceedings of the 21st international conference on World Wide Web
Distributed graph pattern matching
Proceedings of the 21st international conference on World Wide Web
Clustering and load balancing optimization for redundant content removal
Proceedings of the 21st international conference companion on World Wide Web
Scalable load balancing in cluster storage systems
Middleware'11 Proceedings of the 12th ACM/IFIP/USENIX international conference on Middleware
Resource-aware adaptive scheduling for mapreduce clusters
Middleware'11 Proceedings of the 12th ACM/IFIP/USENIX international conference on Middleware
Virtualizing stream processing
Middleware'11 Proceedings of the 12th ACM/IFIP/USENIX international conference on Middleware
Personal genomes: a new frontier in database research
DNIS'11 Proceedings of the 7th international conference on Databases in Networked Information Systems
DPillar: Dual-port server interconnection network for large scale data centers
Computer Networks: The International Journal of Computer and Telecommunications Networking
Community detection in Social Media
Data Mining and Knowledge Discovery
The HaLoop approach to large-scale iterative data analysis
The VLDB Journal — The International Journal on Very Large Data Bases
Dynamic routing of data stream tuples among parallel query plan running on multi-core processors
Distributed and Parallel Databases
Digital Preservation in Grids and Clouds: A Middleware Approach
Journal of Grid Computing
The curriculum forecast for Portland: cloudy with a chance of data
ACM SIGMOD Record
Serving large-scale batch computed data with project Voldemort
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Google's hybrid approach to research
Communications of the ACM
V-SMART-join: a scalable mapreduce framework for all-pair similarity joins of multisets and vectors
Proceedings of the VLDB Endowment
Distributed GraphLab: a framework for machine learning and data mining in the cloud
Proceedings of the VLDB Endowment
Proceedings of the 2011 ACM SIGPLAN X10 Workshop
A limits study of benefits from nanostore-based future data-centric system architectures
Proceedings of the 9th conference on Computing Frontiers
Approximate computation and implicit regularization for very large-scale data analysis
PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Advanced partitioning techniques for massively distributed computation
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
SkewTune: mitigating skew in mapreduce applications
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Declarative error management for robust data-intensive applications
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
GUPT: privacy preserving data analysis made easy
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Managing and mining large graphs: patterns and algorithms
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Exploiting MapReduce-based similarity joins
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
GLADE: big data analytics made easy
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
ReStore: reusing results of MapReduce jobs in pig
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Walnut: a unified cloud object store
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Oracle in-database hadoop: when mapreduce meets RDBMS
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Large-scale machine learning at twitter
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Recurring job optimization in scope
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
CloudRAMSort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Minersoft: Software retrieval in grid and cloud computing infrastructures
ACM Transactions on Internet Technology (TOIT)
Improving performance of adaptive component-based dataflow middleware
Parallel Computing
NaaS: network-as-a-service in the cloud
Hot-ICE'12 Proceedings of the 2nd USENIX conference on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Aiding the detection of fake accounts in large scale social online services
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
PACMan: coordinated memory caching for parallel jobs
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Optimizing data shuffling in data-parallel computation by understanding user-defined functions
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Orchestrating the deployment of computations in the cloud with conductor
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
The TCP outcast problem: exposing unfairness in data center networks
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Metronome: operating system level performance management via self-adaptive computing
Proceedings of the 49th Annual Design Automation Conference
Enabling e-science applications on the cloud with COMPSs
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing
Mapping application requirements to cloud resources
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing
A service-oriented taxonomical spectrum, cloudy challenges and opportunities of cloud computing
International Journal of Communication Systems
P2P-MapReduce: Parallel data processing in dynamic Cloud environments
Journal of Computer and System Sciences
Double dip map-reduce for processing cross validation jobs
Proceedings of the 27th Annual ACM Symposium on Applied Computing
A flexible parallel runtime for large scale block-based matrix multiplication
APWeb'12 Proceedings of the 14th international conference on Web Technologies and Applications
The efficiency of mapreduce in parallel external memory
LATIN'12 Proceedings of the 10th Latin American international conference on Theoretical Informatics
Inside "Big Data management": ogres, onions, or parfaits?
Proceedings of the 15th International Conference on Extending Database Technology
An optimization framework for map-reduce queries
Proceedings of the 15th International Conference on Extending Database Technology
Efficient parallel kNN joins for large data in MapReduce
Proceedings of the 15th International Conference on Extending Database Technology
Parallelizing top-down interprocedural analyses
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Panacea: towards holistic optimization of MapReduce applications
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Efficient SPARQL query processing in mapreduce through data partitioning and indexing
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
Modeling transactional queries via templates
ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
ComMapReduce: an improvement of mapreduce with lightweight communication mechanisms
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part II
Halt or continue: estimating progress of queries in the cloud
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part II
Evaluating spatial keyword queries under the mapreduce framework
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications
Proceedings of the 15th International Conference on Database Theory
Compiler and runtime support for enabling reduction computations on heterogeneous systems
Concurrency and Computation: Practice & Experience
MapReduce in MPI for Large-scale graph algorithms
Parallel Computing
Exact and approximate computation of a histogram of pairwise distances between astronomical objects
Proceedings of the 2012 workshop on High-Performance Computing for Astronomy
C-MR: continuously executing MapReduce workflows on multi-core processors
Proceedings of third international workshop on MapReduce and its Applications
Pilot-MapReduce: an extensible and flexible MapReduce implementation for distributed data
Proceedings of third international workshop on MapReduce and its Applications
Parallel iterative compilation: using MapReduce to speedup machine learning in compilers
Proceedings of third international workshop on MapReduce and its Applications
SNP genotype calling with MapReduce
Proceedings of third international workshop on MapReduce and its Applications
Accelerate large-scale iterative computation through asynchronous accumulative updates
Proceedings of the 3rd workshop on Scientific Cloud Computing
Locality-aware dynamic VM reconfiguration on MapReduce clouds
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Fault tolerant parallel data-intensive algorithms
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Understanding the effects and implications of compute node related failures in hadoop
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Optimizing MapReduce for GPUs with effective shared memory usage
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Multi-resolution similarity hashing
Digital Investigation: The International Journal of Digital Forensics & Incident Response
Deadline-driven provisioning of resources for scientific applications in hybrid clouds with Aneka
Future Generation Computer Systems
Adapting scientific computing problems to clouds using MapReduce
Future Generation Computer Systems
Computers in Biology and Medicine
Finding and exploring memes in social media
Proceedings of the 23rd ACM conference on Hypertext and social media
Time and Cost Sensitive Data-Intensive Computing on Hybrid Clouds
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
A Map-Reduce Based Framework for Heterogeneous Processing Element Cluster Environments
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Resource Management for Elastic Cloud Workflows
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
A Workflow-Aware Storage System: An Opportunity Study
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Maestro: Replica-Aware Map Scheduling for MapReduce
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
ParaLite: Supporting Collective Queries in Database System to Parallelize User-Defined Executable
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
FunSQL: it is time to make SQL functional
Proceedings of the 2012 Joint EDBT/ICDT Workshops
Stormy: an elastic and highly available streaming service in the cloud
Proceedings of the 2012 Joint EDBT/ICDT Workshops
RDF data management in the Amazon cloud
Proceedings of the 2012 Joint EDBT/ICDT Workshops
Distributed KNN-graph approximation via hashing
Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
Journal of Systems and Software
Preventing TCP incast throughput collapse at the initiation, continuation, and termination
Proceedings of the 2012 IEEE 20th International Workshop on Quality of Service
Large scale data mining using genetics-based machine learning
Proceedings of the 14th annual conference companion on Genetic and evolutionary computation
Designing good MapReduce algorithms
XRDS: Crossroads, The ACM Magazine for Students - Big Data
Big data and internships at Cloudera
XRDS: Crossroads, The ACM Magazine for Students - Big Data
Big data platforms: What's next?
XRDS: Crossroads, The ACM Magazine for Students - Big Data
Schönhage-Strassen algorithm with MapReduce for multiplying terabit integers
Proceedings of the 2011 International Workshop on Symbolic-Numeric Computation
From a calculus to an execution environment for stream processing
Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems
Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems
Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems
Efficient processing of k nearest neighbor joins using MapReduce
Proceedings of the VLDB Endowment
MapReduce indexing strategies: Studying scalability and efficiency
Information Processing and Management: an International Journal
GigaTensor: scaling tensor analysis up by 100 times - algorithms and discoveries
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Large-scale distributed non-negative sparse coding and sparse dictionary learning
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
PatentMiner: topic-driven patent analysis and mining
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
A self-adaptive computing framework for parallel maximum likelihood evaluation
The Journal of Supercomputing
Case study: stereo vision experiments with multi-core software API on embedded MPSoC environments
The Journal of Supercomputing
MapReduce for parallel reinforcement learning
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
MapReduce approach to collective classification for networks
ICAISC'12 Proceedings of the 11th international conference on Artificial Intelligence and Soft Computing - Volume Part I
Computing in the fractal cloud: modular generic solvers for SAT and Q-SAT variants
TAMC'12 Proceedings of the 9th Annual international conference on Theory and Applications of Models of Computation
The only constant is change: incorporating time-varying network reservations in data centers
Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication
A data flow language for hybrid query and programming languages
FLOPS'12 Proceedings of the 11th international conference on Functional and Logic Programming
The seven deadly sins of cloud computing research
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing
A case for performance-centric network allocation
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing
Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing
Why let resources idle? aggressive cloning of jobs with dolly
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing
MixApart: decoupled analytics for shared storage systems
HotStorage'12 Proceedings of the 4th USENIX conference on Hot Topics in Storage and File Systems
Composable reliability for asynchronous systems
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Managing large graphs on multi-cores with graph awareness
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Hybrid cloud support for large scale analytics and web processing
WebApps'12 Proceedings of the 3rd USENIX conference on Web Application Development
Systematic approach of using power save mode for cloud data processing services
International Journal of Ad Hoc and Ubiquitous Computing
A MapReduce-supported network structure for data centers
Concurrency and Computation: Practice & Experience
MapReduce-based similarity join for metric spaces
Proceedings of the 1st International Workshop on Cloud Intelligence
On saying "enough already!" in MapReduce
Proceedings of the 1st International Workshop on Cloud Intelligence
Parallelizing ListNet training using spark
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Stubby: a transformation-based optimizer for MapReduce workflows
Proceedings of the VLDB Endowment
Opening the black boxes in data flow optimization
Proceedings of the VLDB Endowment
Spinning fast iterative data flows
Proceedings of the VLDB Endowment
REX: recursive, delta-based data-centric computation
Proceedings of the VLDB Endowment
Parallel rough set based knowledge acquisition using MapReduce from big data
Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
Delta-SimRank computing on MapReduce
Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
A software architecture for parallel list processing on grids
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
P2P approach to knowledge-based dynamic virtual organizations inception and management
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part II
Incremental DNA sequence analysis in the cloud
SSDBM'12 Proceedings of the 24th international conference on Scientific and Statistical Database Management
PRISM: privacy-preserving search in mapreduce
PETS'12 Proceedings of the 12th international conference on Privacy Enhancing Technologies
Functory: a distributed computing library for objective caml
TFP'11 Proceedings of the 12th international conference on Trends in Functional Programming
HadoopRDF: a scalable semantic data analytical engine
ICIC'12 Proceedings of the 8th international conference on Intelligent Computing Theories and Applications
Distributed computation on dynamo-style distributed storage: riak pipe
Proceedings of the eleventh ACM SIGPLAN workshop on Erlang workshop
Software execution protection in the cloud
Proceedings of the 1st European Workshop on Dependable Cloud Computing
Distributed, real-time bayesian learning in online services
Proceedings of the sixth ACM conference on Recommender systems
Distributed formal concept analysis algorithms based on an iterative mapreduce framework
ICFCA'12 Proceedings of the 10th international conference on Formal Concept Analysis
AIMS'12 Proceedings of the 6th IFIP WG 6.6 international autonomous infrastructure, management, and security conference on Dependable Networks and Services
PQL: a purely-declarative java extension for parallel programming
ECOOP'12 Proceedings of the 26th European conference on Object-Oriented Programming
The unified logging infrastructure for data analytics at Twitter
Proceedings of the VLDB Endowment
The vertica analytic database: C-store 7 years later
Proceedings of the VLDB Endowment
Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads
Proceedings of the VLDB Endowment
Muppet: MapReduce-style processing of fast data
Proceedings of the VLDB Endowment
Avatara: OLAP for web-scale analytics products
Proceedings of the VLDB Endowment
MapReduce-based dimensional ETL made easy
Proceedings of the VLDB Endowment
CloudVista: interactive and economical visual cluster analysis for big data in the cloud
Proceedings of the VLDB Endowment
SkewTune in action: mitigating skew in MapReduce applications
Proceedings of the VLDB Endowment
MapReduce algorithms for big data analysis
Proceedings of the VLDB Endowment
Parallel implementation of ant-based clustering algorithm based on hadoop
ICSI'12 Proceedings of the Third international conference on Advances in Swarm Intelligence - Volume Part I
Algorithmic exploration of axiom spaces for efficient similarity search at large scale
SISAP'12 Proceedings of the 5th international conference on Similarity Search and Applications
Remote sensing image data storage and search method based on pyramid model in cloud
RSKT'12 Proceedings of the 7th international conference on Rough Sets and Knowledge Technology
ESM: efficient and scalable data center multicast routing
IEEE/ACM Transactions on Networking (TON)
Auto-parallelizing stateful distributed streaming applications
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Pushouts in software architecture design
Proceedings of the 11th International Conference on Generative Programming and Component Engineering
The only constant is change: incorporating time-varying network reservations in data centers
ACM SIGCOMM Computer Communication Review - Special october issue SIGCOMM '12
Application-driven energy-efficient architecture explorations for big data
Proceedings of the 1st Workshop on Architectures and Systems for Big Data
Hierarchical merge for scalable MapReduce
Proceedings of the 2012 workshop on Management of big data systems
Light-weight black-box failure detection for distributed systems
Proceedings of the 2012 workshop on Management of big data systems
Network-aware scheduling of mapreduce framework on distributed clusters over high speed networks
Proceedings of the 2012 workshop on Cloud services, federation, and the 8th open cirrus summit
Federated cloud-based big data platform in telecommunications
Proceedings of the 2012 workshop on Cloud services, federation, and the 8th open cirrus summit
A scalable distributed syntactic, semantic, and lexical language model
Computational Linguistics
HFAA: a generic socket API for Hadoop file systems
Proceedings of the 2nd Workshop on Architectures and Systems for Big Data
HadoopPerceptron: a toolkit for distributed perceptron training and prediction with MapReduce
EACL '12 Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics
The intelius nickname collection: quantitative analyses from billions of public records
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors
ACM Transactions on Computer Systems (TOCS)
AutoMan: a platform for integrating human-based and digital computation
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Analyzing ultra-large-scale code corpus with boa
Proceedings of the 3rd annual conference on Systems, programming, and applications: software for humanity
Boa: analyzing ultra-large-scale code corpus
Proceedings of the 3rd annual conference on Systems, programming, and applications: software for humanity
ROARS: a robust object archival system for data intensive scientific computing
Distributed and Parallel Databases
Journal of Grid Computing
gbase: an efficient analysis platform for large graphs
The VLDB Journal — The International Journal on Very Large Data Bases
SCOPE: parallel databases meet MapReduce
The VLDB Journal — The International Journal on Very Large Data Bases
Programming model support for dependable, elastic cloud applications
HotDep'12 Proceedings of the Eighth USENIX conference on Hot Topics in System Dependability
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
GraphChi: large-scale graph computation on just a PC
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
COMET: code offload by migrating execution transparently
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Spotting code optimizations in data-parallel pipelines through PeriSCOPE
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Mitigating High Latency Outliers for Cloud-Based Telecommunication Services
Bell Labs Technical Journal
A study on data deduplication in HPC storage systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Accelerating MapReduce on a coupled CPU-GPU architecture
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
High performance RDMA-based design of HDFS over InfiniBand
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Combining in-situ and in-transit processing to enable extreme-scale scientific analysis
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
T*: a data-centric cooling energy costs reduction approach for big data analytics cloud
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
SCALLA: A Platform for Scalable One-Pass Analytics Using MapReduce
ACM Transactions on Database Systems (TODS)
Partial Evaluation for Distributed XPath Query Processing and Beyond
ACM Transactions on Database Systems (TODS)
Scripting distributed scientific workflows using Weaver
Concurrency and Computation: Practice & Experience
OmpiJava: a tool for development of high-performance reasoning applications for the semantic web
Proceedings of the 2012 international workshop on Web-scale knowledge representation, retrieval and reasoning
Multimedia Applications and Security in MapReduce: Opportunities and Challenges
Concurrency and Computation: Practice & Experience
Cache-sensitive MapReduce DGEMM algorithms for shared memory architectures
Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference
HEDC: a histogram estimator for data in the cloud
Proceedings of the fourth international workshop on Cloud data management
Type 2 slowly changing dimensions: a case study using the cooperating system
Proceedings of the fifteenth international workshop on Data warehousing and OLAP
Coflow: a networking abstraction for cluster applications
Proceedings of the 11th ACM Workshop on Hot Topics in Networks
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Large-scale discriminative language model reranking for voice-search
WLM '12 Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT
Improving large graph processing on partitioned graphs in the cloud
Proceedings of the Third ACM Symposium on Cloud Computing
Sailfish: a framework for large scale data processing
Proceedings of the Third ACM Symposium on Cloud Computing
Bridging the tenant-provider gap in cloud services
Proceedings of the Third ACM Symposium on Cloud Computing
Themis: an I/O-efficient MapReduce
Proceedings of the Third ACM Symposium on Cloud Computing
Untangling cluster management with Helix
Proceedings of the Third ACM Symposium on Cloud Computing
True elasticity in multi-tenant data-intensive compute clusters
Proceedings of the Third ACM Symposium on Cloud Computing
Designing good algorithms for MapReduce and beyond
Proceedings of the Third ACM Symposium on Cloud Computing
SymGrid: a framework for symbolic computation on the grid
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Large-scale clustering and complete facet and tag calculation
ECDL'07 Proceedings of the 11th European conference on Research and Advanced Technology for Digital Libraries
Multiview hierarchical bayesian regression model and application to online advertising
Proceedings of the 21st ACM international conference on Information and knowledge management
Metaphor: a system for related search recommendations
Proceedings of the 21st ACM international conference on Information and knowledge management
Fast and scalable approximate spectral graph matching for correspondence problems
Information Sciences: an International Journal
Communications of the ACM
You can stop early with COLA: online processing of aggregate queries in the cloud
Proceedings of the 21st ACM international conference on Information and knowledge management
CloST: a hadoop-based storage system for big spatio-temporal data analytics
Proceedings of the 21st ACM international conference on Information and knowledge management
Efficient distributed locality sensitive hashing
Proceedings of the 21st ACM international conference on Information and knowledge management
AMADA: web data repositories in the amazon cloud
Proceedings of the 21st ACM international conference on Information and knowledge management
Environmental Modelling & Software
Join processing using Bloom filter in MapReduce
Proceedings of the 2012 ACM Research in Applied Computation Symposium
An approach to parallel class expression learning
RuleML'12 Proceedings of the 6th international conference on Rules on the Web: research and applications
Fast parallel algorithms for blocked dense matrix multiplication on shared memory architectures
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
A cloud architecture with an efficient scheduling technique
ICICA'12 Proceedings of the Third international conference on Information Computing and Applications
CC-MR --- finding connected components in huge graphs with mapreduce
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I
Cloud MapReduce for Monte Carlo bootstrap applied to Metabolic Flux Analysis
Future Generation Computer Systems
Design and implementation of GXP make - A workflow system based on make
Future Generation Computer Systems
A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Services
GRID '12 Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing
Accelerating Biomedical Data-Intensive Applications Using MapReduce
GRID '12 Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing
Developing a mobile recommender system
Proceedings of the 5th International Conference on PErvasive Technologies Related to Assistive Environments
Tuning ECN for data center networks
Proceedings of the 8th international conference on Emerging networking experiments and technologies
Datacast: a scalable and efficient reliable group data delivery service for data centers
Proceedings of the 8th international conference on Emerging networking experiments and technologies
Computing while charging: building a distributed computing infrastructure using smartphones
Proceedings of the 8th international conference on Emerging networking experiments and technologies
Using mapreduce to scale events correlation discovery for business processes mining
BPM'12 Proceedings of the 10th international conference on Business Process Management
Scalable load balancing in cluster storage systems
Proceedings of the 12th International Middleware Conference
Resource-aware adaptive scheduling for MapReduce clusters
Proceedings of the 12th International Middleware Conference
Virtualizing stream processing
Proceedings of the 12th International Middleware Conference
Regularized Latent Semantic Indexing: A New Approach to Large-Scale Topic Modeling
ACM Transactions on Information Systems (TOIS)
Optimizing large-scale Semi-Naïve datalog evaluation in hadoop
Datalog 2.0'12 Proceedings of the Second international conference on Datalog in Academia and Industry
Just-in-time data distribution for analytical query processing
ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems
Elastic Scalable Cloud Computing Using Application-Level Migration
UCC '12 Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing
A Hybrid Scheduling Algorithm for Data Intensive Workloads in a MapReduce Environment
UCC '12 Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing
Stochastic Tail-Phase Optimization for Bag-of-Tasks Execution in Clouds
UCC '12 Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing
Declarative secure distributed information systems
Computer Languages, Systems and Structures
Inexact subgraph isomorphism in MapReduce
Journal of Parallel and Distributed Computing
Accelerating text mining workloads in a MapReduce-based distributed GPU environment
Journal of Parallel and Distributed Computing
Abusing cloud-based browsers for fun and profit
Proceedings of the 28th Annual Computer Security Applications Conference
Optimizing and Tuning MapReduce Jobs to Improve the Large-Scale Data Analysis Process
International Journal of Intelligent Systems
MR-search: massively parallel heuristic search
Concurrency and Computation: Practice & Experience
Cogset: a high performance MapReduce engine
Concurrency and Computation: Practice & Experience
Scalable RDF data compression with MapReduce
Concurrency and Computation: Practice & Experience
Collaborative geospatial feature search
Proceedings of the 20th International Conference on Advances in Geographic Information Systems
TileHeat: a framework for tile selection
Proceedings of the 20th International Conference on Advances in Geographic Information Systems
Computing scientometrics in large-scale academic search engines with mapreduce
WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering
A fast and high throughput SQL query system for big data
WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering
Toward scalable internet traffic measurement and analysis with Hadoop
ACM SIGCOMM Computer Communication Review
Mega-modeling for big data analytics
ER'12 Proceedings of the 31st international conference on Conceptual Modeling
TritonSort: A Balanced and Energy-Efficient Large-Scale Sorting System
ACM Transactions on Computer Systems (TOCS)
Streaming big data with self-adjusting computation
DDFP '13 Proceedings of the 2013 workshop on Data driven functional programming
SemanMR: big data processing framework based on semantics
Proceedings of the Fourth Asia-Pacific Symposium on Internetware
An effective and efficient parallel approach for random graph generation over GPUs
Journal of Parallel and Distributed Computing
A decentralized approach for mining event correlations in distributed system monitoring
Journal of Parallel and Distributed Computing
A RAMCloud Storage System based on HDFS: Architecture, implementation and evaluation
Journal of Systems and Software
Large-scale ranking and selection using cloud computing
Proceedings of the Winter Simulation Conference
Theia: visual signatures for problem diagnosis in large hadoop clusters
LISA'12 Proceedings of the 26th international conference on Large Installation System Administration: strategies, tools, and techniques
DISRAY: A distributed ray tracing by map-reduce
Computers & Geosciences
Bridging the gap between applications and networks in data centers
ACM SIGOPS Operating Systems Review
Optimizing parallel algorithms for all pairs similarity search
Proceedings of the sixth ACM international conference on Web search and data mining
Ursa: Scalable Load and Power Management in Cloud Storage Systems
ACM Transactions on Storage (TOS)
iBigTable: practical data integrity for bigtable in public cloud
Proceedings of the third ACM conference on Data and application security and privacy
ICWE'12 Proceedings of the 12th international conference on Current Trends in Web Engineering
Cloud Computing: Locally Sub-Clouds instead of Globally One Cloud
International Journal of Cloud Applications and Computing
Modeling and Analyzing User Contexts for Mobile Advertising
International Journal of Handheld Computing Research
Data-only flattening for nested data parallelism
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
An efficient programming model for memory-intensive recursive algorithms using parallel disks
Proceedings of the 37th International Symposium on Symbolic and Algebraic Computation
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
Makeflow: a portable abstraction for data intensive computing on clusters, clouds, and grids
Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies
Computational Engineering in the Cloud: Benefits and Challenges
Journal of Organizational and End User Computing
Sort-based parallel loading of R-trees
Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data
MRBS: towards dependability benchmarking for hadoop mapreduce
Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Themis: energy efficient management of workloads in virtualized data centers
Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Multimedia information retrieval in a social context
PROMISE'12 Proceedings of the 2012 international conference on Information Retrieval Meets Information Visualization
Medical (visual) information retrieval
PROMISE'12 Proceedings of the 2012 international conference on Information Retrieval Meets Information Visualization
ESOP'13 Proceedings of the 22nd European conference on Programming Languages and Systems
Discovering math APIs by mining unit tests
FASE'13 Proceedings of the 16th international conference on Fundamental Approaches to Software Engineering
Paragon: QoS-aware scheduling for heterogeneous datacenters
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Unikernels: library operating systems for the cloud
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Breaking the MapReduce stage barrier
Cluster Computing
A study of unpredictability in fault-tolerant middleware
Computer Networks: The International Journal of Computer and Telecommunications Networking
Eagle-eyed elephant: split-oriented indexing in Hadoop
Proceedings of the 16th International Conference on Extending Database Technology
Computing n-gram statistics in MapReduce
Proceedings of the 16th International Conference on Extending Database Technology
Processing XML queries and updates on map/reduce clusters
Proceedings of the 16th International Conference on Extending Database Technology
CloudSVM: training an SVM classifier in cloud computing systems
ICPCA/SWS'12 Proceedings of the 2012 international conference on Pervasive Computing and the Networked World
A document-based data warehousing approach for large scale data mining
ICPCA/SWS'12 Proceedings of the 2012 international conference on Pervasive Computing and the Networked World
Exploiting and Evaluating MapReduce for Large-Scale Graph Mining
ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)
Using Pregel-like Large Scale Graph Processing Frameworks for Social Network Analysis
ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)
Proceedings of the International Conference on Management of Emergent Digital EcoSystems
Proceedings of the 5th ACM COMPUTE Conference: Intelligent & scalable system technologies
Incremental stream processing using computational conflict-free replicated data types
Proceedings of the 3rd International Workshop on Cloud Data and Platforms
There is no getting around it: you are building a distributed system
Communications of the ACM
Parallel evolutionary computation in bioinformatics applications
Computer Methods and Programs in Biomedicine
A simple aggregative algorithm for counting triangulations of planar point sets and related problems
Proceedings of the twenty-ninth annual symposium on Computational geometry
Interference and locality-aware task scheduling for MapReduce applications in virtual clusters
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
CamCubeOS: a key-based network stack for 3D torus cluster topologies
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
IBIS: interposed big-data I/O scheduler
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Efficient analytics on ordered datasets using MapReduce
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Iterative parallel data processing with stratosphere: an inside look
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
The big data ecosystem at LinkedIn
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
MRSG - A MapReduce simulator over SimGrid
Parallel Computing
Towards a scalable intrusion detection system based on parallel PSO clustering using mapreduce
Proceedings of the 15th annual conference companion on Genetic and evolutionary computation
Active disk meets flash: a case for intelligent SSDs
Proceedings of the 27th international ACM conference on International conference on supercomputing
Communication steps for parallel query processing
Proceedings of the 32nd symposium on Principles of database systems
Photon: fault-tolerant and scalable joining of continuous data streams
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Cumulon: optimizing statistical data analysis in the cloud
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Mind the gap: large-scale frequent sequence mining
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Shark: SQL and rich analytics at scale
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
TimeStream: reliable stream computation in the cloud
Proceedings of the 8th ACM European Conference on Computer Systems
Mizan: a system for dynamic load balancing in large-scale graph processing
Proceedings of the 8th ACM European Conference on Computer Systems
Choosy: max-min fair sharing for datacenter jobs with constraints
Proceedings of the 8th ACM European Conference on Computer Systems
CPI²: CPU performance isolation for shared compute clusters
Proceedings of the 8th ACM European Conference on Computer Systems
A bloat-aware design for big data applications
Proceedings of the 2013 international symposium on memory management
Relational large scale multi-label classification method for video categorization
Multimedia Tools and Applications
Distributed and Parallel Databases
Modeling performance of a parallel streaming engine: bridging theory and costs
Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering
A framework for partitioning and execution of data stream applications in mobile cloud computing
ACM SIGMETRICS Performance Evaluation Review
On distributed computation rate optimization for deploying cloud computing programming frameworks
ACM SIGMETRICS Performance Evaluation Review
High-resolution spatial interpolation on cloud platforms
Proceedings of the 28th Annual ACM Symposium on Applied Computing
Large scale data mining using genetics-based machine learning
Proceedings of the 15th annual conference companion on Genetic and evolutionary computation
Scaling big data mining infrastructure: the twitter experience
ACM SIGKDD Explorations Newsletter
Zone-based data striping for cloud storage
IBM Journal of Research and Development
Case-based reasoning in comparative effectiveness research
IBM Journal of Research and Development
Effective straggler mitigation: attack of the clones
NSDI'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Bobtail: avoiding long tails in the cloud
NSDI'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Issues in big data testing and benchmarking
Proceedings of the Sixth International Workshop on Testing Database Systems
DeepSea: self-adaptive data partitioning and replication in scalable distributed data systems
Proceedings of the 2013 Sigmod/PODS Ph.D. symposium on PhD symposium
Scalable motif detection and aggregation
ADC '12 Proceedings of the Twenty-Third Australasian Database Conference - Volume 124
Energy efficiency for MapReduce workloads: an in-depth study
ADC '12 Proceedings of the Twenty-Third Australasian Database Conference - Volume 124
Proceedings of the 3rd international workshop on Emerging computational methods for the life sciences
Exploiting MapReduce and data compression for data-intensive applications
Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery
GPS: a graph processing system
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
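The map and reduce roles described above can be illustrated with a small word-count sketch. This is a single-process illustration only, not the implementation the abstract refers to: the names `map_word_count`, `reduce_word_count`, and `run_mapreduce` are hypothetical, the sample documents are made up, and the in-memory grouping step merely stands in for the partitioning, scheduling, fault handling, and communication that the real run-time system performs across machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_word_count(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """Map: take a (document name, contents) pair and emit (word, 1) pairs."""
    for word in value.split():
        yield word, 1


def reduce_word_count(key: str, values: List[int]) -> int:
    """Reduce: merge all intermediate counts that share the same word."""
    return sum(values)


def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    map_fn: Callable[[str, str], Iterable[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical sequential driver (assumption: everything fits in memory).

    It groups intermediate values by key and then calls reduce_fn once per
    key, which is the contract between map and reduce in the model.
    """
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)
    # Process keys in sorted order so the output is deterministic.
    return {ikey: reduce_fn(ikey, intermediate[ikey]) for ikey in sorted(intermediate)}


# Made-up sample input: (document name, document contents) pairs.
documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
counts = run_mapreduce(documents, map_word_count, reduce_word_count)
print(counts)
```

The user-supplied pieces are only the two small functions; everything in `run_mapreduce` is the part that a distributed run-time would take over, which is why such programs can be parallelized without changes to the user code.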