The size of data sets collected and analyzed in industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [3] is a popular open-source map-reduce implementation that is used as an alternative for storing and processing extremely large data sets on commodity hardware. However, the map-reduce programming model is very low-level and requires developers to write custom programs that are hard to maintain and reuse.