Query evaluation techniques for large databases
ACM Computing Surveys (CSUR)
Implementation of the typed call-by-value λ-calculus using a stack of regions
POPL '94 Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Better static memory management: improving region-based analysis of higher-order languages
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Memory management with explicit regions
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
A region-based memory manager for prolog
Proceedings of the 2nd international symposium on Memory management
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Combining region inference and garbage collection
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Region-based memory management in cyclone
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Database Management Systems
Ensuring code safety without runtime checks for real-time control systems
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
An Implementation of Scoped Memory for Real-Time Java
EMSOFT '01 Proceedings of the First International Workshop on Embedded Software
Ownership types for safe region-based memory management in real-time Java
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Experience with safe manual memory-management in cyclone
Proceedings of the 4th international symposium on Memory management
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
The causes of bloat, the limits of health
Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications
Precise memory leak detection for java software using container profiling
Proceedings of the 30th international conference on Software engineering
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Jolt: lightweight dynamic analysis and removal of object churn
Proceedings of the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications
Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering
Go with the flow: profiling copies to find runtime bloat
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Genoa Proceedings of the 23rd European Conference on ECOOP 2009 --- Object-Oriented Programming
Four Trends Leading to Java Runtime Bloat
IEEE Software
Detecting inefficiently-used containers to avoid bloat
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Finding low-utility data structures
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
FlumeJava: easy, efficient data-parallel pipelines
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Spark: cluster computing with working sets
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Performance analysis of idle programs
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Proceedings of the FSE/SDP workshop on Future of software engineering research
ASTERIX: towards a scalable, semistructured data platform for evolving-world models
Distributed and Parallel Databases
Hyracks: A flexible and extensible foundation for data-intensive computing
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Modeling runtime behavior in framework-based applications
ECOOP'06 Proceedings of the 20th European conference on Object-Oriented Programming
Static detection of loop-invariant data structures
ECOOP'12 Proceedings of the 26th European conference on Object-Oriented Programming
Finding reusable data structures
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Hi-index | 0.00 |
Over the past decade, the increasing demands on data-driven business intelligence have led to the proliferation of large-scale, data-intensive applications that often have huge amounts of data (often at terabyte or petabyte scale) to process. An object-oriented programming language such as Java is often the developer's choice for implementing such applications, primarily due to its quick development cycle and rich community resource. While the use of such languages makes programming easier, significant performance problems can often be seen --- the combination of the inefficiencies inherent in a managed run-time system and the impact of the huge amount of data to be processed in the limited memory space often leads to memory bloat and performance degradation at a surprisingly early stage. This paper proposes a bloat-aware design paradigm towards the development of efficient and scalable Big Data applications in object-oriented GC enabled languages. To motivate this work, we first perform a study on the impact of several typical memory bloat patterns. These patterns are summarized from the user complaints on the mailing lists of two widely-used open-source Big Data applications. Next, we discuss our design paradigm to eliminate bloat. Using examples and real-world experience, we demonstrate that programming under this paradigm does not incur significant programming burden. We have implemented a few common data processing tasks both using this design and using the conventional object-oriented design. Our experimental results show that this new design paradigm is extremely effective in improving performance --- even for the moderate-size data sets processed, we have observed 2.5x+ performance gains, and the improvement grows substantially with the size of the data set.