Bounds on Algorithm-Based Fault Tolerance in Multiple Processor Systems
IEEE Transactions on Computers - The MIT Press scientific computation series
An analysis of algorithm-based fault tolerance techniques
Journal of Parallel and Distributed Computing
Algorithm-Based Fault Tolerance on a Hypercube Multiprocessor
IEEE Transactions on Computers
A case for two-level distributed recovery schemes
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
A checkpointing strategy for scalable recovery on distributed parallel systems
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)
CoCheck: Checkpointing and Process Migration for MPI
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Managing Checkpoints for Parallel Programs
IPPS '96 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
CLADE '03 Proceedings of the 1st International Workshop on Challenges of Large Applications in Distributed Environments
Evaluation of checkpoint mechanisms for massively parallel machines
FTCS '96 Proceedings of the Twenty-Sixth Annual International Symposium on Fault-Tolerant Computing (FTCS '96)
Fault Tolerant Matrix Operations for Networks of Workstations Using Multiple Checkpointing
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations
HPDC '99 Proceedings of the 8th IEEE International Symposium on High Performance Distributed Computing
Checkpointing Message-Passing Interface (MPI) Parallel Programs
PRFTS '97 Proceedings of the 1997 Pacific Rim International Symposium on Fault-Tolerant Systems
Fail-Safe PVM: A Portable Package for Distributed Programming with Transparent Recovery
Algorithm-Based Diskless Checkpointing for Fault-Tolerant Matrix Operations
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
CPVM -- Extending PVM for Consistent Checkpointing
PDP '96 Proceedings of the 4th Euromicro Workshop on Parallel and Distributed Processing (PDP '96)
Checkpointing for Peta-Scale Systems: A Look into the Future of Practical Rollback-Recovery
IEEE Transactions on Dependable and Secure Computing
Condition Numbers of Gaussian Random Matrices
SIAM Journal on Matrix Analysis and Applications
Scalable diskless checkpointing for large parallel systems
Algorithm-Based Fault Tolerance for Matrix Operations
IEEE Transactions on Computers
Output-sensitive decoding for redundant residue systems
Proceedings of the 2010 International Symposium on Symbolic and Algebraic Computation
Tolerating correlated failures for generalized Cartesian distributions via bipartite matching
Proceedings of the 8th ACM International Conference on Computing Frontiers
Application-specific fault tolerance via data access characterization
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Algorithm-based fault tolerance for dense matrix factorizations
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Reliability-aware core partitioning in chip multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
HADAB: enabling fault tolerance in parallel applications running in distributed environments
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
Adoption protocols for fanout-optimal fault-tolerant termination detection
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Fault tolerance using lower fidelity data in adaptive mesh applications
Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale
Parallel reduction to Hessenberg form with algorithm-based fault tolerance
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
ACR: automatic checkpoint/restart for soft and hard error protection
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
X10-FT: Transparent fault tolerance for APGAS language and runtime
Parallel Computing
Estimation of error propagation in multiprocessor systems
Advances in Engineering Software
In a large-scale multimedia storage system (LMSS), where user requests for different multimedia objects may have different demands, the placement and replication of the objects is an important factor, as it may result in an imbalance in loading across ...