Defining suitable mechanisms to cope with network partitions and to recover from node failures is among the most common problems in designing and implementing a fault-tolerant distributed system. The concern is even more serious when the different scenarios cannot be predicted beforehand and are detected only once the system is at the deployment stage. A number of decisions can be made when choosing the right contingency mechanisms to deal with these distribution-related problems. The factors that must be taken into account include not only the technology in use, the node layout, the message protocol, the properties of the messages to be exchanged, and desired or demanded features such as latency and bandwidth, but also the reliability of the communications network and even the hardware on which the system runs. In this paper we present ADVERTISE, a distributed system for transmitting advertisements to set-top boxes (STBs) in customers' homes over the Digital TV network (iDTV) of a cable operator. We use this system as a case study to explain how we addressed the aforementioned problems, and we present a set of good practices that can be extrapolated to comparable systems.