Markov Reliability Models of Fault Tolerant Distributed Computing Systems
A hierarchical view of fault-tolerant distributed computers is presented, in which a distributed computing system is composed of interconnected, interacting functional modules. Each module is modeled by a directed state graph and is governed by internal random failure events and counteracting recovery processes, as well as by coupling to external random events from other modules.
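As a minimal illustrative sketch, a single module of this kind can be written as a small discrete-time Markov chain. The three states (`up`, `recovering`, `failed`), the function names, and all probabilities below are assumptions chosen for illustration and are not taken from the paper; the `p_external` parameter stands in for coupling to random events in another module.

```python
# Hypothetical three-state module; state names and probabilities are illustrative only.
STATES = ("up", "recovering", "failed")


def transition_matrix(p_fault, p_recover, p_external=0.0):
    """Return per-step transition probabilities for one module.

    p_fault    -- internal fault while up (up -> recovering)
    p_recover  -- recovery succeeds (recovering -> up); otherwise the module fails
    p_external -- failure induced by another module (up -> failed)
    """
    if p_fault + p_external > 1.0:
        raise ValueError("p_fault + p_external must not exceed 1")
    return [
        [1.0 - p_fault - p_external, p_fault, p_external],  # from "up"
        [p_recover, 0.0, 1.0 - p_recover],                  # from "recovering"
        [0.0, 0.0, 1.0],                                    # "failed" is absorbing
    ]


def step(dist, matrix):
    """Advance the state probability vector by one time step."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def reliability(matrix, steps):
    """Probability that the module has not reached "failed" after `steps` steps."""
    dist = [1.0, 0.0, 0.0]  # the module starts in "up"
    for _ in range(steps):
        dist = step(dist, matrix)
    return 1.0 - dist[STATES.index("failed")]


if __name__ == "__main__":
    isolated = transition_matrix(p_fault=0.1, p_recover=0.9)
    coupled = transition_matrix(p_fault=0.1, p_recover=0.9, p_external=0.05)
    print(reliability(isolated, 10), reliability(coupled, 10))
```

In this sketch, raising `p_external` lowers the computed reliability, which mirrors how events in one module can degrade another. A hierarchical model would compose several such module chains rather than treat one in isolation.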