Fault Injection Experiments for Distributed Objects

01 January 1999

(Originally titled "Threading Models in Distributed Objects and Their Effect on Reliability.") This paper discusses the reliability of distributed objects in both CORBA and DCOM with respect to different threading models. We study how distributed objects behave in the presence of failures. One purpose of CORBA and DCOM is to hide platform and communication details from application developers. As a result, it is difficult to understand how objects fail. Our goal is to understand failures in these systems and to provide effective failure detection and recovery mechanisms that improve reliability and availability. The paper is organized as follows: (1) We describe and compare the threading architectures of CORBA (using IONA's MT-Orbix) and DCOM. (2) We investigate potential failures, including thread hang/abend/crash, process hang/abend/crash, and machine hang/abend/crash. (3) We present the results of fault injection experiments, showing how clients perceive these failures when they occur in server objects. (4) Based on these experiments, we propose a set of detection and recovery mechanisms. This is the first work that discusses object reliability at the thread level.
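To make the idea of thread-level fault injection concrete, the following is a minimal sketch in Python rather than CORBA or DCOM. It is not the experimental setup used in the paper. The names `worker`, `call`, and `run_experiment`, the in-process queue standing in for the ORB, and the 0.2-second timeout are all assumptions chosen for illustration. A single server thread is started with an optional injected fault, and the client's view of the outcome is recorded alongside the server thread's liveness.

```python
import queue
import threading


def worker(requests, fault):
    """Serve requests; `fault` selects the injected failure (hypothetical)."""
    while True:
        payload, reply = requests.get()
        if fault == "hang":
            # Simulated thread hang: block forever on an event nobody sets.
            threading.Event().wait()
        if fault == "crash":
            # Simulated thread crash: the thread exits without replying.
            return
        reply.put(payload.upper())


def call(requests, payload, timeout=0.2):
    """Client-side view: the reply, or "timeout" if none arrives in time."""
    reply = queue.Queue(maxsize=1)
    requests.put((payload, reply))
    try:
        return reply.get(timeout=timeout)
    except queue.Empty:
        return "timeout"


def run_experiment(fault):
    """Start one server thread with the given fault and observe the result."""
    requests = queue.Queue()
    thread = threading.Thread(target=worker, args=(requests, fault), daemon=True)
    thread.start()
    outcome = call(requests, "ping")
    thread.join(timeout=0.2)  # give a crashed thread time to finish exiting
    return outcome, thread.is_alive()


if __name__ == "__main__":
    for fault in (None, "hang", "crash"):
        print(fault, run_experiment(fault))
```

In this toy setup the client observes the same timeout for a hung thread and a crashed one; only the server-side liveness check tells them apart. This shows, in miniature, why detection may need to operate below the level of the client-visible call.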