Reliability of Scrubbing Recovery Techniques for Memory Systems

This paper analyzes transient error recovery in fault-tolerant memory systems that use a scrubbing technique. The technique is based on single-error-correcting, double-error-detecting (SEC-DED) codes: when a single error is detected in a memory word, the error is corrected and the word is rewritten in its original location. Two models are discussed: exponentially distributed scrubbing, in which the time between checks of a memory word is assumed to be exponentially distributed, and deterministic scrubbing, in which a memory word is checked periodically. Equations for reliability and Mean Time To Failure (MTTF) are derived and estimated. The results for scrubbing are compared with those for memory systems without redundancy and for systems with SEC-DED codes only. A major contribution of the analysis is a set of easy-to-use expressions for the MTTF of memories.
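The two scrubbing models can be illustrated for a single memory word with a small Monte Carlo sketch. This is not the paper's derivation; it is a minimal illustration under stated assumptions: each of `n_bits` bits suffers transient errors independently at rate `lam`, a word fails as soon as it holds two errors at once, a second hit on an already erroneous bit is ignored, and scrubbing itself is instantaneous and error-free. All function and parameter names are hypothetical. The closed form used for the exponential model follows from the three-state Markov chain implied by these assumptions, not from the paper.

```python
import math
import random


def mttf_exponential_closed_form(n_bits, lam, mu):
    """MTTF of one word under the assumed 3-state Markov chain.

    States: 0 errors -> 1 error (rate a), 1 error -> 0 errors (scrub, rate mu),
    1 error -> failed (rate b). Assumption of this sketch, not the paper's formula.
    """
    a = n_bits * lam          # rate of the first error in a clean word
    b = (n_bits - 1) * lam    # rate of a second error in a different bit
    return (a + b + mu) / (a * b)


def simulate_exponential(n_bits, lam, mu, rng):
    """One word lifetime with exponentially distributed scrub intervals."""
    a = n_bits * lam
    b = (n_bits - 1) * lam
    t = 0.0
    while True:
        t += rng.expovariate(a)            # wait for the first error
        to_scrub = rng.expovariate(mu)     # time until the next scrub
        to_second = rng.expovariate(b)     # time until a second error
        if to_second < to_scrub:
            return t + to_second           # double error: word fails
        t += to_scrub                      # scrub corrects the single error


def simulate_deterministic(n_bits, lam, period, rng):
    """One word lifetime with a scrub every `period` time units."""
    a = n_bits * lam
    b = (n_bits - 1) * lam
    t = 0.0
    while True:
        t += rng.expovariate(a)                          # first error
        next_scrub = (math.floor(t / period) + 1) * period
        second = t + rng.expovariate(b)                  # second error
        if second < next_scrub:
            return second                                # fails before scrub
        t = next_scrub                                   # scrub restores word


def estimate_mttf(simulate, trials, seed, *args):
    """Average simulated lifetime over `trials` independent runs."""
    rng = random.Random(seed)
    return sum(simulate(*args, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    # Illustrative values only: a (39,32) SEC-DED word, arbitrary time units.
    n_bits, lam, mu = 39, 0.01, 1.0
    print("closed form (exponential):", mttf_exponential_closed_form(n_bits, lam, mu))
    print("simulated exponential:    ", estimate_mttf(simulate_exponential, 20000, 1, n_bits, lam, mu))
    print("simulated deterministic:  ", estimate_mttf(simulate_deterministic, 20000, 2, n_bits, lam, 1.0 / mu))
```

Setting the deterministic period to `1 / mu` gives both models the same mean scrub interval, so their simulated MTTFs can be compared directly. With `mu = 0` the closed form reduces to `1/a + 1/b`, the MTTF of a word protected by SEC-DED alone in this simplified model.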