Compressed Coded Distributed Computing

17 June 2018

New Image

Communication overhead is one of the major performance bottlenecks in large-scale distributed computing systems, especially for machine learning applications. Conventionally, compression techniques are used to reduce the load of communication by combining intermediate results of the same computation task as much as possible. Recently, via the development of coded distributed computing (CDC), it has been shown that it is possible to code across intermediate results of different tasks to further reduce communication. We propose a new scheme, named compressed coded distributed computing (in short, compressed CDC), which jointly exploits these two techniques (i.e., combining intermediate results of the same computation and coding across intermediate results of different computations) to significantly reduce the communication load for computations with linear aggregation of intermediate results in the final stage that are prevalent in machine learning (e.g., distributed training where partial gradients are computed distributedly and then averaged in the final stage). In particular, compressed CDC first compresses/combines several intermediate results for a single computation, and then utilizes multiple such combined packets to create a coded multicast packet that is simultaneously useful for multiple computations. We characterize the achievable communication load of compressed CDC and show that it substantially outperforms both combining methods and CDC scheme.