Fundamental tradeoff between computation and communication in distributed computing

01 January 2016

We introduce a general distributed computing framework, motivated by commonly used structures such as MapReduce, and formulate an information-theoretic tradeoff between computation and communication in this framework. We characterize the optimal tradeoff to within a constant factor for all system parameters. In particular, we propose a coded scheme, named "Coded MapReduce" (CMR), which creates and exploits coding opportunities in the data-shuffling phase of distributed computing, reducing the communication load by a factor that grows linearly with the computation load. We then prove a lower bound on the minimum communication load and show that CMR achieves this bound to within a constant factor. This result reveals a fundamental connection between computation and communication in distributed computing: the two are inversely proportional to each other.
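To make the inverse-linear relationship concrete, here is a minimal numerical sketch. It assumes an illustrative model with K nodes and computation load r (the number of times each piece of data is mapped across nodes), where the uncoded shuffle load is taken to be 1 - r/K and the coded load is taken to be (1/r)(1 - r/K); these expressions, and the parameter names r and K, are assumptions for illustration rather than quotes from this post, but they reproduce the stated behavior: the coded scheme cuts the communication load by a factor proportional to r.

```python
# Illustrative sketch (assumed model, not quoted from the post):
# uncoded shuffle load L_uncoded(r) = 1 - r/K
# coded shuffle load   L_coded(r)   = (1/r) * (1 - r/K)
# so the reduction factor L_uncoded / L_coded equals the computation load r.

K = 10  # hypothetical number of distributed computing nodes


def uncoded_load(r: int, k: int = K) -> float:
    """Fraction of intermediate values shuffled without coding."""
    return 1 - r / k


def coded_load(r: int, k: int = K) -> float:
    """Communication load under the assumed coded scheme: a 1/r gain."""
    return (1 / r) * (1 - r / k)


if __name__ == "__main__":
    print(f"{'r':>3} {'uncoded':>10} {'coded':>10} {'gain':>8}")
    for r in range(1, K):
        u, c = uncoded_load(r), coded_load(r)
        print(f"{r:>3} {u:>10.3f} {c:>10.3f} {u / c:>8.1f}x")
```

Running the script prints one row per computation load r; the "gain" column equals r, which is the sense in which doubling the computation load halves the communication load in this model.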