Resource Allocation for Coded Distributed Computing
06 March 2017
A data center has an abundance of processing power (e.g., it hosts tens of thousands of servers). To execute a distributed computing job, it is natural to have multiple servers work on different parts of the job in parallel to reduce the computation time. On the other hand, in most cases the intermediate results computed by these servers need to be collected by a server in order to execute a second-stage computation. Thus, spreading a computation job across too many servers may also increase the communication time. It is therefore valuable to find the optimal way of assigning computation resources to a problem, in order to solve it as quickly as possible. We consider a simple distributed computing framework, where the goal is to compute Q arbitrary functions from N files by decomposing the overall computation into a set of "Map" and "Reduce" functions. We study the minimum possible total execution time for completing an arbitrary distributed computing job, when one has access to an arbitrary number of servers. We characterize the exact minimum execution time for all possible values of the problem parameters, and we provide the corresponding optimal computing schemes. In most cases, the proposed optimal scheme exactly achieves the minimum execution time using only a finite number of servers. We prove a matching information-theoretic converse showing that our proposed scheme exactly achieves the minimum execution time. Furthermore, we prove that, among all computing schemes that achieve the minimum execution time, our proposed scheme also uses the exact minimum number of servers.
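To make the framework concrete, the sketch below shows the Map-Reduce decomposition described above: each of the Q output functions is computed by applying a Map function to each of the N files and then combining the intermediate results in a Reduce step. The specific map/reduce functions here (string lengths and character counts) are purely illustrative assumptions; the paper treats the Q functions as arbitrary.

```python
# Minimal sketch of the Map-Reduce decomposition: each of Q output functions
# phi_q(file_1, ..., file_N) is computed as
#   phi_q = reduce_q(map_q(file_1), ..., map_q(file_N)).
# The concrete map/reduce functions below are hypothetical examples.

from typing import Callable, List


def execute_job(files: List[str],
                map_fns: List[Callable[[str], int]],
                reduce_fns: List[Callable[[List[int]], int]]) -> List[int]:
    """Compute Q functions from N files via a Map phase and a Reduce phase."""
    outputs = []
    for map_fn, reduce_fn in zip(map_fns, reduce_fns):
        # Map phase: each file could be processed by a different server in parallel.
        intermediates = [map_fn(f) for f in files]
        # Reduce phase: one server collects the intermediate values and combines them.
        outputs.append(reduce_fn(intermediates))
    return outputs


# Example: Q = 2 functions over N = 3 "files" (here, strings).
files = ["alpha", "bb", "cccc"]
map_fns = [len, lambda f: f.count("c")]         # per-file intermediate values
reduce_fns = [sum, sum]                         # combine the intermediates
print(execute_job(files, map_fns, reduce_fns))  # → [11, 4]
```

In the actual scheme, the communication cost arises because the intermediate values produced in the Map phase must be shuffled to the server running the Reduce phase; this sketch runs everything on one machine and omits that step.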