Design cost versus access cost trade-off in distributed storage systems: A combinatorial approach
01 January 2016
Motivated by a class of resource allocation problems in cloud computing and storage systems, and aiming to capture the trade-off between design (implementation or hardware) cost and access cost, we introduce and study a new class of combinatorial problems. To be more specific, we consider a distributed storage system, with N files and K isolated memory units, where each memory has space to store M files, for some integers N, K, and KM ⥠N. At each time, a client asks for an arbitrary subset of T files, for some integer T ⥠M, and would like to read these T files by accessing the minimum number of memories. The objective in this paper is to characterize the fundamental trade-off between K, the number of memory units, and L, the number of memories that the client has to access in order to recover its requested T files. Other scenarios such as minimizing energy in server farms, or minimizing seek time in disk can also be formulated within the same underlying combinatorial problem. Through some counting arguments, we show that the required number of memories K has to be at least L/e (N/(eML))T/L. Moreover, we show that there exists a solution which guarantees that the reader can read any T files by accessing to L memories, with K ⩽ 2 (Ne/M)âT/Lâ. In particular, we introduce a simple probabilistic approach to place the memories' content, which achieves approximately the same trade-off. Finally, we justify the analytic trade-off bounds derived in this paper through some simulation results.