Following the Data, Not the Function: Restructuring Function-as-a-Service with Explicit Data Consuming

12 September 2021


Serverless applications are usually composed of multiple short-lived, single-purpose functions that exchange asynchronous messages in reaction to events or state changes. Existing function orchestration services coordinate function invocations according to predefined rules (e.g., state machines), while remaining oblivious to the underlying data exchange between functions. Such a design limits expressiveness and misses many opportunities for improved data locality. In this paper, we advocate data-centric orchestration, where function invocations are triggered by the flow of data. In our design, the platform provides an interface through which developers can control when and how the output of one or many functions triggers other functions for data consumption. By making data consumption explicit, complex function interactions can be easily implemented, and data locality can also be achieved. As a manifestation of this design, we present Pheromone, a scalable, low-latency serverless platform. Pheromone schedules functions close to their input data with a two-level, shared-nothing scheduling hierarchy. It also supports zero-copy data sharing between local functions via shared memory. Compared with existing platforms such as AWS Lambda, KNIX, and Cloudburst, Pheromone significantly reduces function interaction latency, supports sharing large data, and scales to a large number of functions. Case studies further demonstrate that Pheromone enables easy implementations of a range of applications that are considered difficult to support on existing platforms.
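To make the idea of data-triggered invocation concrete, the following is a minimal in-memory sketch, not Pheromone's actual interface. The `Bucket` class, its `ready` condition, and the `producer` and `aggregate` functions are all hypothetical names introduced for illustration. The sketch shows a fan-in pattern in which the consumer runs only once the outputs of three producers have arrived, so the trigger is driven by the data rather than by a predefined invocation order.

```python
from typing import Callable, List


class Bucket:
    """Toy data bucket (hypothetical, not Pheromone's API).

    Holds intermediate objects and invokes a consumer function as soon as
    the developer-supplied condition says the pending data is ready.
    """

    def __init__(
        self,
        ready: Callable[[List[int]], bool],
        consumer: Callable[[List[int]], None],
    ) -> None:
        self._ready = ready
        self._consumer = consumer
        self._pending: List[int] = []

    def put(self, obj: int) -> None:
        # Arrival of data, not an explicit call from the producer,
        # is what triggers the downstream function.
        self._pending.append(obj)
        if self._ready(self._pending):
            batch, self._pending = self._pending, []
            self._consumer(batch)


results: List[int] = []


def producer(x: int, out: Bucket) -> None:
    # A producer only writes its output; it does not know who consumes it.
    out.put(x * x)


def aggregate(batch: List[int]) -> None:
    # Consumer function: runs once per complete batch.
    results.append(sum(batch))


# Fan-in: fire the consumer when three outputs have accumulated.
fan_in = Bucket(ready=lambda objs: len(objs) == 3, consumer=aggregate)
for x in (1, 2, 3):
    producer(x, fan_in)
```

In this toy version the condition is a plain predicate over the pending objects. A real platform would additionally have to decide where the consumer runs relative to the data, which this sketch does not attempt to model.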