On the Performance Model of a Parallel Streaming Engine: Bridging Theory and Costs
01 April 2013
The Storm system is emerging as an important choice for parallel and distributed stream processing. This report analyzes the internals of the system and presents a set of performance models that describe the execution of distributed stream processing operations on Storm. The models cover the data flow, the data processing, and the system management overhead, with cost information presented at a fine granularity for the different steps of a job execution. The models can be used to estimate the performance of Storm-based stream processing jobs, as well as to find the optimal configuration settings for jobs with different characteristics and requirements.