System Log Parsing: A Survey

22 August 2021


Modern information and communication systems have become increasingly difficult to manage. Traditional runtime analysis techniques are burdensome to configure and can impose significant overhead. System logs, which are ubiquitously available and contain rich runtime information, are therefore widely exploited as an alternative source for system management. Because log files usually contain vast amounts of raw data, manually inspecting them is both laborious and error-prone. To cope with this daunting task, many research efforts have been devoted to automatic log analysis. However, most of these works only accept structured input and ignore the unstructured nature of raw log messages. Log parsing closes this gap by converting free-text log messages into structured ones. Over the last two decades, a large number of log parsers have been implemented. However, their general performance characteristics and operational features remain unclear, making it extremely difficult for practitioners to choose the most suitable solution. In this paper, we provide a comprehensive survey of log parsing. We begin with a taxonomy of existing log parsing solutions. We then review their quantitative and qualitative performance features. Based on our literature study, we also envision several future challenges and research directions. We believe this paper provides a first-hand guideline for system administrators and domain experts to choose the most suitable log parsers or to implement new ones based on their individual needs.
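To make "converting free-text log messages into structured ones" concrete, the following is a minimal illustrative sketch, not any of the surveyed parsers. It assumes a simple rule-based heuristic: tokens matching hand-written regular expressions (IPv4 addresses, hex values, numbers) are treated as variable parameters and replaced by a `<*>` placeholder, and what remains is the message template. The names `VARIABLE_PATTERNS`, `parse_message`, and `group_by_template` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical masking rules (an assumption for illustration): any whitespace-separated
# token that fully matches one of these patterns is treated as a variable parameter.
VARIABLE_PATTERNS = [
    re.compile(r"^\d+\.\d+\.\d+\.\d+(:\d+)?$"),  # IPv4 address, optional port
    re.compile(r"^0x[0-9a-fA-F]+$"),             # hexadecimal value
    re.compile(r"^-?\d+(\.\d+)?$"),              # integer or decimal number
]


def parse_message(message: str) -> Tuple[str, List[str]]:
    """Split one raw log message into (template, parameters)."""
    template_tokens: List[str] = []
    params: List[str] = []
    for token in message.split():
        if any(pattern.match(token) for pattern in VARIABLE_PATTERNS):
            template_tokens.append("<*>")  # mask the variable part
            params.append(token)           # keep its value as a structured field
        else:
            template_tokens.append(token)  # constant part of the template
    return " ".join(template_tokens), params


def group_by_template(messages: List[str]) -> Dict[str, List[List[str]]]:
    """Group messages by their extracted template, collecting each message's parameters."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for message in messages:
        template, params = parse_message(message)
        groups[template].append(params)
    return dict(groups)


if __name__ == "__main__":
    logs = [
        "Connection from 10.0.0.1 closed after 35 ms",
        "Connection from 10.0.0.7 closed after 120 ms",
        "Disk quota exceeded for user alice",
    ]
    for template, param_lists in group_by_template(logs).items():
        print(template, param_lists)
```

Here the two connection messages collapse into the template `Connection from <*> closed after <*> ms` with parameters `["10.0.0.1", "35"]` and `["10.0.0.7", "120"]`. The fixed rules, however, leave the user name `alice` inside the third template. This kind of gap is why hand-written masking alone is rarely sufficient and why automated parsers try to infer templates from the data itself.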