Estimating Cardinality Distributions in Network Traffic

01 January 2008

New Image

Understanding the aggregate behaviour of network host connectivity patterns is important for network monitoring and traffic engineering. One characterization of such an aggregate behaviour is the host distributions of distinct communicating peers or flows. For example, during the worm outbreak, the port scanning activities would cause many hosts with increasing number of (one-way) peers (or flows), and hence a change in the host distributions of distinct communicating peers or flows. In this paper, we develop efficient streaming algorithms for estimating these host distributions of distinct elements for a high speed network of a large number of hosts. Our approach utilizes the minimal order statistics of hashed values (also called Flajolet-Martin sketches) as a compact data summary and develops maximum likelihood estimates of these distributions nonparametrically. By leveraging the aggregation of many hosts, we are able to obtain very accurate estimates of the cardinality distributions by maintaining a compact statistical summary that is as small as one number (at most 32 bits) per host. Both theoretical analysis and experimental evaluation are carried out to demonstrate their excellent performance.