Tracking Cardinality Distributions in Network Traffic
19 April 2009
Understanding the aggregate behavior of network host connectivities is important for network monitoring and traffic engineering. One characterization of such an aggregate behavior is the host distributions of distinct communicating peers or flows. For example, during the worm outbreak, the port scanning activities would cause many hosts with increasing number of (one-way) peers (or flows), and hence a change in the host distributions of distinct communicating peers or flows. In this paper, we develop an efficient streaming algorithm for tracking these host distributions of distinct elements, also called cardinality distributions, for a high speed network with a large number of hosts. Our approach utilizes the continuous Flajolet-Martin sketches, which is the minimal order statistics of hashed values, as a compact data summary and develops maximum likelihood estimates of these distributions. By leveraging the aggregation of many hosts, we are able to obtain very accurate estimates of the cardinality distributions by maintaining a compact statistical summary that is as small as one number (at most 32 bits) per host. Extensive experimental studies are carried out to demonstrate their excellent performance.