Scalable Content-Based Routing in Pub/Sub Systems

19 April 2009

New Image

In this paper, we develop a framework and algorithms for achieving scalable and communication-efficient dissemination of content in pub/sub systems. To maximize communication sharing across subscriptions, our routing framework groups subscriptions based on similarity, and transmits content matching one or more subscriptions in a group over a single dissemination tree for the group. We develop a cost model that uses published content samples in conjunction with the knowledge of consumer subscriptions to estimate the communication cost of a set of routing trees for subscription groups. The problem of computing a communication-optimal set of routing trees is then formulated as an optimization problem that seeks to find trees with the minimum cost. It turns out that the problem of computing a minimum-cost tree for a subscription group is a new generalization of the well-known steiner tree problem, and is an interesting problem in its own right. We develop an approximation algorithm that uses {it low-stretch} spanning trees to compute a tree whose communication cost is within a polylogarithmic factor of the optimum. We use this to compute trees for various subscription-grouping configurations generated using a greedy clustering strategy, and select the one with the lowest cost. Our experimental study demonstrates the effectiveness of our content routing approach compared to traditional routing based on content oblivious spanning trees.