Passive content curation based on HTTP logs
06 May 2014
Content curation refers to helping users identify relevant and interesting items in the overwhelming amount of content available online today. Existing content curation services rely either on experts or on the wisdom of the crowd to promote content. This paper presents the design of HotNet, a passive, crowdsourced content curation system. HotNet requires no active user engagement to promote content. Instead, it identifies popular content by extracting the URLs that users visit from the traffic traversing an ISP network. A key challenge in designing such a passive curation system is processing network traffic in real time to identify the small set of URLs that users find interesting. HotNet uses a set of heuristics to identify the URLs that users visit and to select the subset that is interesting to them. We evaluate HotNet using traces collected at a large European access ISP and in a deployment on a campus network.
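The abstract does not specify HotNet's heuristics. The following minimal Python sketch illustrates the general shape of such a passive pipeline under loudly stated assumptions. Every name in it is hypothetical: the pre-parsed `HttpRequest` record, the static-asset suffix filter, the `text/html` check, and the distinct-client ranking in `top_urls` are illustrative choices, not HotNet's actual method. The sketch runs in batch over a list of requests. A real-time system would instead update the per-URL counters incrementally as traffic arrives.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple


class HttpRequest(NamedTuple):
    """Hypothetical pre-parsed record of one HTTP request seen in ISP traffic."""
    client: str        # anonymized subscriber identifier (assumption)
    host: str          # value of the Host header
    path: str          # request path, without query string
    content_type: str  # Content-Type of the response (assumption)


# Illustrative filter only; NOT HotNet's heuristics.
STATIC_SUFFIXES = (".js", ".css", ".png", ".jpg", ".gif", ".ico", ".woff")


def looks_user_visited(req: HttpRequest) -> bool:
    """Guess whether a request is a page the user chose to open,
    rather than an embedded object the browser fetched automatically."""
    if req.path.lower().endswith(STATIC_SUFFIXES):
        return False
    return req.content_type.startswith("text/html")


def top_urls(requests: Iterable[HttpRequest], k: int = 3,
             min_users: int = 2) -> List[Tuple[str, int]]:
    """Rank candidate URLs by the number of distinct clients that visited them."""
    visitors = defaultdict(set)
    for req in requests:
        if looks_user_visited(req):
            visitors[req.host + req.path].add(req.client)
    # Keep URLs seen by enough distinct clients; sort by count, then URL.
    ranked = sorted(
        ((url, len(users)) for url, users in visitors.items()
         if len(users) >= min_users),
        key=lambda item: (-item[1], item[0]),
    )
    return ranked[:k]


log = [
    HttpRequest("u1", "news.example", "/story/42", "text/html"),
    HttpRequest("u2", "news.example", "/story/42", "text/html; charset=utf-8"),
    HttpRequest("u1", "news.example", "/static/app.js", "application/javascript"),
    HttpRequest("u3", "blog.example", "/post/7", "text/html"),
    HttpRequest("u3", "news.example", "/story/42", "text/html"),
    HttpRequest("u2", "blog.example", "/post/7", "text/html"),
]
print(top_urls(log))  # [('news.example/story/42', 3), ('blog.example/post/7', 2)]
```

The sketch counts distinct clients rather than raw requests, so a single user who reloads a page cannot promote it alone. It also discards the JavaScript file because that file is an embedded object, not a page the user chose to visit.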