Online Activity Categorization of Mobile Users

01 January 2016


Understanding a user's online activity is extremely important for many business applications. There is a vast body of work that aims to characterize web activity in the hope of ultimately understanding user intent. However, many of these studies draw on either public datasets that are less relevant to the modern mobile world or proprietary datasets collected on a limited scale. In this paper, we present our experiment on a large-scale dataset collected in the wireless network of a national carrier, consisting of 7500 mobile users generating 20 million online requests. We note some unique challenges that real data poses for understanding users' online activity, such as a large volume of background activity (for example, requests attributed to content delivery and ad networks), URL shortening, and redirection. Furthermore, widespread encrypted, dynamic, and personalized content makes it much more difficult to obtain the content the user consumed. We then propose a scheme for online activity characterization that addresses these issues. Our method relies on URL expansion to reveal the true destination URL, host and domain classification to provide the right context, and, finally, tokens extracted from the URL to determine the most appropriate activity category. We demonstrate and validate the effectiveness of our approach on the real mobile data and on publicly available data such as the Yahoo Open Directory Project.
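The three stages named in the abstract (URL expansion, host classification, token-based categorization) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the lookup tables, host names, keyword sets, and function names below are all hypothetical, the expansion step uses a static table rather than following live redirects, and the keyword-overlap scoring stands in for whatever classifier the authors actually used.

```python
import re
from urllib.parse import urlparse

# Hypothetical data, for illustration only (not taken from the paper).
SHORT_URLS = {
    "http://short.example/abc": "http://news.example.com/sports/football/match-report",
}
BACKGROUND_HOSTS = {"cdn.example.net", "ads.example.org"}
CATEGORY_KEYWORDS = {
    "sports": {"sports", "football", "match"},
    "shopping": {"shop", "cart", "checkout"},
}


def expand_url(url):
    """Resolve a shortened URL via a static table (assumed stand-in for redirects)."""
    seen = set()
    while url in SHORT_URLS and url not in seen:
        seen.add(url)  # guard against redirect loops
        url = SHORT_URLS[url]
    return url


def classify_host(url):
    """Label a request as background traffic (CDN, ads) or user-facing content."""
    host = urlparse(url).hostname or ""
    return "background" if host in BACKGROUND_HOSTS else "content"


def tokenize(url):
    """Split the URL path and query into lowercase alphanumeric tokens."""
    parsed = urlparse(url)
    text = f"{parsed.path} {parsed.query}".lower()
    return [tok for tok in re.split(r"[^a-z0-9]+", text) if tok]


def categorize(url):
    """Expand, filter background hosts, then pick the best keyword-overlap category."""
    final_url = expand_url(url)
    if classify_host(final_url) == "background":
        return "background"
    tokens = set(tokenize(final_url))
    scores = {cat: len(tokens & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

For example, `categorize("http://short.example/abc")` first expands to the news URL and then matches the sports keywords, while a request to `cdn.example.net` is filtered out as background before any tokens are examined.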