Know Your Personalization: Learning Topic level Personalization in Online Services
01 January 2013
Personalization is used by the majority of online services, such as search, advertising, social networking, shopping, and news, to attract users with a better service experience customized to their individual interests. A recent trend is profile-based personalization, in which a service builds an extensive profile of the user from his past interactions with it and personalizes content based on that profile. Common examples include search personalization (refer to filter bubble, SE papers), behavioral ad targeting, social search/ads, and movie recommendations. The collected history can take various forms depending on the service: search queries, URLs visited, content liked and shared, conversations in the form of comments and messages, etc. The collectors may not be only the services the user is actively using, but also the so-called third-party tracking services, such as ad networks, that track him across websites and are present on practically every website he visits. For a user, this raises a privacy concern: he does not know what part of his history has been collected and is now being used to personalize his future content. This concern is only heightened by the capabilities of ever-improving modern inference techniques, which can determine his interests and biases on different categories (which we generically call topics) with alarmingly high accuracy, e.g., filter bubble, republican/democrat, gay-or-not, FB-married example, pregnancy, etc. Moreover, because the inference and personalization techniques and the data they operate on are the key differentiators of these services (their secret sauce), the services reveal neither, making it even harder for the user to understand what is going on. In this paper, we aim to find the topics a service has inferred for a user, based on the personalization the service performs for him.
We treat these services as black boxes and assume that we only have access to their output, that is, the content they serve to the user at a particular URL. Using this information, we develop a probabilistic model to learn which topics are inferred by the service provider and used to personalize the search results.
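To make the black-box setting concrete, the following is a minimal sketch of one way a posterior over inferred topics could be computed from observed content alone. It is not the model developed in this paper. It assumes, purely for illustration, that each served item can be labeled with a single topic, that a non-personalized baseline topic distribution is available (for example, from a fresh profile), and that a personalized service draws an item from the inferred topic with probability `alpha` and from the baseline otherwise. The function name `topic_posterior` and the parameter `alpha` are hypothetical.

```python
import math

def topic_posterior(observed, baseline, alpha=0.3):
    """Posterior over the topic a black-box service is personalizing on.

    observed: list of topic labels of items served to the user (assumed labeling).
    baseline: dict topic -> probability under no personalization (assumed known).
    alpha:    assumed probability that an item is drawn from the inferred topic.
    """
    log_post = {}
    for t in baseline:
        ll = 0.0
        for x in observed:
            # Mixture likelihood: baseline component plus a boost if x matches t.
            p = (1 - alpha) * baseline.get(x, 0.0) + (alpha if x == t else 0.0)
            ll += math.log(max(p, 1e-12))  # floor avoids log(0) for unseen labels
        log_post[t] = ll  # uniform prior over topics, so no prior term is added

    # Normalize in log space for numerical stability.
    m = max(log_post.values())
    weights = {t: math.exp(v - m) for t, v in log_post.items()}
    z = sum(weights.values())
    return {t: w / z for t, w in weights.items()}

# Hypothetical example data: four equally likely baseline topics.
baseline = {"sports": 0.25, "politics": 0.25, "tech": 0.25, "health": 0.25}
observed = ["tech", "tech", "sports", "tech", "politics"]
posterior = topic_posterior(observed, baseline)
most_likely = max(posterior, key=posterior.get)
```

In this toy setup, topics that appear more often in the served content than the baseline would predict receive more posterior mass. A realistic model would also need to handle multiple simultaneous topics and noisy topic labels.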