Prioritized Relationship Mining in Heterogeneous Information Networks

15 April 2016

In recent years, many applications have come to require information from diverse sources to be linked and mined holistically. This integrated information is often modeled as a Heterogeneous Information Network, and a fundamental problem in mining these networks is finding paths that match specific patterns in real time. In this paper, we propose an algorithm, called PRO-HEAPS, that combines graph preprocessing techniques with A* search to find the $k$ most relevant instances matching an input query pattern, where relevance can be measured by any user-defined interestingness metric. We show that our algorithm significantly outperforms a wide variety of baseline approaches on several real-world graph data sets.
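To make the problem setting concrete, the following is a minimal sketch of top-$k$ pattern-constrained path search using a precomputed lower bound and best-first (A*-style) expansion. It is not the PRO-HEAPS algorithm itself, whose details are not given here. It assumes that a query pattern is a sequence of node types, that each edge carries a cost, and that a lower total cost means a more relevant path. The function name `top_k_paths`, the data layout, and the toy graph are all hypothetical.

```python
import heapq
from itertools import count

def top_k_paths(node_type, edges, pattern, k):
    """Return up to k lowest-cost paths whose node types spell out `pattern`.

    Assumed (hypothetical) layout: node_type maps node -> type,
    edges maps node -> list of (neighbor, cost). Lower cost = more relevant.
    """
    last = len(pattern) - 1
    inf = float("inf")

    # Preprocessing: h[i][v] is the cheapest cost of completing the pattern
    # from node v placed at pattern position i (backward dynamic programming).
    h = [dict() for _ in pattern]
    for v, t in node_type.items():
        if t == pattern[last]:
            h[last][v] = 0.0
    for i in range(last - 1, -1, -1):
        for v, t in node_type.items():
            if t != pattern[i]:
                continue
            best = min((w + h[i + 1][u] for u, w in edges.get(v, [])
                        if u in h[i + 1]), default=inf)
            if best < inf:
                h[i][v] = best

    # Best-first search over partial paths, ordered by f = cost so far + h.
    tie = count()  # unique tie-breaker so the heap never compares paths
    frontier = [(h[0][v], next(tie), 0.0, (v,)) for v in h[0]]
    heapq.heapify(frontier)
    results = []
    while frontier and len(results) < k:
        _, _, g, path = heapq.heappop(frontier)
        i = len(path) - 1
        if i == last:
            results.append((g, list(path)))  # complete match, in cost order
            continue
        for u, w in edges.get(path[-1], []):
            if u in h[i + 1]:  # skip nodes that cannot complete the pattern
                heapq.heappush(
                    frontier,
                    (g + w + h[i + 1][u], next(tie), g + w, path + (u,)))
    return results

# Toy author -> paper -> venue network (illustrative data only).
node_type = {"a1": "author", "a2": "author",
             "p1": "paper", "p2": "paper", "p3": "paper",
             "v1": "venue", "v2": "venue"}
edges = {"a1": [("p1", 1.0), ("p2", 2.0)],
         "a2": [("p2", 0.5), ("p3", 3.0)],
         "p1": [("v1", 1.0)],
         "p2": [("v1", 0.5), ("v2", 2.0)],
         "p3": [("v2", 1.0)]}
top2 = top_k_paths(node_type, edges, ["author", "paper", "venue"], 2)
# top2 == [(1.0, ['a2', 'p2', 'v1']), (2.0, ['a1', 'p1', 'v1'])]
```

Because the bound computed in the preprocessing step is exact in this sketch, complete paths leave the priority queue in nondecreasing cost order, so the search can stop as soon as $k$ matches have been found. A real system would likely trade that exactness for a cheaper bound.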