Finding k-lightest Weight Subgraphs in Heterogeneous Information Networks
20 February 2017
Querying and mining graph patterns in large heterogeneous information networks (HIN) is a fundamental problem that arises in a range of emerging applications. Given an input HIN and a query pattern with graph structure and node/edge labels, we are interested in finding the lightest subgraph in HIN that matches the query pattern and associated labels. The general problem of graph pattern matching and its variants are NP-hard and exact solutions for these queries are not known to be fast. However, it was recently shown that a frequent sub-case, where some of the nodes in the query patterns have a fixed mapping to a vertex in the input HIN and the query pattern is restricted to paths, can be solved in near real-time in practice by drastically pruning the search space of potentially matching instances. As a step towards extending this running time performance to arbitrary query patterns, we develop a careful generalization of the path algorithm to arbitrary tree queries. Compared to various tuned baselines on many diverse HINs, we show that our generalized algorithm is significantly faster, solving queries on large HINs in near real-time.