Engineering Motif Search for Large Graphs

Given a vertex-colored graph H and a multiset M of colors as input, the graph motif problem asks us to decide whether H has a connected induced subgraph whose multiset of colors agrees with M. The graph motif problem is NP-complete but known to admit randomized algorithms based on constrained multilinear sieving over GF(2^b) that run in time O(2^k k^2 m M(2^b)) and with a false-negative probability of at most k/2^(b-1) for a connected m-edge input and a motif of size k. On modern CPU microarchitectures such algorithms have practical edge-linear scalability to inputs with billions of edges for small motif sizes, as demonstrated by Björklund, Kaski, Kowalik, and Lauri [ALENEX'15]. This scalability to large graphs prompts the dual question whether it is possible to scale to large motif sizes. We present a vertex-localized variant of the constrained multilinear sieve that enables us to obtain, in time O(2^k k^2 m M(2^b)) and for every vertex simultaneously, whether the vertex participates in at least one match with the motif, with a per-vertex probability of at most k/2^(b-1) for a false negative. Furthermore, the algorithm is easily vector-parallelizable for up to 2^k threads, and parallelizable for up to 2^kn threads, where n is the number of vertices in H. Here M(2^b) is the time complexity to multiply in GF(2^b). We demonstrate with an open-source implementation that our variant of constrained multilinear sieving can be engineered for vector-parallel microarchitectures to yield hardware utilization that is bound by the available memory bandwidth. Our main engineering contributions are (a) a version of the recurrence for tightly labeled arborescences that can be executed as a sequence of memory-and-arithmetic coalescent parallel workloads on multiple GPUs, and (b) a bit-sliced low-level implementation for arithmetic in characteristic 2 to support (a). On a single NVIDIA Tesla P100 Accelerator (NVIDIA GP100 GPU with 3584 cores at 1189 MHz), our implementation achieves, at peak, an empirical global memory bandwidth of up to half a terabyte per second, essentially matching the peak empirical transfer bandwidth for ondevice memory. Furthermore, we obtain simultaneously an arithmetic bandwidth of over 300 billion GF(2^8)-multiplications per second. For example, with b = 8 we can find all vertices that participate in at least one match with a motif of size k = 10 on a one-million-vertex 20-regular graph (m = 10000000) in less than four seconds. With k = 20 and m = 1000000, we run in less than 25 minutes. When scaled up to a four-P100-card configuration (14336 cores), we obtain a three-tofour-fold speed-up, with up to two terabytes per second of empirical memory bandwidth and over 1.2 trillion GF(2^8)-multiplications per second when executing the sieve.

Authors

Juho Lauri

View Publication

Select your country

Engineering Motif Search for Large Graphs

Authors

Looking for Nokia licensed products support?

Looking for Nokia licensed products support?