8th TUC Meeting - Lijun Chang (University of New South Wales). Efficient Subgraph Matching by Postponing Cartesian Products.
Lijun Chang, DECRA Fellow at the University of New South Wales talked about how to make subgraph matching more efficient thanks to postponing Cartesian products. They key problem he explained was the extraction of subgraph isomorphic embeddings. The applications of this process are wide enough to cover protein interaction research, social network analysis and even chemical compound investigation. The testing of subgraph isomorphism is an NP-complete type of problem however, his team is focusing on enumerating all subgraph embeddings which, he explains, is even harder.
The challenges he is facing are the following ones:
- Redundant Cartesian Products by Dissimilar Vertices. Their solution was postponing Cartesian products by decomposing the query q into a dense subgraph and a forest and processing the dense subgraph first.
- Exponential size of the path-based data structure in TurboISO. It builds a data structure that materializes embeddings (all of them) in query paths in a data graph. They tried to give solution to this challenge by building a polynomial-size data structure called compact path-index (CPI).
The approach of Chang and his team is composed by a Core-First based framework and the previously mentioned CPI Matching. In core-first framework they decomposed the query q into core-forest where they define the core as the minimal connected subgraph containing all non-tree edges of q regarding any spanning tree.
The CPI is an auxiliary data structure that compactly stores candidate embeddings of query spanning trees and serves for computing and effective matching order. You can see examples of the CPI Structure in the slides Chang presented during the meeting (see below). The CPI is built upon a heuristic approach and it’s constructed via top-down processing and bottomup refinement to exploit the pruning power of both directions of every query edge.
Chang’s approach proved to outperform the state-of-the-art algorithms on real and synthetic graphs by extensive empirical studies.
The 9th TUC Meeting will be held at at SAP's HQ in Walldorf, Germany the 9-10th of February. Start planning your assistance!
See his full presentation and the slides on the links below:
And check the presentation: