8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data and workload generation for graph databases
George Fletcher, Associate Professor at the Eindhoven University of Technology, presented gMark, an open-source framework for generating synthetic graph instances and workloads. gMark is designed to be tailored to different graph data management scenarios, which are often driven by query workloads, such as multi-query optimization, workload-driven physical design of graph databases, or mapping discovery and query rewriting in data integration systems.
Why is gMark special? Given a graph schema, gMark generates synthetic instances of that schema, and it also generates query workloads with targeted structure and runtime behaviour that apply to all instances of the schema. The idea is to be able to gauge the difficulty of the different queries in any given workload. Fletcher’s team has adopted various successful aspects of the state of the art. Like the Waterloo Diversity Benchmark (WDB), gMark is schema-driven, permitting finely tailored graph instances for specific application domains; unlike WDB, it also allows tightly controlled generation of query workloads. gMark is also similar to LDBC’s Interactive SNB in that it supports focused stress-testing of query optimization chokepoints through fine control of query parameters, mainly the selectivity of the queries.
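To make the schema-driven idea concrete, here is a minimal Python sketch of generating a graph instance from a schema. It is a toy illustration only, not gMark's actual configuration format or algorithm: the `SCHEMA` layout, the node and edge type names, and the `generate_graph` function are all hypothetical, and out-degrees are simply drawn uniformly from a fixed range.

```python
import random

# Hypothetical toy schema (not gMark's real format): node-type proportions
# plus, per edge label, the source type, target type and out-degree range.
SCHEMA = {
    "node_types": {"researcher": 0.5, "paper": 0.3, "venue": 0.2},
    "edge_types": {
        "authors": ("researcher", "paper", (1, 3)),
        "publishedIn": ("paper", "venue", (1, 1)),
    },
}


def generate_graph(schema, num_nodes, seed=0):
    """Generate one instance of the schema with roughly num_nodes nodes."""
    rng = random.Random(seed)  # seeded so that instances are reproducible

    # Allocate consecutive node ids to each type according to its share.
    nodes_by_type = {}
    next_id = 0
    for type_name, share in schema["node_types"].items():
        count = round(num_nodes * share)
        nodes_by_type[type_name] = list(range(next_id, next_id + count))
        next_id += count

    # For each edge label, give every source node a random number of
    # distinct targets of the type the schema allows.
    edges = []
    for label, (src_type, dst_type, (lo, hi)) in schema["edge_types"].items():
        targets = nodes_by_type[dst_type]
        for src in nodes_by_type[src_type]:
            degree = min(rng.randint(lo, hi), len(targets))
            for dst in rng.sample(targets, degree):
                edges.append((src, label, dst))
    return nodes_by_type, edges
```

Calling `generate_graph(SCHEMA, 100)` and `generate_graph(SCHEMA, 10000)` yields two instances of different sizes that obey the same type constraints, which is what allows a single query workload to apply to every instance of the schema.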
Professor Fletcher also introduced the new features that his team has been adding to gMark. These include support for flexible generation of query workloads, including recursive path queries (fundamental for graph analytics), and query selectivity estimation in a purely instance-independent, schema-driven fashion, which makes the approach more scalable, more predictable, and easier to explain and understand. Fletcher pointed out that, using gMark, the team has discovered performance difficulties in existing graph DBMSs when evaluating a basic class of graph queries (regular path queries).
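For readers unfamiliar with recursive path queries, the following Python sketch evaluates the simplest recursive case, a Kleene star over a single edge label (written `label*`), using breadth-first search. It is an illustrative assumption about how such a query can be evaluated, not how gMark or any particular graph DBMS does it; the function name `reachable_via_star` and the `(source, label, target)` edge triples are hypothetical.

```python
from collections import defaultdict, deque


def reachable_via_star(edges, label, source):
    """Return the nodes reachable from source via zero or more `label` edges."""
    # Index only the edges that carry the requested label.
    adjacency = defaultdict(list)
    for src, edge_label, dst in edges:
        if edge_label == label:
            adjacency[src].append(dst)

    # Breadth-first search; the source itself matches the zero-length path.
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen
```

The size of the returned set is the number of results for that start node. Result sizes like this, and how they grow with the size of the graph, are the kind of quantity that selectivity estimation is concerned with.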
Finally, the next steps for the team are support for richer queries (constants, additional shapes, and aggregation for BI workloads) and the extension of selectivity estimation to higher-arity queries.
The 9th TUC Meeting will be held at SAP's HQ in Walldorf, Germany, on the 9th and 10th of February. Start planning your attendance, you’ve got until Thursday!
As always, slides and full presentation can be found below: