George Fletcher, Associate Professor at the Eindhoven University of Technology, presented gMark, an open-source framework for generating synthetic graph instances and workloads. The main focus of gMark has been to support diverse graph data management scenarios, which are often driven by query workloads, such as multi-query optimization, workload-driven graph database physical design, and mapping discovery and query rewriting in data integration systems.
Marcus Paradies, Software Developer at SAP, extended the talk Arnau Prat gave about the SNB, in this case covering the Business Intelligence workload. In contrast with the 17+4 queries of the Interactive workload, the Business Intelligence (BI) workload consists of 24 queries that can be seen as OLAP-style, as opposed to the OLTP-style queries of the Interactive one. The BI workload focuses on analytic queries that touch the whole graph.
Sergey Edunov, Software Engineer at Facebook, gave a great talk on how and why his company is generating large-scale social graphs. The underlying reasons for starting such an ambitious project are capacity planning, to make sure that their system will be able to handle a graph that keeps growing year after year, and fair evaluation of their system against the ones being implemented by other companies.
Weining Qian, Professor at East China Normal University, presented his talk on Statistical Characteristics of Real-Life Knowledge Graphs during the 8th TUC Meeting held at Oracle’s facilities in Redwood City, California.
Qian explained that the term knowledge graph was introduced by Google in 2012 and that it is an evolution of the semantic web. Professor Qian then introduced the main questions of his talk: how can we efficiently manage knowledge graphs, and are the existing benchmarks sufficient to test them, given that most of these benchmarks focus only on social networks?
Peter Boncz, Research Scientist at the Centrum Wiskunde & Informatica in the Netherlands, gave an update on the Graph Query Language Task Force one year after its creation. This Task Force was created to address an issue detected during the benchmark meetings: all workloads are specified in English text because there is no common graph query language.
Lijun Chang, DECRA Fellow at the University of New South Wales, talked about how to make subgraph matching more efficient by postponing Cartesian products. The key problem he explained was the extraction of subgraph isomorphic embeddings. The applications of this process are wide enough to cover protein interaction research, social network analysis and even chemical compound investigation. Testing subgraph isomorphism is an NP-complete problem; however, his team is focusing on enumerating all subgraph embeddings, which, he explained, is even harder.
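To make the problem statement concrete, here is a minimal sketch of what enumerating all subgraph embeddings means. It is a plain backtracking enumerator written for illustration only; it is not the algorithm presented in the talk and does not postpone Cartesian products. The function name, the adjacency-dictionary representation and the example graphs are all assumptions made for this sketch.

```python
def enumerate_embeddings(query, data):
    """Yield every injective mapping of query vertices to data vertices
    that maps each query edge onto a data edge.

    Both graphs are undirected and given as {vertex: set_of_neighbours}.
    Illustrative brute-force backtracking, not an optimized matcher.
    """
    order = list(query)  # fixed matching order over the query vertices

    def extend(mapping):
        if len(mapping) == len(order):
            yield dict(mapping)  # yield a copy; mapping is mutated below
            return
        u = order[len(mapping)]
        used = set(mapping.values())
        for v in data:
            if v in used:
                continue  # keep the mapping injective
            # every already-matched neighbour of u must map to a neighbour of v
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                yield from extend(mapping)
                del mapping[u]

    yield from extend({})


# Hypothetical example: a triangle query against a small data graph.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
data_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

embeddings = list(enumerate_embeddings(triangle, data_graph))
print(len(embeddings))
```

Even in this tiny case the single triangle in the data graph is reported once per ordering of its vertices, which hints at why the number of embeddings, and therefore the cost of enumeration, can grow so quickly on real graphs.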
During the 8th TUC Meeting, Eugene Chong from Oracle USA explained what he and his team had done to improve RDF query processing in their database.
Jerven Bolleman, Lead Software Developer at Swiss-Prot Group, explained why they are offering a free SPARQL and RDF endpoint for the world to use and why it is hard to optimize. The data biologists use tends to be extremely ambiguous and dirty. Additionally, scientists are always trying to find new questions to ask, which is what makes optimizing UniProt difficult: by optimizing for specific query patterns, they wouldn’t be offering the right service to their users. Furthermore, since UniProt is publicly funded, all the data needs to be public.