Juan Sequeda, Co-founder of Capsenta, gave an interesting talk on how can we integrate data using graphs and semantics (semantic data virtualization). As Mr. Sequeda said, the idea is to integrate data without needing to move it around. Juan started off his presentation talking about the huge gap that exists between the IT departments, guardians of the data and the business development departments, trying to extract insights about the data. He used a clear example to illustrate this gap:
During the 8th TUC Meeting held at Oracle’s facilities in Redwood City, California, Zhe Wu, Software Architect at Oracle Spatial and Graph, explained how is his team trying to bridge RDF Graph and Property Data Models.
After making a brief overview about what is a graph he presented Oracle’s Graph strategy, they basically treat graphs as another data type on every platform (Hadoop, Oracle’s own database and, of course, in the Cloud). He also explained that his team is developing in 3 directions at the same time:
Yinlong started his talk with an introduction of his new position at Huawei, what is the company doing and more specifically how is it involved with Big Data Research and graphs. He also explained that his research center is currently working on Big Data Analytics and Management from 4 sides: Natural Language Processing, Graph analyrics, Machine Learning and Deep Learning. His team at the same time, focuses on 4 market segments that include financial graph analytics, consumer data gathered from smartphones and other portable devices, telecommunications and cloud technology.
George Fletcher, Associate Professor at the Eindhoven University of Technology, presented gMark, an open-source framework for generating synthetic graph instances and workloads. The main focus of gMark has been to tailor different graph data management scenarios, often driven by query workloads. Such as multi-query optimization, workload-driven graph database physical design or mapping discovery and query rewriting data integration systems.
Marcus Paradies, Software developer at SAP extended the talk Arnau Prat gave about the SNB, in this case about the Intelligence workload. In contrast with the 17+4 queries the Interactive workload has, the Business Intelligence (BI) workload consists on 24 queries that can be seen as OLAP-style against the OLTP-style of the Interactive one. The BI focuses on analytic queries and they touch the whole graph.
Sergey Edunov, Software Engineer at Facebook gave a great talk on how and why his company generating large-scale social graphs. The underlying reasons to start such an ambitious project are capacity planning to make sure that their system will be able to handle a graph that keeps growing year after year and fair evaluation of their system against the ones being implemented by other companies.
Peter Boncz, Research Scientist at the Centrum Wiskunde & Informatica in the Netherlands, talked about the updates on the Graph Query Language Task Force after being alive for a year. This Task Force was created to answer an issue detected during the benchmark meetings, all the workload is created in English text because there is no common graph query language.
Lijun Chang, DECRA Fellow at the University of New South Wales talked about how to make subgraph matching more efficient thanks to postponing Cartesian products. They key problem he explained was the extraction of subgraph isomorphic embeddings. The applications of this process are wide enough to cover protein interaction research, social network analysis and even chemical compound investigation. The testing of subgraph isomorphism is an NP-complete type of problem however, his team is focusing on enumerating all subgraph embeddings which, he explains, is even harder.
During the 8th TUC Meeting Eugene Chong from Oracle USA explained what his team and himself had done to improve RDF query processing in their database.
Jerven Bolleman, Lead Software Developer at Swiss-Prot Group, explained why are they offering a free SPARQL and RDF endpoint for the world to use and why is it hard to optimize it. The data biologists use tends to be extremely ambiguous and dirty, additionally, scientists are always trying to find new questions to ask, thus why the difficulty regarding the optimization of UniProt, they wouldn’t be offering the right service to their users by optimizing the query patterns. Furthermore, since UniProt is publicly funded, all the data needs to be public.