Juan Sequeda, Co-founder of Capsenta, gave an interesting talk on how we can integrate data using graphs and semantics (semantic data virtualization). As Mr. Sequeda said, the idea is to integrate data without needing to move it around. Juan started his presentation by talking about the huge gap between IT departments, the guardians of the data, and business development departments, which try to extract insights from that data. He used a clear example to illustrate this gap:
During the 8th TUC Meeting held at Oracle’s facilities in Redwood City, California, Zhe Wu, Software Architect at Oracle Spatial and Graph, explained how his team is trying to bridge the RDF graph and property graph data models.
After a brief overview of what a graph is, he presented Oracle’s graph strategy: they treat graphs as just another data type on every platform (Hadoop, Oracle’s own database and, of course, the Cloud). He also explained that his team is developing in 3 directions at the same time:
Yinglong started his talk by introducing his new position at Huawei, what the company does and, more specifically, how it is involved in Big Data research and graphs. He also explained that his research center is currently working on Big Data Analytics and Management from 4 sides: Natural Language Processing, Graph Analytics, Machine Learning and Deep Learning. His team, in turn, focuses on 4 market segments: financial graph analytics, consumer data gathered from smartphones and other portable devices, telecommunications, and cloud technology.
George Fletcher, Associate Professor at the Eindhoven University of Technology, presented gMark, an open-source framework for generating synthetic graph instances and workloads. gMark's main focus has been on supporting graph data management scenarios that are often driven by query workloads, such as multi-query optimization, workload-driven physical design of graph databases, and mapping discovery and query rewriting in data integration systems.
Marcus Paradies, Software Developer at SAP, extended the talk Arnau Prat gave about the SNB, this time covering the Business Intelligence workload. In contrast with the 17+4 queries of the Interactive workload, the Business Intelligence (BI) workload consists of 24 queries that can be seen as OLAP-style, as opposed to the OLTP-style queries of the Interactive one. The BI workload focuses on analytic queries that touch the whole graph.
Sergey Edunov, Software Engineer at Facebook, gave a great talk on how and why his company generates large-scale social graphs. The reasons behind such an ambitious project are twofold: capacity planning, to make sure their systems will be able to handle a graph that keeps growing year after year, and fair evaluation of their systems against those implemented by other companies.
Weining Qian, professor at East China Normal University, presented his talk on Statistical Characteristics of Real-Life Knowledge Graphs during the 8th TUC Meeting held at Oracle’s facilities in Redwood City, California.
Qian explained that the term knowledge graph was introduced by Google in 2012 as an evolution of the Semantic Web. Professor Qian then introduced the main questions of his talk: how can we efficiently manage knowledge graphs, and are existing benchmarks sufficient to test them, given that most of these benchmarks focus only on social networks?
Peter Boncz, Research Scientist at the Centrum Wiskunde & Informatica in the Netherlands, gave an update on the Graph Query Language Task Force, one year after its creation. This Task Force was created to address an issue detected during the benchmark meetings: all workloads are specified in English text because there is no common graph query language.
Lijun Chang, DECRA Fellow at the University of New South Wales, talked about how to make subgraph matching more efficient by postponing Cartesian products. The key problem he explained was the extraction of subgraph isomorphic embeddings. Its applications range from protein interaction research to social network analysis and even chemical compound investigation. Testing subgraph isomorphism is already an NP-complete problem; however, his team focuses on enumerating all subgraph embeddings which, he explains, is even harder.
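To make the enumeration problem concrete, here is a minimal sketch of a plain backtracking enumerator, assuming undirected graphs stored as adjacency sets and non-induced embeddings (every query edge must map onto a data edge). The function name `enumerate_embeddings` is hypothetical, and this naive search is not the technique from the talk: it does not postpone Cartesian products, so its search space grows quickly with query size.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumption)


def enumerate_embeddings(query: Graph, data: Graph) -> List[Dict[int, int]]:
    """Hypothetical helper: return every injective mapping of query vertices
    to data vertices that maps each query edge onto a data edge."""
    # Naive matching order: high-degree query vertices first.
    order = sorted(query, key=lambda u: -len(query[u]))
    results: List[Dict[int, int]] = []
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def extend(i: int) -> None:
        if i == len(order):
            results.append(dict(mapping))  # a complete embedding
            return
        u = order[i]
        for v in data:
            # Skip data vertices already used or with too small a degree.
            if v in used or len(data[v]) < len(query[u]):
                continue
            # Every already-mapped neighbour of u must be adjacent to v.
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                extend(i + 1)
                del mapping[u]
                used.discard(v)

    extend(0)
    return results


if __name__ == "__main__":
    triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
    print(len(enumerate_embeddings(triangle, k4)))  # 4 * 3 * 2 = 24
```

Even this tiny example shows why enumeration is harder than a yes/no isomorphism test: a triangle already has 24 embeddings in a 4-clique, and the count grows combinatorially with graph size.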
During the 8th TUC Meeting, Eugene Chong from Oracle USA explained what he and his team had done to improve RDF query processing in their database.
Jerven Bolleman, Lead Software Developer at Swiss-Prot Group, explained why they offer a free SPARQL/RDF endpoint for the world to use and why it is hard to optimize. The data biologists use tends to be extremely ambiguous and dirty. Moreover, scientists are always looking for new questions to ask, which makes optimizing UniProt difficult: tuning it for specific query patterns would not serve their users well. Furthermore, since UniProt is publicly funded, all of its data needs to be public.
Martin Zand, Professor of Medicine and Public Health Sciences at the Rochester Center for Health Informatics, shifted the focus of the presentations by speaking as a user of graph databases. Zand pinpointed the relevance of graphs in healthcare by comparing 3 characteristics of healthcare with their graph counterparts:
- Healthcare is delivered by networks.
- Patients traverse those networks.
- The topology of the networks influences outcomes.
The talk of Dr. Zand was structured around the presentation of 3 use cases:
Tim Hegeman from TU Delft presented a very interesting talk about Social Network Benchmark analytics. Graphalytics is a benchmark developed by TU Delft for graph analytics, that is, complex and holistic graph computations.
Today over 100 graph analytics systems exist, Hegeman explains, but they are not comprehensive, and that is where Graphalytics excels. It consists of algorithms and datasets (the workload) that were selected using a 2-stage process to ensure the workload is representative. The stages of the process were:
Arnau Prat, Lead Researcher at DAMA-UPC from the Technical University of Catalonia, presented a talk on the Interactive Workload of the Social Network Benchmark. One of the key aspects of his talk was the introduction of the SNB Data Generator, a tool that generates a social network with a Facebook-like degree distribution (groups, posts, likes…). This synthetic social network follows the principle of homophily and is not uniform; the generator allows fair comparison and reproducibility of benchmark executions, and it scales by running on Apache Hadoop.
On June 22nd and 23rd, the 8th edition of the Technical User Community Meeting took place at Oracle headquarters in Redwood Shores, California.
During these two days, LDBC hosted more than 20 presentations from key members of industry, such as Oracle, Facebook, Neo4j, SAP and Huawei, and of research, covering updates on the work within the council as well as graph and RDF applications. We are going to share all of them as independent blog posts over the following weeks.
Thanks to Oracle for hosting this event!
LDBC is proud to announce the new LDBC Graphalytics Benchmark draft specification. LDBC Graphalytics is the first industry-grade graph data management benchmark.
We are glad to announce the tentative agenda for the upcoming 8th TUC Meeting, which will take place in the heart of Silicon Valley at the Oracle Conference Center in Redwood Shores, California, on Wednesday and Thursday, June 22-23, 2016.
The GRADES 2016 workshop on Graph Data management Experiences & Systems will be held on June 24th, 2016, just before SIGMOD/PODS 2016, at the Oracle Conference Center in Redwood Shores. In the two days preceding GRADES, Wednesday June 22 and Thursday June 23, LDBC will organize a 2-day Technical User Community (TUC) meeting for academics, industry and practitioners in the area of graph data management.
Today we welcome Martin Junghanns to the LDBC blog, where he shares his project at Leipzig University to transform the output of LDBC Datagen into Apache Flink data sets that can be used with Flink's DataSet API.
The LDBC consortium is pleased to announce its Seventh Technical User Community (TUC) meeting. This will be a two-day event at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York, on Monday and Tuesday, November 9th and 10th.
During the second day of the sixth TUC meeting, Boris Motik from University of Oxford presented his talk “Parallel and incremental materialisation of RDF/DATALOG in RDFox”. Like the slides of the other TUC meeting talks, this presentation is available on the LDBC Slideshare profile.
During the last TUC Meeting in Barcelona, we were glad to welcome Smrati Gupta from CA Technologies, a leading company that creates systems software that runs in mainframe, distributed computing, virtual machine and cloud computing environments.
Andreas Both from Unister presented another great talk on the second day of the 6th LDBC Technical User Community (TUC) meeting held in Barcelona. His talk “E-Commerce and Graph-driven Applications: Experiences and Optimizations while moving to Linked Data” revolved around an e-commerce use case.
Moritz Kaufmann from the Technische Universität München (TUM) participated in the last LDBC TUC Meeting in Barcelona with his presentation "LDBC SNB Benchmark Auditing".
During the 6th TUC Meeting in Barcelona, we were glad to welcome Arnau Prat from Universitat Politècnica de Catalunya (BarcelonaTech) and Sparsity Technologies with the presentation "LDBC Social Network Benchmark Interactive Workload".
During the second day of the 6th LDBC TUC meeting held in Barcelona we welcomed John Snelson from MarkLogic with his presentation “MarkLogic Overview and Use Cases”.
The number of datasets published in the Web of Data as part of the Linked Data Cloud is constantly increasing. The Linked Data paradigm is based on the unconstrained publication of information by different publishers, and on the interlinking of web resources through “same-as” links, which specify that two URIs correspond to the same real-world object. Across the vast number of data sources participating in the Linked Data Cloud, this information is not explicitly stated but is discovered using instance matching techniques and tools.
On the second day of the 6th LDBC TUC Meeting that took place in Barcelona we welcomed Yinglong Xia from IBM Research with his presentation “Recent Updates on IBM System G – GraphBIG and Temporal Data”.
In this post we will look at running the LDBC Social Network Benchmark (SNB) on Virtuoso.
Peter Haase from Metaphacts kicked off the afternoon presentations during the last LDBC TUC Meeting in Barcelona with the presentation "Querying the Wikidata Knowledge Graph".
On May 31st, the GRADES workshop will take place in Melbourne, co-located with ACM SIGMOD/PODS. GRADES started as an initiative of the Linked Data Benchmark Council at SIGMOD/PODS 2013, held in New York.
Note: this post is a continuation of "SNB Interactive Part 1 - What is SNB Interactive Really About?" post by Orri Erling.
SNB Interactive is the wild frontier, with very few rules. This is necessary, among other reasons, because there is no standard property graph data model, and because the contestants support a broad mix of programming models, ranging from in-process APIs to declarative query.
LDBC is presenting two papers at the next edition of the ACM SIGMOD/PODS conference, held in Melbourne from May 31st to June 4th, 2015. The annual ACM SIGMOD/PODS conference is a leading international forum for database researchers, practitioners, developers, and users to explore cutting-edge ideas and results, and to exchange techniques, tools and experiences.
During the last TUC Meeting in Barcelona, we were glad to welcome Mark D. Wilkinson from Universidad Politécnica de Madrid with his presentation "SADI: A design-pattern for “native” Linked-Data Semantic Web Services".
Check the slides and video to learn more about how SADI uses OWL and RDF, and about SHARE, the health research environment that answers SPARQL queries using SADI.
This post is the first in a series of blogs analyzing the LDBC Social Network Benchmark Interactive workload. It is written from the dual perspective of participating in the benchmark design and of building the OpenLink Virtuoso implementation of it.
The second part of presentations during the first day at the TUC Meeting in Barcelona started with the presentation from Jerven Bolleman from the Swiss Institute of Bioinformatics called "20 billion triples in production".
Watch Jerven Bolleman talk about why and how the UniProt SPARQL endpoint allows working with billions of triples from biological datasets.
To close the morning presentations on the first day of the 6th TUC Meeting in Barcelona, Claudio Martella from VUA presented Lighthouse: Large-scale graph pattern matching on Giraph.
Watch Claudio Martella to learn more about Lighthouse and how it uses Giraph and Cypher.
For the third presentation at the 6th TUC Meeting held in Barcelona, we were pleased to welcome Tomer Sagi from HP.
Watch his presentation "HP Labs: Titan DB on LDBC Interactive", where Tomer introduced the history of the research performed at HP along with his latest areas of work.
The second presentation during LDBC's 6th TUC Meeting in Barcelona was "SPIMBENCH: A Scalable, Schema-Aware, Instance Matching Benchmark for the Semantic Publishing Domain", presented by Tzanina Saveta from FORTH.
Watch Tzanina Saveta presenting the benchmark for the Semantic Publishing domain.
In a previous 3-part blog series we touched upon the difficulties of executing the LDBC SNB Interactive (SNB) workload, while achieving good performance and scalability. What we didn't discuss is why these difficulties were unique to SNB, and what aspects of the way we perform workload execution are scientific contributions - novel solutions to previously unsolved problems. This post will highlight the differences between SNB and more traditional database benchmark workloads. Additionally, it will motivate why we chose to develop a new wo
On March 19th and 20th, the sixth edition of the Technical User Community Meeting took place in Barcelona. During these two days, LDBC hosted more than 15 presentations from key members of industry and research on graphs and RDF, which we are going to share as independent blog posts over the following weeks.
Watch the first presentation, with Venelin Kotsev presenting the details of the evolution of the Semantic Publishing Benchmark (SPB).
As discussed in previous posts, one of the features that makes Datagen more realistic is the fact that the activity volume of the simulated Persons is not uniform, but forms spikes. In this blog entry I want to explain in more depth how this is actually implemented inside the generator.
This blog entry is about one of the features of DATAGEN that makes it different from other synthetic graph generators that can be found in the literature: the community structure of the graph.
Why do leading media companies, like the BBC and publishers, from FT to DK and Elsevier, use triplestores?
The Linked Data paradigm has become the prominent enabler for sharing huge volumes of data using Semantic Web technologies, and has created novel challenges for non-relational data management systems, such as RDF and graph engines. Efficient data access through queries is perhaps the most important data management task, and is enabled through query optimization techniques, which amount to the discovery of optimal or close to optimal execution plans for a given query.
When talking about DATAGEN and other graph generators with social network characteristics, our attention is typically drawn to the friendship subgraph and/or its structure. However, a social graph is more than a bunch of people connected by friendship relations; it contains many other elements worth looking at. With a quick look at commercial social networks like Facebook, Twitter or Google+, one can easily identify many other elements such as text, images or even video assets. More importantly, all these elements form other subgraphs within the social network!
The 5th LDBC Technical User Community (TUC) meeting took place in Athens on 14.11.2014 and was well attended by both industry and academia from the graph and RDF database communities. In the morning session, members of the LDBC project gave an update on the status of the project and its benchmarks:
The SNB Driver part 1 post introduced, broadly, the challenges faced when developing a workload driver for the LDBC SNB benchmark. In this blog we'll drill down deeper into the details of what it means to execute "dependent queries" during benchmark execution, and how this is handled in the driver. First of all, as many driver-specific terms will be used, below is a listing of their definitions. There is no need to read them in detail; the list is just there to serve as a point of reference.