Technical documentation, code, features about Graph and RDF benchmarking
LDBC develops all its benchmarks in open source and invites the developer community to participate, either by downloading, compiling and using the benchmark data generators and drivers, or by reporting results and issues in our forum. LDBC is also open to code contributions and suggestions for improvement.
The wiki of the TUC (Technical User Community) of LDBC is the place where preliminary information about benchmark designs is shared, and this wiki is public.
It is possible to meet LDBC members in person at the TUC meetings that LDBC organizes. TUC meetings are one-day gatherings in which users of graph and RDF technologies come together to talk about the data management challenges they face. In these meetings, LDBC also presents the progress of its benchmark development task forces and solicits user feedback on it.
We provide below a list of events of interest to graph data management developers:
- 14 November 2014: the upcoming LDBC TUC meeting will be held in Athens (Greece). You are invited to contribute your experiences in addressing graph data management problems. The special theme is an experiences track with the draft SNB and SPB benchmarks. LDBC task forces will update the audience on the latest benchmark development news.
Related Benchmarking Projects
Here we discuss important related benchmarking projects and briefly explain their commonalities and differences with ongoing LDBC work on the LDBC Social Network Benchmark (SNB) and Semantic Publishing Benchmark (SPB).
Graph benchmarking projects:
- RMAT: this generator produces complex graphs that have certain social network-like properties (power laws, short diameter). However, it places little emphasis on attribute values in the graph, on value correlations, or on correlations between attribute values and graph structure (as in SNB).
- GRAPH500: this is becoming an important High Performance Computing (HPC) benchmark. From LDBC's perspective, it pays little attention to the actual data (model, value distribution and correlations) or to database functionality. Still, it is relevant for the future Graph Analytics workload of SNB.
- graphbench.org is a new initiative to create benchmarks for large-scale graph algorithms (such as clustering, PageRank). LDBC is cooperating with graphbench.org in the development of the Graph Analytics workload of SNB.
- graphitti is a research effort similar to graphbench.org which focuses on Graph Analytics in the form of algorithms (not database queries). LDBC is also cooperating with the graphitti folks with respect to the future Graph Analytics workload of SNB.
- LinkBench is a benchmark proposed by Facebook to mimic its workload on the Facebook friends graph stored in MySQL (behind TAO/memcached). This benchmark is strongly MySQL-focused, and the workload consists only of small changes to and lookups in the graph structure. The LinkBench graph structure itself is not an accurate model of the Facebook graph; it is tuned so that just the MySQL workload is realistic. Its data generator is not intended for other workloads such as BI or Graph Analytics.
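To make the RMAT entry in the list above more concrete, here is a minimal sketch of recursive-matrix edge generation. It is an illustration, not the code of any of the projects listed. The function names are hypothetical, and the quadrant probabilities (0.57, 0.19, 0.19, with the remainder 0.05) are commonly used defaults assumed here rather than values taken from this page. Note that the output is pure structure: vertex ids and edges, with no attribute values, which is exactly the limitation noted above.

```python
import random


def rmat_edge(scale, a=0.57, b=0.19, c=0.19, rng=random):
    """Pick one directed edge in a graph with 2**scale vertices.

    At each of the `scale` levels, one quadrant of the adjacency matrix
    is chosen with probabilities a, b, c and d = 1 - a - b - c, which
    fixes one more bit of the source and destination vertex ids.
    (Parameter values are assumed defaults, not taken from the text.)
    """
    src, dst = 0, 0
    for _ in range(scale):
        r = rng.random()
        src <<= 1
        dst <<= 1
        if r < a:
            pass              # top-left quadrant: both bits stay 0
        elif r < a + b:
            dst |= 1          # top-right quadrant
        elif r < a + b + c:
            src |= 1          # bottom-left quadrant
        else:
            src |= 1          # bottom-right quadrant
            dst |= 1
    return src, dst


def rmat_graph(scale, num_edges, seed=0):
    """Generate a list of (src, dst) edges; duplicates are not removed."""
    rng = random.Random(seed)
    return [rmat_edge(scale, rng=rng) for _ in range(num_edges)]


def out_degrees(edges):
    """Count outgoing edges per source vertex."""
    counts = {}
    for src, _ in edges:
        counts[src] = counts.get(src, 0) + 1
    return counts
```

Calling `rmat_graph(10, 5000)` yields 5000 edges over 1024 vertex ids. Because the top-left quadrant is favoured at every level, a few low-numbered vertices collect far more edges than the average, which is the skewed degree distribution the entry refers to.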
RDF benchmarking projects:
- BSBM: this is arguably the most advanced SPARQL benchmark (before LDBC SPB ;-), and consists of Explore/Update (OLTP) and Business Intelligence (BI) workloads. This benchmark has already pushed RDF systems towards better transactional and analytical performance and accelerated the implementation of SPARQL 1.1 (whose features are required for the BI workload). Regrettably, BSBM has very regular data, so SQL systems, which can also execute the queries, always have an advantage. The BI workload was developed earlier by current LDBC members.
- DBpedia Benchmark: this benchmark is very nice as it uses real and highly correlated data. Its workload, however, is rather simple and only contains short lookup queries. This is a read-only benchmark.
- LUBM: this benchmark has the edge in terms of its use of reasoning (it has an ontology and does OWL reasoning). Query-wise it is restricted, as it predates SPARQL 1.1 and thus does not do aggregations or subqueries. This is a read-only benchmark (no updates).
- SP2bench: though the data model is based on the DBLP scientific paper dataset, which is highly correlated, SP2bench regrettably populates its dataset with uniform data. The queries are a very mixed bag, with some returning huge results and others very small ones. This is a read-only benchmark.