Running the Semantic Publishing Benchmark on Sesame, a Step-by-Step Guide
So far we have discussed several aspects of the Semantic Publishing Benchmark (SPB): the difference in performance between virtual and physical server configurations, how to choose an appropriate query mix for a benchmark run, and our experience using SPB to find performance issues during the development of GraphDB.
In this post we provide a step-by-step guide on how to run SPB using the Sesame RDF data store on a fresh install of Ubuntu Server 14.04.1. The scenario is easy to adapt to other RDF triple stores which support the Sesame Framework used for querying and analyzing RDF data.
We start with a fresh server installation, but before proceeding with setup of the Sesame Data Store and SPB benchmark we need the following pieces of software up and running:
- Apache Ant 1.8 or higher
- OpenJDK 6 or Oracle JDK 6 or higher
- Apache Tomcat 7 or higher
If you already have these components installed on your machine, you can proceed directly to the next section, Installing Sesame.
The following sample commands install the required components, along with git, which is used later to fetch the SPB source code:
sudo apt-get install git
sudo apt-get install ant
sudo apt-get install default-jdk
sudo apt-get install tomcat7
Alternatively, Apache Tomcat can be downloaded as a zip archive and extracted to a location of your choice.
After a successful installation of Apache Tomcat you should be able to get the default splash page “It works” when you open your web browser and enter the following address: http://<your_ip_address>:8080
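Before moving on, it can help to confirm that the command-line prerequisites are actually on the PATH. The following is a minimal sketch, not part of SPB or Sesame; the function name missing_tools is made up for this guide.

```shell
# Print the name of every given command that cannot be found on PATH.
# Returns non-zero if at least one command is missing.
# (missing_tools is a hypothetical helper, not part of any package.)
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Example: missing_tools git ant java
```

If the example call prints nothing, all three commands were found.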
We will use Sesame 2.7.14, the current version at the time of writing. You can download it here or run the following command:
wget "http://sourceforge.net/projects/sesame/files/Sesame%202/2.7.14/openrdf-s..." -O openrdf-sesame-2.7.14-sdk.tar.gz
Then extract the Sesame tarball:
tar -xvzf openrdf-sesame-2.7.14-sdk.tar.gz
To deploy Sesame, copy the two WAR files in openrdf-sesame-2.7.14/war to Tomcat's webapps directory (/var/lib/tomcat7/webapps for the Ubuntu package).
From openrdf-sesame-2.7.14/war, you can do this with the command:
cp openrdf-*.war <tomcat_install>/webapps
The Sesame applications store their configuration files in a single directory, and the Tomcat server needs write permission for it. You can find more information about this directory here.
By default the configuration directory is: /usr/share/tomcat7/.aduna
Create the directory:
sudo mkdir /usr/share/tomcat7/.aduna
Then change the ownership:
sudo chown tomcat7 /usr/share/tomcat7/.aduna
And finally you should give the necessary permissions:
sudo chmod o+rwx /usr/share/tomcat7/.aduna
Now when you go to: http://<your_ip_address>:8080/openrdf-workbench/repositories
You should get a screen like this:
You can download the SPB code and find brief documentation on GitHub:
Detailed documentation is located here:
SPB offers many configuration options that control various features of the benchmark, for example:
- query mixes
- dataset size
- loading datasets
- number of agents
- validating results
- test conformance to OWL2-RL ruleset
- update rate of agents
Here we demonstrate how to generate a dataset and execute a simple test run with it.
First download the SPB source code from the repository:
git clone https://github.com/ldbc/ldbc_spb_bm.git
Then in the ldbc_spb_bm directory build the project:
If you simply execute the command:
you’ll get a list of all available build configurations for the SPB test driver, but for the purpose of this step-by-step guide the configuration shown above is sufficient.
Depending on the size of the generated dataset, the Sesame store may need a larger Java heap. You can change it by adding the following arguments to Tomcat's startup files, e.g. in catalina.sh:
export JAVA_OPTS="-d64 -Xmx4G"
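As an alternative to editing catalina.sh directly, Tomcat's startup script also reads a setenv.sh file from its bin directory when one exists. The sketch below writes such a file; the function name write_setenv and its arguments are assumptions made for this guide, and the bin directory location depends on how Tomcat was installed.

```shell
# Write a setenv.sh that appends a maximum heap size to JAVA_OPTS.
# $1: Tomcat bin directory (depends on your installation)
# $2: maximum heap size, e.g. 4G
# (write_setenv is a hypothetical helper, not part of Tomcat.)
write_setenv() {
  bin_dir=$1
  heap=$2
  # $JAVA_OPTS stays literal here so it is expanded when Tomcat starts.
  printf '#!/bin/sh\nexport JAVA_OPTS="$JAVA_OPTS -Xmx%s"\n' "$heap" \
    > "$bin_dir/setenv.sh"
  chmod +x "$bin_dir/setenv.sh"
}

# Example: write_setenv <tomcat_install>/bin 4G
```

Tomcat has to be restarted for the new heap size to take effect.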
To run the Benchmark you need to create a repository in the Sesame Data Store, similar to the following screenshot:
Then we need to point the benchmark test driver to the SPARQL endpoint of that repository. This is done in the ldbc_spb_bm/dist/test.properties file.
The default value of datasetSize in the properties file is 10M, but for the purpose of this guide we will decrease it to 1M.
You need to change
Also set the URLs of the SPARQL endpoint for the repository:
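The properties file can be edited by hand, but when the same run is repeated with different settings a small helper is convenient. This is a minimal sketch assuming GNU sed (as shipped with Ubuntu); set_prop is a made-up name, and the literal value passed for datasetSize is an assumption, so check the format of the default value already in your test.properties.

```shell
# Set key=value in a Java-style properties file.
# Replaces the line if the key exists, otherwise appends it.
# Values containing '|' or '&' are not handled by this sketch.
# (set_prop is a hypothetical helper, not part of SPB.)
set_prop() {
  file=$1
  key=$2
  value=$3
  if grep -q "^${key}=" "$file"; then
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example (value format is an assumption):
#   set_prop test.properties datasetSize 1000000
```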
The first step, before measuring the performance of a triple store, is to load the reference-knowledge data, generate a 1M dataset, load it into the repository and finally generate query substitution parameters.
The following parameters instruct the SPB test driver to perform all the actions described above:
#Benchmark Operational Phases
To run the benchmark execute the following:
java -jar semantic_publishing_benchmark-basic-standard.jar test.properties
When the initial run has finished, we should have a 1M dataset loaded into the repository and a set of files with query substitution parameters.
Next we will measure the performance of the Sesame Data Store by changing some configuration properties:
#Benchmark Configuration Parameters
#Benchmark Operational Phases
After the benchmark test run has finished, the result files are saved in the folder dist/logs.
There you will find three types of results: the result summary of the benchmark run (semantic_publishing_benchmark_results.log), brief results and detailed results.
In semantic_publishing_benchmark_results.log you will find the results reported per second of the run. They should be similar to the listing below:
Seconds : 300 (completed query mixes : 0)
9 inserts (avg : 22484 ms, min : 115 ms, max : 81389 ms)
9 operations (9 CW Inserts (0 errors), 0 CW Updates (1 errors), 0 CW Deletions (2 errors))
2 Q1 queries (avg : 319 ms, min : 188 ms, max : 451 ms, 0 errors)
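To compare runs it is handy to pull the per-query figures out of the summary log. The sketch below extracts the query name, the number of executions and the average time from lines shaped like the Q1 line above; the pattern is derived only from that sample listing, so it may need adjusting if your log is formatted differently. The function name query_averages is made up for this guide.

```shell
# Read the summary log on stdin and print "<query> <count> <avg_ms>"
# for every per-query line such as:
#   2 Q1 queries (avg : 319 ms, min : 188 ms, max : 451 ms, 0 errors)
# (query_averages is a hypothetical helper, not part of SPB.)
query_averages() {
  sed -n 's/^[[:space:]]*\([0-9][0-9]*\) \(Q[0-9][0-9]*\) queries (avg : \([0-9][0-9]*\) ms.*/\2 \1 \3/p'
}

# Example:
#   query_averages < dist/logs/semantic_publishing_benchmark_results.log
```

Lines that do not describe a query, such as the insert and operation counters, are skipped.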
This step-by-step guide gave an introduction on how to set up and run the SPB on a Sesame Data Store. Further details can be found in the reference documentation listed above.
If you have any trouble running the benchmark, don't hesitate to comment or use our social media channels.
In a future post we will go through some of the parameters of SPB and check their performance implications.