Sysbench 1.x Fundamentals

Sysbench is the hammer frequently used to benchmark servers and services. Within the past year, sysbench version 1 was released, and with it came some syntax and usage changes. Unfortunately, there’s not a lot of documentation out there for using Sysbench, so I had to learn the hard way. The best way to learn more about Sysbench and how powerful it can be is to search through the issue log on GitHub for in-depth questions and answers. Hopefully, I can spare you some of the same headaches and ease your learning curve.

Installing Sysbench

Sysbench Resources:
Sysbench on GitHub
Sysbench 1.0 Release Slides
Sysbench Author’s Blog
Sysbench Issues Log (GitHub)

Don’t rely on your OS distro to grab and install the current version of Sysbench. Installing from yum or apt, you are more likely to get version 0.4 or maybe 0.5. There are adequate installation directions on the Sysbench GitHub page for most OSes, covering both binary packages and installing from source. When installing from source, pay extra attention to the requirements. Also, take note of the configure flags to disable MySQL (--without-mysql) if you don’t have it installed, to point at a MySQL installed in non-standard locations (--with-mysql-includes and --with-mysql-libs), or to enable PostgreSQL (--with-pgsql). If you only intend to test CPU, memory, and file IO, there is no need to install MySQL on your machine.
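Since a distro package may leave you on 0.4 or 0.5, it can help to check what you actually have before running anything. The following is a minimal sketch, assuming `sysbench --version` prints a line shaped like "sysbench 1.0.20"; the function names sysbench_is_1x and check_installed_sysbench are made up for this example.

```shell
#!/bin/sh
# Succeeds (exit 0) when a version string reports major version 1 or newer.
# Assumes input shaped like "sysbench 1.0.20" or "sysbench 0.4.12".
sysbench_is_1x() {
    version=${1#sysbench }        # strip the leading "sysbench "
    major=${version%%.*}          # keep everything before the first dot
    case $major in
        ''|*[!0-9]*) return 2 ;;  # not a number: unrecognised format
    esac
    [ "$major" -ge 1 ]
}

# Check the binary on this machine, if there is one.
check_installed_sysbench() {
    if ! command -v sysbench >/dev/null 2>&1; then
        echo "sysbench is not installed"
    elif sysbench_is_1x "$(sysbench --version)"; then
        echo "sysbench 1.x or newer found"
    else
        echo "older sysbench found; consider installing from GitHub"
    fi
}
```

Calling check_installed_sysbench after an install is a quick way to confirm you did not end up with the older syntax.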

Sysbench CPU Tests

When testing your CPU sysbench accepts up to 3 different configuration flags:

--time			How long you want sysbench to run
--threads		Set up to the number of logical cores available
--cpu-max-prime		Calculate prime numbers up to this value

Sysbench runs for 10 seconds by default, but you can set --time to run for as long as you’d like. It counts the number of events completed within the test duration (either the default or the --time value you pass). An event is one pass of calculating all the prime numbers up to the --cpu-max-prime value.

Be sure to set --threads to the number of vCPUs on your cloud server. On a physical server, account for the total number of logical cores available rather than just the physical cores. Because the processors and clock speeds assigned to instances differ between providers, it’s important to test so you have an apples-to-apples comparison. It can’t hurt to test from 1 thread up to the number of logical cores you have available, but keep in mind that testing above that value won’t really give you any valuable data. Multithreaded CPUs, or a cloud server with multiple vCPUs, will not show their true performance unless you run the test with the correct number of threads. Sysbench defaults to 1 thread, so be careful to define this for your tests.
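Stepping from 1 thread up to the logical core count can be scripted. This is only a sketch: the doubling schedule is my own shortcut rather than anything sysbench prescribes, the helper names thread_steps and cpu_sweep are hypothetical, and it assumes `getconf _NPROCESSORS_ONLN` reports the logical CPU count (it does on Linux).

```shell
#!/bin/sh
# Print the thread counts to test: powers of two below max, then max itself.
thread_steps() {
    max=$1
    n=1
    while [ "$n" -lt "$max" ]; do
        printf '%s ' "$n"
        n=$((n * 2))
    done
    printf '%s\n' "$max"
}

# Run the CPU test once per step, stopping at the logical core count.
cpu_sweep() {
    cores=$(getconf _NPROCESSORS_ONLN)
    for t in $(thread_steps "$cores"); do
        echo "== $t thread(s) =="
        sysbench --threads="$t" --time=60 cpu run
    done
}
```

On a 6-core machine thread_steps 6 yields 1, 2, 4 and 6, so the last run always uses every logical core.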

As mentioned above, --cpu-max-prime defines how much work is done per event. That is, a single event calculates all prime numbers up to the value of --cpu-max-prime. The default value is 10,000.

The latency stats give you an idea of the performance of your CPU/vCPU. Latency is the time it took to calculate the prime numbers up to --cpu-max-prime (one event). You can compare performance by looking at the average latency, which is the average time it took the CPU to execute a single event: the lower the average latency, the faster the CPU. The total number of events can also be a valid metric of performance, so long as the total time is the same; there, a higher count is better.

An example output of a CPU test is:

# sysbench --cpu-max-prime=200000 --threads=1 --time=60 cpu run
sysbench 1.1.0 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Prime numbers limit: 200000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:    12.16

    events/s (eps):                      12.1554
    time elapsed:                        60.0557s
    total number of events:              730

Latency (ms):
         min:                                   79.10
         avg:                                   82.26
         max:                                  184.28
         95th percentile:                       82.96
         sum:                                60052.92

Threads fairness:
    events (avg/stddev):           730.0000/0.00
    execution time (avg/stddev):   60.0529/0.00
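When comparing several machines, it is handy to pull just the events per second and the average latency out of a report like the one above. A rough sketch with awk, assuming the report keeps the "events/s (eps):" and "avg:" labels shown here; summarize_report is a made-up name.

```shell
#!/bin/sh
# Read a sysbench report on stdin and print "<events per second> <avg latency ms>".
summarize_report() {
    awk '
        /events\/s \(eps\):/ { eps = $NF }   # throughput line
        /^ *avg:/            { avg = $NF }   # average latency line
        END                  { print eps, avg }
    '
}
```

You would pipe a run straight into it, for example: sysbench --threads=1 --time=60 cpu run | summarize_report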

Sysbench Memory Tests

For your memory tests, sysbench accepts the following options:

--memory-block-size=SIZE    	size of memory block for test [1K]
--memory-total-size=SIZE    	total size of data to transfer [100G]
--memory-scope=STRING       	memory access scope {global,local} [global]
--memory-hugetlb[=on|off]   	allocate memory from HugeTLB pool [off]
--memory-oper=STRING        	type of memory operations {read, write, none} [write]
--memory-access-mode=STRING 	memory access mode {seq,rnd} [seq]

Prior to sysbench 1.0, I read about some people seeing issues with cache misses when using small block sizes (such as the default of 1K). Personally, I don’t see that happening on my test system today, but it’s worth keeping in mind. Test with the block sizes you expect your workload to use; if you don’t know, perhaps test block sizes up to 1M.

At the end of the day, you want to compare the number of events (how many times a block of the specified size was read from or written to memory) in the time specified. The default --time is 10 seconds, but you can always change that for your tests as needed. Also, the default total size (--memory-total-size) of data read from or written to memory is 100G. Sysbench will stop at whichever limit is hit first. Generally, for a given block size, the higher the events per second, the better your memory throughput.
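To see which of the two limits will end a run, divide the total size by the throughput and compare the result with --time. The sketch below does that arithmetic with awk; memory_stop_reason is a hypothetical helper, and the throughput you feed it has to come from an earlier run of your own.

```shell
#!/bin/sh
# Arguments: throughput in MiB/s, --memory-total-size in MiB, --time in seconds.
# Prints which limit ends the run first.
memory_stop_reason() {
    awk -v rate="$1" -v total="$2" -v limit="$3" 'BEGIN {
        needed = total / rate    # seconds needed to move all the data
        if (needed <= limit) {
            printf "size limit after %.2fs\n", needed
        } else {
            printf "time limit after %ds (%.0f MiB moved)\n", limit, rate * limit
        }
    }'
}
```

For example, memory_stop_reason 18505.50 102400 10 reports that the size limit is reached after about 5.53 seconds, so a run at that speed with the default 100G never gets to the 10 second mark.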

Example output of what you can expect:

# sysbench --memory-oper=read --memory-block-size=1M memory run
sysbench 1.1.0 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1024KiB
  total size: 102400MiB
  operation: read
  scope: global

Initializing worker threads...

Threads started!

Total operations: 102400 (18505.50 per second)

102400.00 MiB transferred (18505.50 MiB/sec)

    events/s (eps):                      18505.4984
    time elapsed:                        5.5335s
    total number of events:              102400

Latency (ms):
         min:                                    0.05
         avg:                                    0.05
         max:                                    0.24
         95th percentile:                        0.06
         sum:                                 5435.72

Threads fairness:
    events (avg/stddev):           102400.0000/0.00
    execution time (avg/stddev):   5.4357/0.00

Sysbench File IO Tests

Testing file IO performance with sysbench offers a number of configuration options in addition to --time and --threads. You can see these with the command sysbench fileio help:

--file-num=N                  number of files to create [128]
--file-block-size=N           block size to use in all IO operations [16384]
--file-total-size=SIZE        total size of files to create [2G]
--file-test-mode=STRING       test mode {seqwr, seqrewr, seqrd, rndrd, rndwr, rndrw}
--file-io-mode=STRING         file operations mode {sync,async,mmap} [sync]
--file-extra-flags=[LIST,...] list of additional flags to use to open files {sync,dsync,direct} []
--file-fsync-freq=N           do fsync() after this number of requests (0 - don't use fsync()) [100]
--file-fsync-all[=on|off]     do fsync() after each write operation [off]
--file-fsync-end[=on|off]     do fsync() at the end of test [on]
--file-fsync-mode=STRING      which method to use for synchronization {fsync, fdatasync} [fsync]
--file-merged-requests=N      merge at most this number of IO requests if possible (0 -don't merge) [0]
--file-rw-ratio=N             reads/writes ratio for combined test [1.5]

Before you start testing, sysbench needs to prepare the files it will use during its tests. This creates the actual files on disk, and you need to specify, at the very least, --file-total-size.

# sysbench --file-total-size=12G fileio prepare
sysbench 1.1.0 (using bundled LuaJIT 2.1.0-beta3)

128 files, 98304Kb each, 12288Mb total
Creating files for the test...
Extra file open flags: (none)
Creating file test_file.0
Creating file test_file.1
Creating file test_file.2
Creating file test_file.126
Creating file test_file.127
12884901888 bytes written in 200.46 seconds (61.30 MiB/sec).
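The per-file size in the prepare output is simply the total size divided by the number of files. A small sketch of that arithmetic, with per_file_kib as a made-up helper name and the --file-num default of 128 taken from the option list above:

```shell
#!/bin/sh
# Size of each test file in KiB for a total size given in GiB.
# Second argument is --file-num and defaults to 128.
per_file_kib() {
    total_gib=$1
    file_num=${2:-128}
    echo $(( total_gib * 1024 * 1024 / file_num ))
}
```

per_file_kib 12 gives 98304, matching the "98304Kb each" line above, while per_file_kib 12 16 shows how lowering --file-num produces fewer, larger files.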

You should set --file-total-size to at least twice your system memory to prevent all reads from being served by the in-memory cache. A real-world workload would have some reads coming from memory, but if you want to eliminate this as much as possible, you can balance --file-num against --file-total-size to create larger files. You could also use --file-extra-flags=direct to force direct IO, which will be slower because you’re not taking advantage of system buffering. Preparing the same files as above but with direct IO took 758.20 seconds, an average of 16.21 MiB/sec.
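If you would rather not work out the twice-the-memory figure by hand, something like the following could do it. It is a sketch that assumes a Linux host where /proc/meminfo reports MemTotal in kB; both function names are invented for this example, and the result is rounded up to a whole number of gigabytes.

```shell
#!/bin/sh
# Suggest a --file-total-size of twice the given RAM (in KiB), rounded up to whole GiB.
suggest_total_size() {
    mem_kib=$1
    gib=$(( (mem_kib * 2 + 1048575) / 1048576 ))
    echo "${gib}G"
}

# Linux only: read MemTotal (reported in kB) and print a ready-to-use flag.
suggest_for_this_host() {
    host_kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    echo "--file-total-size=$(suggest_total_size "$host_kib")"
}
```

The output of suggest_for_this_host can be pasted into both the prepare and run commands so the two stay in agreement.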

You’ll also want to run the test a few times with an increasing thread count to see where your IOPS and throughput really cap out. At some point, your latency will begin creeping up but you won’t see any gains in throughput. Many times you’ll find you can run 2x or more threads per CPU/vCPU before additional threads stop helping.

The following output is from a random read/write test with 4k blocks and 2 threads on my AWS t2.micro test instance:

# sysbench --file-total-size=12G --file-block-size=4k --file-test-mode=rndrw --threads=2 fileio run
sysbench 1.1.0 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time

Extra file open flags: (none)
128 files, 96MiB each
12GiB total file size
Block size 4KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!

         read:  IOPS=1865.89 7.29 MiB/s (7.64 MB/s)
         write: IOPS=1243.92 4.86 MiB/s (5.10 MB/s)
         fsync: IOPS=3970.56

Latency (ms):
         min:                                  0.00
         avg:                                  0.28
         max:                                  6.64
         95th percentile:                      0.81
         sum:                              19919.38
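The IOPS and MiB/s columns are two views of the same number: throughput is IOPS multiplied by the block size. A quick sketch to sanity-check a report, where iops_to_mib_s is a hypothetical helper taking the IOPS figure and the block size in bytes:

```shell
#!/bin/sh
# Convert an IOPS figure to MiB/s for a given block size in bytes.
iops_to_mib_s() {
    awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.2f\n", iops * bs / 1048576 }'
}
```

With the 4KiB blocks used above, iops_to_mib_s 1865.89 4096 gives 7.29 and iops_to_mib_s 1243.92 4096 gives 4.86, the same read and write throughput the report prints.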

When you are done with testing, sysbench can delete all the test files for you. Be sure to pass the --file-total-size flag and value as well:

# sysbench --file-total-size=12G fileio cleanup
sysbench 1.1.0 (using bundled LuaJIT 2.1.0-beta3)

Removing test files...

Sysbench Database Tests

Database tests are implemented via Lua scripts. Technically, you could implement your own CPU, memory, and IO tests in Lua as well if you so desired, but for databases it is a requirement. Fortunately, sysbench ships with a handful of Lua scripts and MySQL support. On my test VM they were installed in /usr/local/share/sysbench with the following provided tests:

-rwxr-xr-x 1 root root 1.5K Mar 16 19:34 bulk_insert.lua
-rw-r--r-- 1 root root  15K Mar 16 19:34 oltp_common.lua
-rwxr-xr-x 1 root root 1.3K Mar 16 19:34 oltp_delete.lua
-rwxr-xr-x 1 root root 2.4K Mar 16 19:34 oltp_insert.lua
-rwxr-xr-x 1 root root 1.3K Mar 16 19:34 oltp_point_select.lua
-rwxr-xr-x 1 root root 1.7K Mar 16 19:34 oltp_read_only.lua
-rwxr-xr-x 1 root root 1.8K Mar 16 19:34 oltp_read_write.lua
-rwxr-xr-x 1 root root 1.1K Mar 16 19:34 oltp_update_index.lua
-rwxr-xr-x 1 root root 1.2K Mar 16 19:34 oltp_update_non_index.lua
-rwxr-xr-x 1 root root 1.5K Mar 16 19:34 oltp_write_only.lua
-rwxr-xr-x 1 root root 1.9K Mar 16 19:34 select_random_points.lua
-rwxr-xr-x 1 root root 2.1K Mar 16 19:34 select_random_ranges.lua

Keep an eye out for a follow-up post with a more in-depth discussion on using sysbench against your database. There are different things you’ll need to keep in mind depending on whether your database is on the same instance or remote (such as AWS RDS or Aurora). I’ll keep it a bit simpler for right now.

The basics are that you need to have already created a database and an associated user. You will then use sysbench to:

  • Prepare the tables
  • Run the test, and
  • Cleanup after your testing is complete.

When you run the prepare command, sysbench will connect to your database and create the tables needed for your testing. If the database is on the same host you are testing from, you will want to specify --mysql-socket. Otherwise, define --mysql-host and --mysql-port instead. The value of --tables is how many tables sysbench should create, and the --table-size value is how many rows to insert into each table.

# sysbench --db-driver=mysql --mysql-user=USERNAME --mysql-password=PASSWORD \
--mysql-socket=/var/lib/mysql/mysql.sock --mysql-db=DB_NAME --table_size=25000 \
--tables=250  /usr/local/share/sysbench/oltp_read_only.lua prepare

Once this finishes, you can proceed with running your test. The syntax is nearly identical: swap out ‘prepare’ for ‘run’ and add options for --threads (the number of simultaneous threads you want to run) and --time (how many seconds you want the test to execute for). You’ll want to step up the number of threads until your transactions per second and/or latency start to head in a direction you’re not happy with.

# sysbench --db-driver=mysql --mysql-user=USERNAME --mysql-password=PASSWORD \
--mysql-socket=/var/lib/mysql/mysql.sock --mysql-db=DB_NAME --table_size=25000 \
--tables=250 --threads=16 --time=60 /usr/local/share/sysbench/oltp_read_only.lua run

Your output will look something like the following. Again, pay attention to the transactions per second value and the 95th percentile latency:

sysbench 1.1.0 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 16
Initializing random number generator from current time

Initializing worker threads...

Threads started!

SQL statistics:
    queries performed:
        read:                            1012074
        write:                           0
        other:                           144582
        total:                           1156656
    transactions:                        72291  (1202.41 per sec.)
    queries:                             1156656 (19238.55 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

    events/s (eps):                      1202.4095
    time elapsed:                        60.1218s
    total number of events:              72291

Latency (ms):
         min:                                    7.41
         avg:                                   13.29
         max:                                  222.27
         95th percentile:                       18.28
         sum:                               960486.73

Threads fairness:
    events (avg/stddev):           4518.1875/70.20
    execution time (avg/stddev):   60.0304/0.04
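When stepping through thread counts, it is easier to compare runs if each report is boiled down to one row. The following is a rough sketch that assumes the "transactions:" and "95th percentile:" lines keep the layout shown above; report_row is a made-up name.

```shell
#!/bin/sh
# Read a database test report on stdin and print one tab-separated row:
# thread count (first argument), transactions per second, 95th percentile latency (ms).
report_row() {
    awk -v threads="$1" '
        /transactions:/    { gsub(/[()]/, "", $3); tps = $3 }  # "(1202.41" -> "1202.41"
        /95th percentile:/ { p95 = $NF }
        END                { printf "%s\t%s\t%s\n", threads, tps, p95 }
    '
}
```

Piping each run through report_row with the matching thread count builds up a small table, which makes it obvious where transactions per second flatten out while the 95th percentile keeps climbing.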

Cleaning up your database of these sysbench tables is quick and easy. Be sure you include the --tables value or it’ll default to 1 and only drop one of the tables:

# sysbench --db-driver=mysql --mysql-user=USERNAME --mysql-password=PASSWORD \
--mysql-socket=/var/lib/mysql/mysql.sock --mysql-db=DB_NAME --tables=250 \
/usr/local/share/sysbench/oltp_read_only.lua cleanup
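Because the three stages share almost every option, one way to avoid forgetting --tables at cleanup time is to keep the options in a single place. This is a sketch only: the SB_* variable names and the oltp_read_only function are invented here, the values are the same placeholders used above, and the SYSBENCH override exists purely so the command can be previewed with echo.

```shell
#!/bin/sh
# Shared settings; every value is a placeholder to adapt.
SB_USER=USERNAME
SB_PASS=PASSWORD
SB_SOCKET=/var/lib/mysql/mysql.sock
SB_DB=DB_NAME
SB_TABLES=250
SB_TABLE_SIZE=25000
SB_SCRIPT=/usr/local/share/sysbench/oltp_read_only.lua

# Usage: oltp_read_only STAGE [extra sysbench options...]
# STAGE is prepare, run or cleanup. Set SYSBENCH=echo to print instead of run.
oltp_read_only() {
    stage=$1
    shift
    "${SYSBENCH:-sysbench}" --db-driver=mysql \
        --mysql-user="$SB_USER" --mysql-password="$SB_PASS" \
        --mysql-socket="$SB_SOCKET" --mysql-db="$SB_DB" \
        --table_size="$SB_TABLE_SIZE" --tables="$SB_TABLES" "$@" \
        "$SB_SCRIPT" "$stage"
}
```

A full cycle then reads as three short lines: oltp_read_only prepare, then oltp_read_only run --threads=16 --time=60, then oltp_read_only cleanup, with the table count guaranteed to match across all of them.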

If you have been using sysbench and have suggestions, other options, or usage tips, feel free to comment below so we can all benefit. Likewise, if you have a question, fire away. I’ll do my best to answer.