The time period up to the 2010 MySQL Users conference was as usual packed with hard work. The last two conferences have been very focused on getting scalability of the MySQL Server and InnoDB improved. This year had a nasty surprise after the conference in the form of ash cloud from an icelandic volcano. When you're tired from hard work and just want to go home and get some rest then the concept of staying at a hotel room with absolutely no idea of when one could return home is no fun at all. So when I finally returned home I was happy that summer was close by, vacation days were available (swedes have a lot of vacation :)).
Now the summer is gone and I am rested up again, ready to take on new challenges and we've had some really interesting meet-ups in the MySQL team to discuss future developments. The renewed energy also is sufficient to write up some of the stories from the work I did during the summer :)
During the summer I had the opportunity to also get some work done on scalability of the MySQL Cluster product as well. Given that I once was the founder of this product it was nice to return again and check where it stands in scalability terms.
The objective was to compare MySQL Cluster to the Memory engine. The result of the exercise was almost obvious from the start. The memory engine having a table lock will have very limited scalability on any workload that contains writes. It will however have very good scalability on read-only workloads as this isn't limited by the table lock since readers don't contend each other. The Cluster engine should have good and fairly even results on read and write workloads.
Much to my surprise the early results showed a completely different story. The Memory engine gave me a performance of about 1-2 tps to start with. The early results of MySQL Cluster was also very dismaying. I covered the Memory engine in a previous blog, so in this blog I will focus on the MySQL Cluster benchmarks.
So the simple task of benchmarking as usual turned into some debugging of where the performance problems comes from.
In the first experiment I used the default configuration of the Fedora Linux OS, I also used the default set-up of the MySQL Cluster storage engine. It turned out that there are huge possibilities in adapting those defaults.
First the Fedora has a feature called cpuspeed. By default this feature is activated. The feature provides power management by scaling down the CPU frequency on an entire socket. The problem is that when you run the MySQL Server with few threads, it doesn't react to the workload and scales down frequency although there is a lot of work to do. So for the MySQL Server in general this means about half the throughput on up to 16 threads. However the impact on MySQL Cluster is even worse. The performance drops severely on all thread counts. Most likely this impact comes from the very short execution times of the NDB data nodes. It's possible that this small execution times is too short to even reach the radar of the power management tools in Linux.
So a simple sudo /etc/init.d/cpuspeed stop generated a major jump in performance of the MySQL Cluster in a simple Sysbench benchmark (all the benchmarks discussed here used 1 data node and all things running on one machine unless otherwise stated).
The next simple step was to add the MySQL Cluster configuration parameter MaxNoOfExecutionThreads to the configuration scripts and set this to the maximum which is 8. This means that one thread will handle receive on the sockets, one thread will handle transaction coordination and four threads will handle local database handling. There will also be a couple of other threads which are of no importance to a benchmark.
These two configuration together added about ~3.5x in increased performance. Off to a good start, but still performance isn't at all where I want it to be.
In NDB there is a major scalability bottleneck in the mutex protecting all socket traffic. It's the NDB API's variant of the big kernel mutex. There is however one method of decreasing the impact of this mutex by turning the MySQL Server into several client nodes from an NDB perspective. This is done by adding the --ndb-cluster-connection-pool parameter when starting the MySQL Server. We achieved the best performance when setting this to 8, in a bigger cluster it would probably make more sense to set it to 2 or 3 since this resolves most of the scalability issues without using up so much nodes in the NDB cluster.
Changing this from 1 to 8 added another ~2x in performance. So now the performance is up a decent 8x from the first experiments. No wonder I was dismayed by the early results.
However the story isn't done yet :)
MySQL Cluster has another feature whereby the various threads can be locked to CPUs. By using this feature we can achieve two things, the first is that the NDB data nodes doesn't share CPUs with the MySQL Server. This has some obvious benefits from CPU cache point of view for both node types. We can also avoid that the data node threads are moved from CPU to CPU which is greatly advantegous in busy workloads. So we locked the data nodes to 6 cores. The configuration variable we used to achieve this is LockExecuteThreadToCpu which is set to a comma separated list of CPU ids.
I also locked the MySQL Server and Sysbench to different set of CPUs using the taskset program available in Linux.
Using this locking of NDB data node threads to CPUs achieved another 80% boost in performance. So the final result gave us a decent 14x performance improvement.
So in summary things that matters greatly to performance of MySQL Cluster for Sysbench with a single data node.
1) Ensure the Linux cpuspeed isn't activated
2) Make sure to set MaxNoOfExecutionThreads to 8
3) Make sure the --ndb-cluster-configuration-pool parameter to the MySQL Server using around 8 nodes per MySQL Server
4) Lock NDB Data node threads to CPUs by using the LockExecuteThreadToCpu.
5) Lock MySQL Server and Sysbench processes to different sets of CPUs from NDB Data nodes and each other.
Doing this experiments also generated a lot of interesting ideas on how to improve things even further.