When experimenting with the Google patches a few years ago, I found that tcmalloc had a fairly large impact on the performance of the MySQL Server. So the obvious question I asked myself was whether libc malloc has regained some of the lost territory (some libc versions also had regressions with major drops in performance). Using tcmalloc used to give a 5-10% performance gain, and the fact of the matter is that this gain remains: without tcmalloc I lost 8-10% in performance at every thread count tested (16, 32, 64, 128 and 256) when running Sysbench RW.
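One common way to run an unmodified mysqld with tcmalloc is to preload the library through the dynamic linker's LD_PRELOAD variable. mysqld_safe also offers a --malloc-lib option for the same purpose. The Python sketch below is an illustration, not necessarily the setup used in these tests. It builds such an environment and then checks /proc/<pid>/maps to confirm that a running process actually mapped the library. The library path TCMALLOC_PATH and the helper names are assumptions, so adjust the path for your distribution.

```python
import os
from typing import Mapping, Optional

# Assumed location of the tcmalloc shared library (Debian/Ubuntu gperftools
# package layout); this is a hypothetical path, adjust for your system.
TCMALLOC_PATH = "/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4"


def env_with_preload(lib_path: str,
                     base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with lib_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # ld.so accepts a colon- or space-separated list; keep anything already set.
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return env


def is_library_mapped(pid: int, name: str = "tcmalloc") -> bool:
    """Check /proc/<pid>/maps to confirm a running process loaded the library."""
    with open(f"/proc/{pid}/maps") as maps:
        return any(name in line for line in maps)


# Example launch (hypothetical binary and option paths; not executed here):
#   import subprocess
#   subprocess.Popen(["/usr/sbin/mysqld", "--defaults-file=/etc/my.cnf"],
#                    env=env_with_preload(TCMALLOC_PATH))
```

After starting the server, is_library_mapped(mysqld_pid) should return True if the preload took effect.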
The experiments are performed on a fairly high-end x86 box with 4 sockets. I run the Sysbench program on the same machine as the MySQL Server. This makes it interesting to check whether I get better performance by locking the MySQL Server to 3 of the 4 sockets and letting Sysbench use its own socket, compared to not controlling CPU usage at all.
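The 3+1 socket split requires grouping the CPUs by socket first. Here is a minimal Python sketch that assumes Linux sysfs exposes topology/physical_package_id for each CPU, which typical x86 kernels do. cpus_by_socket() reads the topology, and split_sockets() gives the last socket to the benchmark client and the remaining sockets to the server. Both function names are hypothetical, and giving the last socket to Sysbench is an arbitrary choice. Hyperthread siblings report the same package id, so they end up on the same side automatically.

```python
import glob
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def cpus_by_socket(sysfs: str = "/sys/devices/system/cpu") -> Dict[int, List[int]]:
    """Group logical CPU ids by physical package (socket) using Linux sysfs."""
    sockets: Dict[int, List[int]] = defaultdict(list)
    pattern = os.path.join(sysfs, "cpu[0-9]*", "topology", "physical_package_id")
    for path in glob.glob(pattern):
        # ".../cpu17/topology/physical_package_id" -> logical CPU 17
        cpu = int(path.split(os.sep)[-3][len("cpu"):])
        with open(path) as f:
            sockets[int(f.read())].append(cpu)
    return {sock: sorted(cpus) for sock, cpus in sorted(sockets.items())}


def split_sockets(topology: Dict[int, List[int]],
                  client_sockets: int = 1) -> Tuple[List[int], List[int]]:
    """Return (server_cpus, client_cpus): the last `client_sockets` sockets
    go to the benchmark client, all other sockets to the database server."""
    ids = sorted(topology)
    if not 0 < client_sockets < len(ids):
        raise ValueError("need at least one socket for each side")
    server = [cpu for sock in ids[:-client_sockets] for cpu in topology[sock]]
    client = [cpu for sock in ids[-client_sockets:] for cpu in topology[sock]]
    return sorted(server), sorted(client)
```

On a 4-socket machine, server_cpus, sysbench_cpus = split_sockets(cpus_by_socket()) assigns three sockets' worth of CPUs to the MySQL Server and one to Sysbench.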
What I discovered is a mixed picture. Performance when locked to CPUs was much more stable, although peak performance was better without locking: without locking, performance was 3% higher at 16 threads and 7% higher at 32 threads. At higher thread counts, however, the locked scenario performed better, by 10% at 64 threads and 4% at 256 threads. I used the Linux taskset utility to lock the MySQL Server and Sysbench to specific CPUs.
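taskset's -c option accepts a CPU list such as 0-11,24-35. Threads that mysqld or Sysbench create after launch inherit the affinity mask. The sketch below uses hypothetical helper names and assumes Linux. It converts a set of CPU ids into that list format and builds a taskset command line. It also shows the in-process equivalent, os.sched_setaffinity(), for a Python driver that wants to pin itself.

```python
import os
from typing import Iterable, List, Sequence


def to_cpu_list(cpus: Iterable[int]) -> str:
    """Format CPU ids in taskset's list syntax, collapsing runs into ranges."""
    ids = sorted(set(cpus))
    if not ids:
        raise ValueError("empty CPU set")
    ranges = []
    start = prev = ids[0]
    for cpu in ids[1:]:
        if cpu == prev + 1:  # still inside a consecutive run
            prev = cpu
            continue
        ranges.append((start, prev))
        start = prev = cpu
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def taskset_command(cpus: Iterable[int], argv: Sequence[str]) -> List[str]:
    """Build an argv that starts a program restricted to the given CPUs."""
    return ["taskset", "-c", to_cpu_list(cpus), *argv]


def pin_current_process(cpus: Iterable[int]) -> None:
    """In-process alternative: restrict the calling process (pid 0) to `cpus`;
    threads started afterwards inherit the mask."""
    os.sched_setaffinity(0, set(cpus))
```

For example, to_cpu_list([0, 1, 2, 3, 8]) gives "0-3,8", and taskset_command([0, 1, 2, 3], ["sysbench", "run"]) returns ["taskset", "-c", "0-3", "sysbench", "run"].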
So the conclusion is that locking to CPUs gives a more stable environment. As the number of threads increases, the scheduler is allowed to spread the work over more CPUs than is beneficial for MySQL execution. I've also seen in other experiments that making sure MySQL reuses the CPU caches as much as possible is very important for performance. Thus, when MySQL competes with other programs for CPUs and there are many concurrent MySQL threads, performance usually suffers because the CPU caches will be too cold.
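One way to observe this effect is to check which CPUs the server's threads last ran on. Linux reports this as field 39 (processor) of /proc/<pid>/task/<tid>/stat. The small Python sketch below uses hypothetical function names and collects that value for every thread of a process. Sampling it a few times during a run, with and without taskset, gives a rough picture of how widely the scheduler spreads MySQL's threads. It is a snapshot-based approximation, not a measurement of cache misses.

```python
import os
from typing import Dict


def last_cpu_from_stat(stat_line: str) -> int:
    """Return field 39 ('processor') of a /proc/<pid>/stat or task stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split only the text after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return int(rest[36])  # rest[0] is field 3 (state), so field 39 is rest[36]


def thread_cpus(pid: int) -> Dict[int, int]:
    """Map each thread id of `pid` to the CPU it last ran on."""
    result: Dict[int, int] = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                result[int(tid)] = last_cpu_from_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            pass  # thread exited between listdir() and reading its stat file
    return result
```

len(set(thread_cpus(mysqld_pid).values())) then gives the number of distinct CPUs the server's threads occupied at the moment of sampling.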
Using Unix sockets instead of TCP/IP sockets is still very beneficial for MySQL performance. I haven't made any recent experiments in this area, but the difference is definitely significant. I have also sometimes seen OS bottlenecks appear when using TCP/IP sockets. This is an area for further investigation that has been on my TODO list for a while. It would also be interesting to experiment with different communication mechanisms when the Sysbench program and MySQL run on different computers. However, this is for future testing.
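A simple ping-pong test isolates the transport difference by sending one byte back and forth over a connected socket pair. The Python sketch below compares an AF_UNIX stream pair with a TCP connection over the loopback interface, and it assumes a Unix-like OS. It measures only the raw kernel path, not MySQL's protocol handling, so treat its numbers as an indication rather than a MySQL result. The function names and the default round count are arbitrary choices.

```python
import socket
import threading
import time


def _echo(conn: socket.socket) -> None:
    # Echo every byte back until the peer closes its end (recv returns b"").
    while True:
        data = conn.recv(1)
        if not data:
            break
        conn.sendall(data)
    conn.close()


def _ping_pong(client: socket.socket, server: socket.socket, rounds: int) -> float:
    """Return the mean round-trip time in microseconds for 1-byte messages."""
    t = threading.Thread(target=_echo, args=(server,), daemon=True)
    t.start()
    start = time.perf_counter()
    for _ in range(rounds):
        client.sendall(b"x")
        if client.recv(1) != b"x":
            raise RuntimeError("unexpected echo")
    elapsed = time.perf_counter() - start
    client.close()  # lets _echo see EOF and exit
    t.join()
    return elapsed / rounds * 1e6


def unix_rtt_us(rounds: int = 10_000) -> float:
    # socketpair() returns two already-connected AF_UNIX stream sockets.
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    return _ping_pong(a, b, rounds)


def tcp_rtt_us(rounds: int = 10_000) -> float:
    # Build a connected TCP pair over the loopback interface.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    for s in (client, server):
        # Disable Nagle's algorithm so tiny messages are sent immediately.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return _ping_pong(client, server, rounds)


if __name__ == "__main__":
    print(f"AF_UNIX: {unix_rtt_us():.1f} us per round trip")
    print(f"TCP/IP : {tcp_rtt_us():.1f} us per round trip")
```

You can switch between the two transports in MySQL itself without code changes. On Unix, the mysql command-line client connects through the socket file when the host is localhost and over TCP/IP when given 127.0.0.1.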