Thursday, July 31, 2008

1: Making MySQL Cluster scale perfectly in the DBT2 benchmark: Initial discussion

Since the first half of 2006 I've been working on benchmarking
MySQL Cluster using the DBT2 test suite. Initially this meant
a fair amount of work on the test suite itself and also on
a set of scripts to start and stop NDB data nodes, MySQL
Servers and all the other processes of the DBT2 test.
(These scripts and the DBT2 tests I'm using are available
for download at www.iclaustron.com)

Initially I worked with an early version of MySQL Cluster
based on version 5.1 and this meant that I hit a number
of the performance bugs that had appeared there during the
development process. Nowadays the stability is really good,
so for the most part I've spent my time focusing on what
is required of the operating system and the benchmark
application for optimum scalability.

Early on I discovered some basic features that were required
to get optimum performance of MySQL Cluster in those cases.
One of them is to simply use partitioning properly. In the
case of DBT2 most tables (all except the ITEM table) can
be partitioned on the warehouse id. So the new feature I
developed as part of 5.1 came in handy here. It's possible to
use either PARTITION BY KEY (warehouse_id) or PARTITION BY
HASH (warehouse_id). Personally I prefer PARTITION BY HASH
since it spreads the warehouses perfectly amongst the data
nodes. However in 5.1 this isn't fully supported, so one has
to start the MySQL Server using the flag --new to use this
feature with MySQL Cluster.
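
As a rough illustration of why PARTITION BY HASH spreads the
warehouses evenly: MySQL's HASH partitioning places a row in
partition MOD(expr, number of partitions), so consecutive
warehouse ids rotate through the partitions. The Python sketch
below only models that rule. The partition and warehouse counts
are made-up values, and key_like_partition uses MD5 purely as a
stand-in for a generic hash function; it is an assumption, not
the actual function behind PARTITION BY KEY.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4   # assumed value, for illustration only
NUM_WAREHOUSES = 16  # assumed value, for illustration only


def hash_partition(warehouse_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """PARTITION BY HASH rule: MOD(expr, number of partitions)."""
    return warehouse_id % num_partitions


def key_like_partition(warehouse_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stand-in for a generic hash-based placement (MD5 is an assumption)."""
    digest = hashlib.md5(str(warehouse_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def spread(partition_fn) -> Counter:
    """Count how many warehouses land in each partition."""
    return Counter(partition_fn(w) for w in range(1, NUM_WAREHOUSES + 1))


if __name__ == "__main__":
    print("HASH:    ", sorted(spread(hash_partition).items()))
    print("KEY-like:", sorted(spread(key_like_partition).items()))
```

With sequential warehouse ids the modulo rule gives every
partition exactly the same number of warehouses, while a generic
hash gives no such guarantee for a small number of warehouses.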

The second one was the ability to use the transaction
coordinator on the same node as the warehouse the
transaction is handling. This was handled by a new
feature introduced in MySQL Cluster Carrier Grade
Edition 6.3 whereby the transaction coordinator is
started on the node where the first query is targeted.
This works perfectly for DBT2 and for many other
applications and it's fairly easy to change your
application if it doesn't fit immediately.

The next feature was to ensure that sending uses as
big buffers as possible and also to avoid wake-up
costs. Both those features meant changes to the
scheduler in the data nodes of the MySQL Cluster.
These changes work very well in most cases where
there are sufficient CPU resources for the data nodes.
This feature was also introduced in MySQL Cluster CGE
version 6.3.

Another feature which is very important to achieve
optimum scalability is to ensure that the MySQL Server
starts scans only on the data nodes where it will
actually find the data. This is done through the use
of partition pruning as introduced in MySQL version
5.1. Unfortunately a bug was introduced late, which I
recently discovered, and it decreased scalability for
DBT2 (this is bug#37934, which contains a patch that
fixes the bug; it hasn't been pushed yet to any 6.3
version).
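
The effect of partition pruning can be sketched as follows. This
is a toy model in Python, not the MySQL optimizer: it assumes
HASH partitioning on the warehouse id and a hypothetical layout
with one partition per data node, and nodes_to_scan is an
illustrative name.

```python
from typing import List, Optional

NUM_DATA_NODES = 4  # assumed layout: one partition per data node


def nodes_to_scan(warehouse_id: Optional[int],
                  num_nodes: int = NUM_DATA_NODES) -> List[int]:
    """Return the data nodes a scan has to visit.

    With an equality condition on the partitioning column the scan
    is pruned to a single node; without one, every node is scanned.
    """
    if warehouse_id is None:
        return list(range(num_nodes))
    return [warehouse_id % num_nodes]
```

A scan that is not pruned touches every data node, so its cost
grows with the size of the cluster, which is why the bug hurt
scalability.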

With these features there were still a number of scalability
issues remaining in DBT2. One was the obvious one that the
ITEM table is spread over all data nodes and thus reads of the
ITEM table will use network sockets that aren't so "hot".
There are two solutions to this. One is that MySQL Cluster
implements some tables as fully replicated on all data nodes;
this might arrive some time in the future. The other variant
uses standard MySQL techniques: one places the table in
another storage engine, e.g. InnoDB, and uses replication to
spread the updates to all the MySQL Servers in the cluster.
This technique should be applicable to many web applications
where some tables need to be in MySQL Cluster to handle
availability issues and must be updated through proper
transactions, while other tables can be updated in a lazy
manner.

Finally there is one more remaining issue, and this is when the
MySQL Server doesn't work on partitioned data. That is, in the
case of DBT2, if all MySQL Servers can access data in a certain
node group then the data nodes will have more network sockets to
work with, which will increase the cost of networking. This limits
scalability as well.

In the case of DBT2 this can be avoided by using a spread
parameter that ensures that a certain MySQL Server only uses a
certain node group in the MySQL Cluster. In a generic application
this would be handled by an intelligent load balancer that
ensures that MySQL Servers work on different partitions of
the data in the application.
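
A minimal sketch of such a routing rule, in Python. Everything
here is hypothetical: the server names, the number of node
groups, and the assumption that a warehouse's node group is its
id modulo the number of node groups. It is not how the DBT2
spread parameter is implemented, only the idea of pinning each
MySQL Server to one node group.

```python
from typing import Dict, List

NUM_NODE_GROUPS = 2  # assumed cluster layout

# Hypothetical assignment of MySQL Servers to node groups.
SERVERS_BY_NODE_GROUP: Dict[int, List[str]] = {
    0: ["mysqld-a", "mysqld-b"],
    1: ["mysqld-c", "mysqld-d"],
}


def node_group_for(warehouse_id: int) -> int:
    """Assumed placement: warehouses map to node groups by modulo."""
    return warehouse_id % NUM_NODE_GROUPS


def pick_server(warehouse_id: int, request_no: int) -> str:
    """Route a request to a MySQL Server pinned to the warehouse's node group.

    Requests for the same node group are spread round-robin over
    the servers assigned to it.
    """
    servers = SERVERS_BY_NODE_GROUP[node_group_for(warehouse_id)]
    return servers[request_no % len(servers)]
```

With this rule each MySQL Server only ever talks to the data
nodes of one node group, so each data node keeps fewer and
busier connections.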

What I will present in future blogs is some data on how much
effect the features mentioned above have on the scalability of
the DBT2 benchmark for MySQL Cluster.

What is more surprising is that there are also a number of other
issues related to the use of the operating system which aren't
obvious at all. I will present those as well and what those mean
in terms of scalability for MySQL Cluster using DBT2.

Finally, a real application will seldom scale perfectly,
so in any real application it's also
important to minimize the impact of scalability issues. The
main technology to use here is cluster interconnects and I
will show how the use of cluster interconnects affects
scalability issues in MySQL Cluster.

Note that the numbers from these DBT2 runs are used here merely
to compare different configurations of MySQL Cluster.
