Introduction
MariaDB is a drop-in replacement for MySQL. MariaDB Galera Cluster is a synchronous multi-master cluster for MariaDB. It is available on Linux only, and only supports the XtraDB/InnoDB storage engines.
This blog post explains step by step how to set up MariaDB Galera Cluster 10.0 with 3 nodes running on CentOS 6.5 x86_64, resulting in an HA (high-availability) database cluster.
Cluster details
I’m using 3 freshly deployed VMs running a minimal install of CentOS 6.5 x86_64.
Cluster node 1 has hostname db1 and IP address 10.0.0.9
Cluster node 2 has hostname db2 and IP address 10.0.0.11
Cluster node 3 has hostname db3 and IP address 10.0.0.13
Step 1 – set SELinux in permissive mode
It’s currently not possible to run SELinux in enforcing mode because there are no policies (yet) for MariaDB Galera Cluster 10.0. I have filed BZ1117233. Hopefully the SELinux developers will soon add proper policies. For now, put SELinux into permissive mode on all nodes:
sudo setenforce 0
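Note that setenforce 0 only lasts until the next reboot. To keep SELinux in permissive mode across reboots you can also adjust the config file, for example like this (a small sketch, assuming the default /etc/selinux/config location):

sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config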
Update: SELinux policies for MariaDB Galera 10 will be in RHEL 6.6 and CentOS 6.6.
Step 2 – configure the firewall
MariaDB Galera Cluster on each node must be able to talk to its counterparts on the other nodes, so we need to open some firewall ports. Execute the following commands on all nodes:
sudo lokkit --port 3306:tcp
sudo lokkit --port 4444:tcp
sudo lokkit --port 4567:tcp
sudo lokkit --port 4568:tcp
Please note that these rules allow any host on your local IP network to access the database nodes. For production, make this more secure by specifying the source IP range from which connections to the database nodes are accepted.
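For example, instead of the broad lokkit rules above you could add rules that only accept the cluster ports from the 10.0.0.0/24 network. This is just a sketch; adjust the source range to your environment and review the resulting /etc/sysconfig/iptables before relying on it:

sudo iptables -I INPUT -p tcp -s 10.0.0.0/24 -m multiport --dports 3306,4444,4567,4568 -j ACCEPT
sudo service iptables save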
Step 3 – install the MariaDB repository
I’m using the MariaDB Galera Cluster RPM packages from mariadb.org. Add the MariaDB RPM repository by executing the following command on all nodes:
sudo tee -a /etc/yum.repos.d/MariaDB.repo > /dev/null << EOF
# MariaDB 10.0 CentOS repository list - created 2014-07-02 22:44 UTC
# http://mariadb.org/mariadb/repositories/
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.0/centos6-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1
EOF
Step 4 – install MariaDB Galera Cluster 10.0 software
If you did a CentOS 6 minimal installation, make sure you install the socat package from the EPEL repository before installing the MariaDB Galera Cluster 10.0 software. You can find the socat package at http://dl.fedoraproject.org/pub/epel/6/x86_64/repoview/socat.html.
Either install it directly from EPEL with the following command (for x86_64):
sudo yum install http://dl.fedoraproject.org/pub/epel/6/x86_64/socat-1.7.2.3-1.el6.x86_64.rpm
Or install the EPEL repository and then install it the regular way:
sudo yum install epel-release
sudo yum install socat
On CentOS 7 the socat package is part of the distribution and is included in the minimal install, so it should already be there. If it’s not, just install it with:
sudo yum install socat
Install the MariaDB Galera Cluster 10.0 software by executing the following command on all nodes:
sudo yum install MariaDB-Galera-server MariaDB-client rsync galera
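To double-check that the expected packages ended up on each node, you can query rpm (purely a convenience check):

rpm -q MariaDB-Galera-server MariaDB-client galera rsync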
Step 5 – set up MariaDB security
The default installation is not very secure, so let’s improve that. Start the mysql service (yes, the init script in MariaDB 10.0 is still called mysql) on all nodes:
sudo service mysql start
Next, run the mysql_secure_installation script to improve security. Execute the following command on all nodes:
sudo /usr/bin/mysql_secure_installation
For this tutorial I set the root password to ‘dbpass’ and accepted all defaults (answering yes to all questions).
Step 6 – set up MariaDB Galera Cluster users
Now we need to set up some users that must be able to access the database. The ‘sst_user’ is the user a database node will use to authenticate to another database node during the State Snapshot Transfer (SST) phase. Execute the following commands on all nodes:
mysql -u root -p
mysql> DELETE FROM mysql.user WHERE user='';
mysql> GRANT ALL ON *.* TO 'root'@'%' IDENTIFIED BY 'dbpass';
mysql> GRANT USAGE ON *.* TO 'sst_user'@'%' IDENTIFIED BY 'dbpass';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'sst_user'@'%';
mysql> FLUSH PRIVILEGES;
mysql> quit
Please note that the ‘%’ means that root and sst_user are allowed to access the database from any host. For production, make this more secure by specifying the hostname(s) or IP addresses from which those users can access the database.
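For example, to limit both accounts to the 10.0.0.x cluster network instead of ‘%’, something like the following would work (a sketch that reuses the ‘dbpass’ password from this tutorial; adapt the network to your own setup):

mysql -u root -p -e "GRANT ALL ON *.* TO 'root'@'10.0.0.%' IDENTIFIED BY 'dbpass';"
mysql -u root -p -e "GRANT ALL PRIVILEGES ON *.* TO 'sst_user'@'10.0.0.%' IDENTIFIED BY 'dbpass';"
mysql -u root -p -e "FLUSH PRIVILEGES;"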
Step 7 – create the MariaDB Galera Cluster config
First, stop the mysql service on all nodes:
sudo service mysql stop
Now let’s create the MariaDB Galera Cluster configuration by executing the following command on all nodes (read the IMPORTANT NOTE after the config about the required changes for db2 and db3):
sudo tee -a /etc/my.cnf.d/server.cnf > /dev/null << EOF
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
datadir=/var/lib/mysql
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2

wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://10.0.0.9,10.0.0.11,10.0.0.13"
wsrep_cluster_name='galera_cluster'
wsrep_node_address='10.0.0.9'
wsrep_node_name='db1'
wsrep_sst_method=rsync
wsrep_sst_auth=sst_user:dbpass
EOF
IMPORTANT NOTE: when executing this command on db2 and db3 do not forget to adjust the wsrep_node_address and wsrep_node_name variables.
On db2 they should be:
wsrep_node_address='10.0.0.11'
wsrep_node_name='db2'
On db3 they should be:
wsrep_node_address='10.0.0.13'
wsrep_node_name='db3'
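A quick way to verify that each node ended up with its own values (just a convenience check) is:

grep -E 'wsrep_node_(address|name)' /etc/my.cnf.d/server.cnf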
Step 8 – initialize the first cluster node
On node db1 only, start MariaDB with the special ‘--wsrep-new-cluster’ option so that the primary node of the cluster is initialized:
sudo /etc/init.d/mysql start --wsrep-new-cluster
Check if everything went ok by executing the following command on db1 only:
mysql -u root -p -e "show status like 'wsrep%'"
And the output:
+------------------------------+--------------------------------------+
| Variable_name                | Value                                |
+------------------------------+--------------------------------------+
| wsrep_local_state_uuid       | 9fc9f2d9-02a7-11e4-8042-b34db0fd0791 |
| wsrep_protocol_version       | 5                                    |
| wsrep_last_committed         | 34476                                |
| wsrep_replicated             | 0                                    |
| wsrep_replicated_bytes       | 0                                    |
| wsrep_repl_keys              | 0                                    |
| wsrep_repl_keys_bytes        | 0                                    |
| wsrep_repl_data_bytes        | 0                                    |
| wsrep_repl_other_bytes       | 0                                    |
| wsrep_received               | 2                                    |
| wsrep_received_bytes         | 135                                  |
| wsrep_local_commits          | 0                                    |
| wsrep_local_cert_failures    | 0                                    |
| wsrep_local_replays          | 0                                    |
| wsrep_local_send_queue       | 0                                    |
| wsrep_local_send_queue_avg   | 0.000000                             |
| wsrep_local_recv_queue       | 0                                    |
| wsrep_local_recv_queue_avg   | 0.500000                             |
| wsrep_local_cached_downto    | 18446744073709551615                 |
| wsrep_flow_control_paused_ns | 0                                    |
| wsrep_flow_control_paused    | 0.000000                             |
| wsrep_flow_control_sent      | 0                                    |
| wsrep_flow_control_recv      | 0                                    |
| wsrep_cert_deps_distance     | 0.000000                             |
| wsrep_apply_oooe             | 0.000000                             |
| wsrep_apply_oool             | 0.000000                             |
| wsrep_apply_window           | 0.000000                             |
| wsrep_commit_oooe            | 0.000000                             |
| wsrep_commit_oool            | 0.000000                             |
| wsrep_commit_window          | 0.000000                             |
| wsrep_local_state            | 4                                    |
| wsrep_local_state_comment    | Synced                               |
| wsrep_cert_index_size        | 0                                    |
| wsrep_causal_reads           | 0                                    |
| wsrep_cert_interval          | 0.000000                             |
| wsrep_incoming_addresses     | 10.0.0.9:3306                        |
| wsrep_cluster_conf_id        | 1                                    |
| wsrep_cluster_size           | 1                                    |
| wsrep_cluster_state_uuid     | 9fc9f2d9-02a7-11e4-8042-b34db0fd0791 |
| wsrep_cluster_status         | Primary                              |
| wsrep_connected              | ON                                   |
| wsrep_local_bf_aborts        | 0                                    |
| wsrep_local_index            | 0                                    |
| wsrep_provider_name          | Galera                               |
| wsrep_provider_vendor        | Codership Oy <info@codership.com>    |
| wsrep_provider_version       | 25.3.5(rXXXX)                        |
| wsrep_ready                  | ON                                   |
| wsrep_thread_count           | 2                                    |
+------------------------------+--------------------------------------+
The important lines in this output are the following:
wsrep_local_state_comment | Synced        <-- cluster is synced
wsrep_incoming_addresses  | 10.0.0.9:3306 <-- node db1 is a provider
wsrep_cluster_size        | 1             <-- cluster consists of 1 node
wsrep_ready               | ON            <-- good :)
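If you only care about these key indicators, you can also narrow the query down instead of listing all wsrep variables (optional; the full listing above works just as well):

mysql -u root -p -e "SHOW STATUS WHERE Variable_name IN ('wsrep_local_state_comment','wsrep_incoming_addresses','wsrep_cluster_size','wsrep_ready');"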
There is also a lot of information in the log file, which is located in /var/lib/mysql and is called db1.<FQDN>.err. Here are the last few lines after initializing db1:
140708 21:04:08 [Note] WSREP: Synchronized with group, ready for connections
140708 21:04:08 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140708 21:04:08 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.0.12-MariaDB-wsrep'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server, wsrep_25.10.r4002
Step 9 – add the other cluster nodes
Make 100% sure that both nodes db2 and db3 have the correct configuration in /etc/my.cnf.d/server.cnf under the [mariadb-10.0] heading as described in step 7.
With the correct configuration in place, all that is required to make db2 and db3 members of the cluster is to start them like you would start any regular service. On db2, issue the following command:
sudo service mysql start
On node db2 check /var/lib/mysql/db2.<FQDN>.err and you should see:
140708 21:20:30 [Note] WSREP: Member 1.0 (db2) synced with group.
140708 21:20:30 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 34476)
140708 21:20:30 [Note] WSREP: New cluster view: global state: 9fc9f2d9-02a7-11e4-8042-b34db0fd0791:34476, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 3
140708 21:20:30 [Note] WSREP: SST complete, seqno: 34476
And on node db1 check /var/lib/mysql/db1.<FQDN>.err and you should see:
140708 21:20:30 [Note] WSREP: Member 1.0 (db2) synced with group.
Check what has changed in the cluster status by executing the following command on db1 or db2:
mysql -u root -p -e "show status like 'wsrep%'"
And you should see that node db2 has joined: the cluster size is now ‘2’ and the IP address of node db2 is listed:
| wsrep_local_state_comment   | Synced                       |
| wsrep_incoming_addresses    | 10.0.0.9:3306,10.0.0.11:3306 |
| wsrep_cluster_size          | 2                            |
| wsrep_connected             | ON                           |
| wsrep_ready                 | ON                           |
Repeat the same step for node db3. On node db3 only, execute the following command:
sudo service mysql start
On node db3 check /var/lib/mysql/db3.<FQDN>.err and you should see something like this:
140708 21:32:04 [Note] WSREP: 1.0 (db1): State transfer to 0.0 (db3) complete.
140708 21:32:04 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 34476)
140708 21:32:04 [Note] WSREP: Member 1.0 (db1) synced with group.
140708 21:32:04 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 34476)
140708 21:32:04 [Note] WSREP: Synchronized with group, ready for connections
140708 21:32:04 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140708 21:32:08 [Note] WSREP: 0.0 (db3): State transfer from 1.0 (db1) complete.
140708 21:32:08 [Note] WSREP: Member 0.0 (db3) synced with group.
Check what has changed in the cluster status by executing the following command on, for example, db1:
mysql -u root -p -e "show status like 'wsrep%'"
And you should see that node db3 has joined: the cluster size is now ‘3’ and the IP address of node db3 is listed:
| wsrep_local_state_comment   | Synced                                      |
| wsrep_incoming_addresses    | 10.0.0.13:3306,10.0.0.9:3306,10.0.0.11:3306 |
| wsrep_cluster_size          | 3                                           |
| wsrep_connected             | ON                                          |
| wsrep_ready                 | ON                                          |
Step 10 – verify replication
The cluster is now up and running. Let’s check if it’s actually working. On db1, create a database ‘clustertest’ with a table and a test record by issuing the following commands:
mysql -u root -p -e 'CREATE DATABASE clustertest;'
mysql -u root -p -e 'CREATE TABLE clustertest.mycluster ( id INT NOT NULL AUTO_INCREMENT, name VARCHAR(50), ipaddress VARCHAR(20), PRIMARY KEY(id));'
mysql -u root -p -e 'INSERT INTO clustertest.mycluster (name, ipaddress) VALUES ("db1", "10.0.0.9");'
Check if the database, table and data exist:
mysql -u root -p -e 'SELECT * FROM clustertest.mycluster;'
Enter password:
+----+------+-----------+
| id | name | ipaddress |
+----+------+-----------+
|  2 | db1  | 10.0.0.9  |
+----+------+-----------+
That looks good. Now run the same check on node db2 to see if the data was replicated correctly:
mysql -u root -p -e 'SELECT * FROM clustertest.mycluster;'
Enter password:
+----+------+-----------+
| id | name | ipaddress |
+----+------+-----------+
|  2 | db1  | 10.0.0.9  |
+----+------+-----------+
Everything was successfully replicated from db1 to db2. Now do the same check on node db3:
mysql -u root -p -e 'SELECT * FROM clustertest.mycluster;'
Enter password:
+----+------+-----------+
| id | name | ipaddress |
+----+------+-----------+
|  2 | db1  | 10.0.0.9  |
+----+------+-----------+
As you can see, everything was successfully replicated from node db1 to all other nodes.
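When you are done testing, you can remove the test database again. Because DDL statements are replicated as well, dropping it on any one node removes it from all nodes:

mysql -u root -p -e 'DROP DATABASE clustertest;'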
Step 11 – starting a cluster that is completely down
If the entire cluster (every node) is down, this is how you start the cluster again.
First, decide which node you consider the primary node. Normally that is the node whose database you consider (the most) correct. Then start that node with the following command. Note that the command will not touch or delete any data; it just starts the first node as the primary cluster node.
sudo /etc/init.d/mysql start --wsrep-new-cluster
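If you are unsure which node is the most up to date, the Galera state file can help. After a clean shutdown each node records the last committed sequence number in /var/lib/mysql/grastate.dat, and the node with the highest seqno is normally the best candidate to bootstrap (after a crash the seqno may show -1, in which case pick the node you consider most correct):

sudo cat /var/lib/mysql/grastate.dat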
Once the primary node has started successfully, you can start the other nodes in the cluster with the regular command:
sudo /etc/init.d/mysql start
And it’s probably a good idea to verify that the cluster is fully operational by checking the output of this command on all cluster nodes:
mysql -u root -p -e "show status like 'wsrep%'"
Step 12 – MariaDB Galera limitations
MariaDB Galera Cluster software has some known limitations. Make sure to read them.
Severalnines has an excellent explanation of what those limitations mean in practical terms and how they can be dealt with in their “Avoiding Deadlocks in Galera…” article.
A workaround to prevent these rollbacks and deadlocks is to send all high-concurrency writes to a single node while using multiple nodes for reads. This can be realized by adding HAProxy & Keepalived to the setup described in this tutorial, which is also covered on this blog.
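As an illustration only (this is not part of the tutorial), a minimal HAProxy configuration for that pattern could look roughly like the sketch below. The listen ports, the config file path and the plain TCP health checks are assumptions you would adapt to your own environment:

sudo tee -a /etc/haproxy/haproxy.cfg > /dev/null << EOF
# writes: always go to db1; db2/db3 are only used when db1 is down
listen galera_write
    bind 0.0.0.0:3307
    mode tcp
    server db1 10.0.0.9:3306 check
    server db2 10.0.0.11:3306 check backup
    server db3 10.0.0.13:3306 check backup

# reads: balanced across all three nodes
listen galera_read
    bind 0.0.0.0:3308
    mode tcp
    balance leastconn
    server db1 10.0.0.9:3306 check
    server db2 10.0.0.11:3306 check
    server db3 10.0.0.13:3306 check
EOF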