2 Node Cluster

Overview

In this scenario, we are going to set up two Storware Backup & Recovery servers in High Availability, Active/Passive mode. This is possible by using techniques such as a pacemaker, corosync, and DRBD. At least a basic understanding of these is highly desirable. This how-to is intended for RPM-based systems such as Red Hat / CentOS. If you run Storware Backup & Recovery on a different OS, you may need to refer to your distribution docs.

Our environment is built of the following elements:

storware1 - first Storware Backup & Recovery server + Storware Backup & Recovery node, IP: 10.40.1.50
storware2 - second Storware Backup & Recovery server + Storware Backup & Recovery node, IP: 10.40.1.52
Cluster IP: 10.40.1.100 - We will use this IP to connect to our active Storware Backup & Recovery service. This IP will float between our servers and will point to an active instance.
DRBD (optionally with VDO) for data replication and deduplication between nodes.
MariaDB master <-> master replication

HA cluster setup

Preparing the environment

Stop and disable the Storware Backup & Recovery server, node and database as the cluster will manage these resources.

systemctl disable vprotect-server vprotect-node mariadb

Use yum to check if you have any updates pending

# yum update

It is a good idea to check /etc/hosts, especially if you installed Storware Backup & Recovery using the All in one quick installation method, as you might find an entry such as:
```
127.0.0.1 <your_hostname_here>
```
Delete it as this prevents the cluster from functioning properly (your nodes will not "see" each other).

Now we can proceed with installation of the required packages.

On both servers run

# yum install -y pacemaker pcs psmisc policycoreutils-python

Add a firewall rule to allow HA traffic - TCP ports 2224, 3121, and 21064, and UDP port 5405 (both servers)

# firewall-cmd --permanent --add-service=high-availability
success
# firewall-cmd --reload
success

While testing, depending on your environment, you may encounter problems related to network traffic, permissions, etc. While it might be a good idea to temporarily disable the firewall and SELinux, we do not recommend disabling that mechanism in the production environment as it creates significant security issues. If you choose to disable the firewall, bear in mind that Storware will no longer be available on ports 80/443. Instead, connect to ports 8080/8181 respectively.

# setenforce 0
# sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config
# systemctl mask firewalld.service
# systemctl stop firewalld.service
# iptables --flush

Enable and start PCS daemon

# systemctl enable pcsd.service
# systemctl start pcsd.service

Cluster configuration

Earlier installation of a pcs package automatically creates a user hacluster with no password authentication. While this may be good for running locally, we will require a password for this account to perform the rest of the configuration, so let's

configure the same password on both nodes

# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Corosync configuration

On node 1, issue a command to authenticate as a hacluster user:

[root@vprotect1 ~]# pcs cluster auth vprotect1 vprotect2
Username: hacluster
Password:
vprotect1: Authorized
vprotect2: Authorized

Generate and synchronize the corosync configuration

[root@vprotect1 ~]# pcs cluster setup --name mycluster vprotect1 vprotect2

Take a look at your output, which should look similar to below:

Destroying cluster on nodes: vprotect1, vprotect2...
vprotect1: Stopping Cluster (pacemaker)...
vprotect2: Stopping Cluster (pacemaker)...
vprotect1: Successfully destroyed cluster
vprotect2: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'vprotect1', 'vprotect2'
vprotect1: successful distribution of the file 'pacemaker_remote authkey'
vprotect2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
vprotect1: Succeeded
vprotect2: Succeeded

Synchronizing pcsd certificates on nodes vprotect1, vprotect2...
vprotect1: Success
vprotect2: Success
Restarting pcsd on the nodes in order to reload the certificates...
vprotect1: Success
vprotect2: Success

Enable and start your new cluster

[root@vprotect1 ~]# pcs cluster start --all && pcs cluster enable --all
vprotect1: Starting Cluster (corosync)...
vprotect2: Starting Cluster (corosync)...
vprotect1: Starting Cluster (pacemaker)...
vprotect2: Starting Cluster (pacemaker)...
vprotect1: Cluster Enabled
vprotect2: Cluster Enabled

OK! We have our cluster enabled. We have not created any resources (such as a floating IP) yet, but before we proceed we still have a few settings to modify.

Because we are using only two nodes, we need to

disable default quorum policy

(this command should not return any output)

[root@vprotect1 ~]# pcs property set no-quorum-policy=ignore

We should also

define default failure settings

[root@vprotect1 ~]# pcs resource defaults failure-timeout=30s
[root@vprotect1 ~]# pcs resource defaults migration-threshold=3

These two settings combined will define how many failures can occur for a node to be marked as ineligible for hosting a resource and after what time this restriction will be lifted. We define the defaults here, but it may be a good idea to also set these values at the resource level, depending on your experience.

As long we are not using any fencing device in our environment (and here we are not) we need to:

disable stonith

[root@vprotect1 ~]# pcs property set stonith-enabled=false && crm_verify -L

The second part of this command verifies running-config. These commands normally do not return any output.

Resource creation

Finally, we have our cluster configured, so it's time to proceed to

resource creation

First, we will create a resource that represents our floating IP 10.40.1.100. Adjust your IP and cidr_netmask, and you're good to go.

IMPORTANT: From this moment on we need to use this IP when connecting to our vProtect server.

[root@vprotect1 ~]# pcs resource create "Failover_IP" ocf:heartbeat:IPaddr2 ip=10.40.1.100 cidr_netmask=22 op monitor interval=30s

Immediately, we should see our IP is up and running on one of the nodes (most likely on the one we issued this command for).

[root@vprotect1 ~]# ip a
[..]
2: ens160:  mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a6:9f:c6 brd ff:ff:ff:ff:ff:ff
    inet 10.40.1.50/22 brd 10.40.3.255 scope global ens160
       valid_lft forever preferred_lft forever
    inet 10.40.1.100/22 brd 10.40.3.255 scope global secondary ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fea6:9fc6/64 scope link
       valid_lft forever preferred_lft forever

As you can see, our floating IP 10.40.1.100 has been successfully assigned as the second IP of interface ens160. This is what we wanted!

We should also check if the Storware Backup & Recovery web interface is up and running. We can do this by opening the web browser and typing in https://10.40.1.100.

The next step is to

define a resource responsible for monitoring network connectivity

[root@vprotect1 ~]# pcs resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 host_list=10.40.0.1 clone
[root@vprotect1 ~]# pcs constraint location Failover_IP rule score=-INFINITY pingd lt 1 or not_defined pingd

Note that you need to use your gateway IP in the host_list parameter

Finally, we have to define a set of cluster resources responsible for other services crucial for Storware as Storware Node and the Storware server itself. We will logically link these services with our floating IP. Whenever the floating IP disappears from our server, these services will be stopped. We also have to define the proper order for services to start and stop, as for example starting the Storware-server without a running database makes little sense.

Resource creation

[root@vprotect1 ~]#  pcs resource create "vProtect-node" systemd:vprotect-node op monitor timeout=300s on-fail="stop" --group vProtect-group
[root@vprotect1 ~]# pcs resource create "vProtect-server" service:vprotect-server op start on-fail="stop" timeout="300s" op stop timeout="300s" on-fail="stop" op monitor timeout="300s" on-fail="stop" --group vProtect-group

It is OK for these commands not to return any output.

Resource colocation

[root@vprotect1 ~]# pcs constraint colocation add Failover_IP with vProtect-group

To finish with, we can set which server is more preferred for running our services

Set node preference

[root@vprotect1 ~]# pcs constraint location Failover_IP prefers vprotect1=INFINITY
[root@vprotect1 ~]# pcs constraint location vProtect-group prefers vprotect1=INFINITY

We have made it to the end. At this point, our pacemaker HA cluster is functional.

However, there are still two things we need to consider, that is:

Creating DB replication
Setting up DRBD for /vprotect_data (optionally with VDO)

Setting up VDO+DRBD

In this section, we will prepare our deduplicated and replicated filesystem mounted in /vprotect_data.

Using a deduplicated FS is optional but highly recommended. If you don't intend to use it, skip the part regarding VDO configuration.

Note: If you are altering existing Stoware Backup & Recovery configuration it is very important to preserve the /vprotect_data contents and transfer them to the new filesystem. You may also need to re-create your backup_destination if you previously had one in this directory. Setting up VDO and DRBD will cause all data to be wiped from the configured volume.

Installation is split into the steps below that you need to follow to get the job done.

Stop the Storware server and node

# systemctl stop vprotect-server vprotect-node

No output means everything went OK.

On both nodes install the equired repositories and packages

# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm
Retrieving https://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:elrepo-release-7.0-4.el7.elrepo  ################################# [100%]

The next command can produce quite a few lines, so I've truncated the output, however the idea is simple: install drbd packages:

[root@vprotect1 ~]# yum install -y kmod-drbd84 drbd84-utils

Installed:
drbd84-utils.x86_64 0:9.6.0-1.el7.elrepo                                               kmod-drbd84.x86_64 0:8.4.11-1.1.el7_6.elrepo

If you have not disabled SELinux and the firewall, remember to

configure them on both nodes

# semanage permissive -a drbd_t
# firewall-cmd --add-port=7788/tcp --permanent
success
# firewall-cmd --complete-reload
success

Don't forget to repeat these steps on the second node

Now that we have the necessary software installed, we must prepare an identical size block device on both nodes. A block device can be a hard drive, a hard drive partition, software RAID, LVM Volume, etc. In this scenario, we are going to use a hard drive connected as /dev/sdb.

To add a DRBD resource we create the file /etc/drbd.d/vprotect.res with the content below. Be sure to change the "address" so that t reflects your network configuration.

Also, the node names (storware1 and storware2) must match your uname -n output.

resource replicate {
protocol C;
    on vprotect1 {
                device /dev/drbd0;
                disk /dev/sdb;
                address 10.40.1.50:7788;
                meta-disk internal;
        }
    on vprotect2 {
                device /dev/drbd0;
                disk /dev/sdb;
                address 10.40.1.52:7788;
                meta-disk internal;
        }

We now have config in place and can create and bring our resource online.

On both nodes, run

# drbdadm create-md replicate
initializing activity log
initializing bitmap (4800 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.

then bring the volume online

# drbdadm up replicate

You can verify if the device is up & running by issuing

# lsblk
NAME                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                       8:0    0   16G  0 disk
├─sda1                    8:1    0    1G  0 part /boot
└─sda2                    8:2    0   15G  0 part
├─vg_vprotect-lv_root 253:0    0 13.4G  0 lvm  /
└─vg_vprotect-lv_swap 253:1    0  1.6G  0 lvm  [SWAP]
sdb                       8:16   0  150G  0 disk
└─drbd0                 147:0    0  150G  1 disk

However, if we check

[root@vprotect1 ~]# drbdsetup status replicate
replicate role:Secondary
disk:Inconsistent
peer role:Secondary
replication:Established peer-disk:Inconsistent

we will notice we need to start synchronization before we can use our volume.

On the first server, run

[root@vprotect1 ~]# drbdadm primary --force replicate
[root@vprotect1 ~]# drbdsetup status replicate
replicate role:Primary
disk:UpToDate
peer role:Secondary
replication:SyncSource peer-disk:Inconsistent done:0.22

This way we have successfully started the process of replication between servers with vprotect1 as the ynchronization source.

If you don't want to create a VDO device, then create and mount your filesystem:

[root@vprotect1 ~]# mkfs.xfs -K /dev/drbd0
[root@vprotect1 ~]# mount /dev/mapper/drbd0 /vprotect_data/ && chown -R vprotect:vprotect /vprotect_data

Create VDO volume (optional)

By issuing the command below we will create a VDO volume called vdo_data and put in at the top our DRBD volume. Afterwards, we format it with XFS and mount it in /vprotect_data.

[root@vprotect1 ~]# vdo create --name=vdo_data --device=/dev/drbd0 --vdoLogicalSize=400G --compression=enabled --deduplication=enabled
Creating VDO vdo_data
Starting VDO vdo_data
Starting compression on VDO vdo_data
VDO instance 0 volume is ready at /dev/mapper/vdo_data

[root@vprotect1 ~]# mkfs.xfs -K /dev/mapper/vdo_data
meta-data=/dev/mapper/vdo_data   isize=512    agcount=4, agsize=26214400 blks
    =                       sectsz=4096  attr=2, projid32bit=1
    =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=104857600, imaxpct=25
    =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=51200, version=2
    =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@vprotect1 ~]# mount /dev/mapper/vdo_data /vprotect_data/ && chown -R vprotect:vprotect /vprotect_data

Copy the VDO config to the second node

[root@vprotect1 ~]# scp /etc/vdoconf.yml root@vprotect2:/etc/vdoconf.yml

Disable VDO automatic startup
As this resource will be managed by the cluster, we need to disable auto startup of this service on both nodes.
```
# systemctl disable vdo
```

Final cluster settings

At this point, we have three components set up. To fully utilize our HAcluster and eliminate the need for manual intervention we should add the resources and settings below to our cluster.

Issue these commands on one node only as it will propagate to the cluster settings.

[root@vprotect1 ~]#  pcs cluster cib drbd_cfg
[root@vprotect1 ~]#  pcs -f drbd_cfg resource create replicate ocf:linbit:drbd \
         drbd_resource=replicate op monitor interval=10s --group fs_group

[root@vprotect1 ~]#  pcs -f drbd_cfg resource master replicateClone replicate \
         master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
         notify=true --group fs_group

[root@vprotect1 ~]#  pcs -f drbd_cfg resource create vdo_resource ocf:heartbeat:vdo-vol volume=vdo_data --group fs_group
[root@vprotect1 ~]#  pcs -f drbd_cfg resource create fs_resource ocf:heartbeat:Filesystem device=/dev/mapper/vdo_data directory=/vprotect_data fstype=xfs  --group fs_group
[root@vprotect1 ~]#  pcs cluster cib-push drbd_cfg --config

[root@vprotect1 ~]#  pcs constraint colocation add vdo_resource with replicateClone
[root@vprotect1 ~]#  pcs constraint order start vdo_resource then fs_resource
[root@vprotect1 ~]#  pcs constraint order start replicateClone then vdo_resource
[root@vprotect1 ~]#  pcs constraint colocation add vProtect-group with fs_group
[root@vprotect1 ~]#  pcs constraint colocation add vdo_resource with replicateClone INFINITY with-rsc-role=Master
[root@vprotect1 ~]#  pcs constraint order promote replicateClone then start fs_group

Here we have created a temporary file drbd_cfg and inside this file we have added our drbd_resource called replicate, plus a Master/Slave set for this resource.

Afterwards, we have the definition of the vdo_resource and fs_resource in one fs_group followed by an update of the cluster configuration.

As a second step, we have put in place several resource colocations and constraints which allow us to control the order and existence of newly created resources.

We need still to

Make sure that our node is pointed to a localhost address. Check the Nodes UI section.

If the node's IP is different than 127.0.0.1, delete the node and re-register it using

[root@vprotect1 ~]# vprotect node -e <Node_Name> admin http://127.0.0.1:8080/api

copy our license and node information from the first node to the second node:

[root@vprotect1 ~]# scp -pr /opt/vprotect/.session.properties 
[root@vprotect1 ~]# scp -pr /opt/vprotect/license.key

MariaDB replication

In this section, we will cover how to setup master<->master MariaDB replication.

On both nodes, if you have the firewall enabled, allow communication via port 3306

# firewall-cmd --add-port=3306/tcp --permanent
# firewall-cmd --complete-reload

Steps to run on the first storware1 node: 10.40.1.50

This server will be the source of DB replication.

Stop the Storware server, node and database

[root@vprotect1 ~]# systemctl stop vprotect-server vprotect-node mariadb

Edit the config file, enable binary logging and start MariaDB again. Depending on your distribution, the config file location may vary, most likely it is /etc/my.cnf or /etc/my.cnf.d/server.cnf
In the [mysqld] section, add the lines:

[root@vprotect1 ~]# vi /etc/my.cnf.d/server.cnf
log-bin
server_id=1
replicate-do-db=vprotect
[root@vprotect1 ~]# systemctl start mariadb

Now log in into your MariaDB, create a user used for replication and assign appropriate rights to it.
For the purpose of this task, we will set the username to 'replicator' and the password to 'R3pLic4ti0N'

[root@vprotect1 ~]# mysql -u root -p
Enter password:
[..]
MariaDB [(none)]> create user 'replicator'@'%' identified by 'R3pLic4ti0N';
Query OK, 0 rows affected (0.026 sec)

MariaDB [(none)]> grant replication slave on *.* to 'replicator'@'%';
Query OK, 0 rows affected (0.001 sec)

MariaDB [(none)]> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.001 sec)

Don't log out just yet, we need to check the master status and

write down the log file name and position, as it is required for proper slave configuration.

MariaDB [(none)]> show master status;
+----------------------+----------+--------------+------------------+
| File                 | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+----------------------+----------+--------------+------------------+
| vprotect1-bin.000007 |    46109 |              |                  |
+----------------------+----------+--------------+------------------+

Dump the vprotect database and copy it onto the second server (vprotect2).

 [root@vprotect1 ~]# mysqldump -u root -p vprotect > /tmp/vprotect.sql
 [root@vprotect1 ~]# scp /tmp/vprotect_rep.sql root@vprotect2:/tmp/

Steps to run on the 2nd server, storware2: 10.40.1.52

For the reader's convenience, I have only highlighted the differences in configuration between storware1 and storware2, and omitted the output of some commands if they are the same as on the previous node.

Stop the vprotect server, node and database
Edit the MariaDB config file. Assign a different server id, for example: 2. Then start MariaDB.

[root@vprotect2 ~]# vi /etc/my.cnf.d/server.cnf
log-bin
server_id=2
replicate-do-db=vprotect
[root@vprotect2 ~]# systemctl start mariadb

Load the database dump copied from storware1.

[root@vprotect2 ~]# mysql -u root -p vprotect < /tmp/vprotect.sql

At this point, we have two identical databases on our two servers.

Log in to the MariaDB instance, create a replication user with a password. Use the same user as on storware1. Grant the necessary permissions.
Set the master host. You must use the user_master_log_file and master_log_pos written down earlier. Change the IP of the master host to match your network configuration.

MariaDB [(none)]> STOP SLAVE;
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST = '10.40.10.50', MASTER_USER = 'replicator',MASTER_PASSWORD='R3pLic4ti0N',MASTER_LOG_FILE = 'vprotect1-bin.000007',MASTER_LOG_POS=46109;
Query OK, 0 rows affected (0.004 sec)

Start the slave, check the master status and write down the file name and position.

MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.001 sec)

MariaDB [(none)]> SHOW MASTER STATUS;
+----------------------+----------+--------------+------------------+
| File                 | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+----------------------+----------+--------------+------------------+
| vprotect2-bin.000002 |   501051 |              |                  |
+----------------------+----------+--------------+------------------+
1 row in set (0.000 sec)

Go back to the first server (storware1)

On storreaw1, stop the slave then change the master host using the parameters noted down in the previous step. Also, change the master host IP to match your network configuration.

MariaDB [(none)]> stop slave;
MariaDB [(none)]> MariaDB [(none)]>  change master to master_host='10.40.1.52', master_user='replicator', master_password='R3pLic4ti0N',MASTER_LOG_FILE = 'vprotect2-bin.000002', master_log_pos=501051;
Query OK, 0 rows affected (0.004 sec)
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.001 sec)

At this point, you have successfully configured MariaDB master<->master replication.

Testing the setup

Automatic

The fastest way to test our setup is to invoke

# pcs node standby vprotect1

to put storware1 into standby mode, which prevents it from hosting any cluster resources.

After a while, you should see your resources up and running on storware2.

Note that if you perform normal OS shutdown (not a forced one), the pacemaker will wait for a long time for a node to come back online, which in fact will prevent completion of shutdown. As a result, resources will not switch correctly to the other node.

Manual

If you want to dive a little bit deeper, we have prepared instructions on how to manually move a filesystem resource from the first node to the second.

Stop vprotect services.

 systemctl stop vprotect-server && systemctl stop vprotect-node

Unmount the FS used by DRBD/VDO on the primary server (here storware1).

[root@vprotect1 ~]# drbdadm role replicate
Primary/Secondary
[root@vprotect1 ~]# umount /vprotect_data/

If you are using a VDO device, stop it.

[root@vprotect1 ~]# vdo stop -n vdo_data
Stopping VDO vdo_data

Demote the primary replication server (still storware1) to secondary server.
```
[root@vprotect1 ~]# drbdadm secondary replicate
```

On the second server

Promote the second server (here storware2) to the primary DRBD role.
```
[root@vprotect2 ~]# drbdadm    primary replicate
```

Start the VDO.

[root@vprotect2 ~]# vdo start -n vdo_data
Starting VDO vdo_data
Starting compression on VDO vdo_data
VDO instance 2 volume is ready at /dev/mapper/vdo_data

Mount the filesystem on the second server.

[root@vprotect2 ~]# mount /dev/mapper/vdo_data /vprotect_data/

Now you have your replicated volume mounted on the second node.

PreviousHigh Availability Next3 Node Cluster