High Availability

In this scenario, we are going to set up two vProtect servers in High Availability, Active/Passive mode. This is possible by using technologies such as Pacemaker, Corosync and DRBD. At least a basic understanding of these is highly desirable. This how-to is intended for RPM-based systems such as Red Hat / CentOS. If you run vProtect on a different OS, you may need to refer to your distribution's documentation.
Our environment is built of the following elements:
  1. vprotect1 - first vProtect server + vProtect node, IP: 10.40.1.50
  2. vprotect2 - second vProtect server + vProtect node, IP: 10.40.1.52
  3. Cluster IP: 10.40.1.100 - We will use this IP to connect to our active vProtect service. This IP will float between our servers and will point to an active instance.
  4. DRBD (optionally with VDO) for data replication and deduplication between nodes.
  5. MariaDB master <-> master replication

HA cluster setup

Preparing the environment

  • Stop and disable the vProtect server, node and database as the cluster will manage these resources.
# systemctl disable vprotect-server vprotect-node mariadb
  • Use yum to check for and install any pending updates
# yum update
  • It is a good idea to check /etc/hosts, especially if you installed vProtect using the All in one quick installation method, as you might find an entry such as:
    127.0.0.1 <your_hostname_here>
    Delete it as this prevents the cluster from functioning properly (your nodes will not "see" each other).
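The /etc/hosts check above can be scripted. The following is a minimal sketch, assuming a standard hosts file layout; the function name find_loopback_entries is ours and not part of vProtect or any cluster tool.

```shell
# Print every line of a hosts file that maps a loopback address to the given
# host name. Exit status 0 means at least one such line was found.
# Arguments (both optional): hosts file path, host name to look for.
find_loopback_entries() (
    hosts_file="${1:-/etc/hosts}"
    host="${2:-$(uname -n)}"
    awk -v h="$host" '
        /^[[:space:]]*#/ { next }                  # skip comment lines
        $1 ~ /^127\./ || $1 == "::1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break               # ignore trailing comments
                if ($i == h) { print; found = 1; break }
            }
        }
        END { exit(found ? 0 : 1) }
    ' "$hosts_file"
)

# Example (run on each node):
#   find_loopback_entries && echo "remove the lines printed above"
```

If the function prints anything on either server, that line is the entry to delete before continuing.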
Now we can proceed with installation of the required packages.
  • On both servers run
# yum install -y pacemaker pcs psmisc policycoreutils-python
  • Add a firewall rule to allow HA traffic - TCP ports 2224, 3121, and 21064, and UDP port 5405 (both servers)
# firewall-cmd --permanent --add-service=high-availability
success
# firewall-cmd --reload
success
While testing, depending on your environment, you may encounter problems related to network traffic, permissions, etc. Temporarily disabling the firewall and SELinux can help with troubleshooting, but we do not recommend disabling these mechanisms in a production environment as doing so creates significant security issues. If you choose to disable the firewall, bear in mind that vProtect will no longer be available on ports 80/443. Instead, connect to ports 8080/8181 respectively.
# setenforce 0
# sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config
# systemctl mask firewalld.service
# systemctl stop firewalld.service
# iptables --flush
  • Enable and start PCS daemon
# systemctl enable pcsd.service
# systemctl start pcsd.service
Cluster configuration
Installing the pcs package earlier automatically created a user named hacluster with password authentication disabled. While this may be good for running locally, we will require a password for this account to perform the rest of the configuration, so let's
  • configure the same password on both nodes
# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Corosync configuration
  • On node 1, issue a command to authenticate as a hacluster user:
[root@vprotect1 ~]# pcs cluster auth vprotect1 vprotect2
Username: hacluster
Password:
vprotect1: Authorized
vprotect2: Authorized
  • Generate and synchronize the corosync configuration
[root@vprotect1 ~]# pcs cluster setup --name mycluster vprotect1 vprotect2
Take a look at your output, which should look similar to the following:
Destroying cluster on nodes: vprotect1, vprotect2...
vprotect1: Stopping Cluster (pacemaker)...
vprotect2: Stopping Cluster (pacemaker)...
vprotect1: Successfully destroyed cluster
vprotect2: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'vprotect1', 'vprotect2'
vprotect1: successful distribution of the file 'pacemaker_remote authkey'
vprotect2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
vprotect1: Succeeded
vprotect2: Succeeded
Synchronizing pcsd certificates on nodes vprotect1, vprotect2...
vprotect1: Success
vprotect2: Success
Restarting pcsd on the nodes in order to reload the certificates...
vprotect1: Success
vprotect2: Success
  • Enable and start your new cluster
[root@vprotect1 ~]# pcs cluster start --all && pcs cluster enable --all
vprotect1: Starting Cluster (corosync)...
vprotect2: Starting Cluster (corosync)...
vprotect1: Starting Cluster (pacemaker)...
vprotect2: Starting Cluster (pacemaker)...
vprotect1: Cluster Enabled
vprotect2: Cluster Enabled
OK! We have our cluster enabled. We have not created any resources (such as a floating IP) yet, but before we proceed we still have a few settings to modify.
Because we are using only two nodes, we need to
  • disable default quorum policy
(this command should not return any output)
[root@vprotect1 ~]# pcs property set no-quorum-policy=ignore
We should also
  • define default failure settings
[root@vprotect1 ~]# pcs resource defaults failure-timeout=30s
[root@vprotect1 ~]# pcs resource defaults migration-threshold=3
These two settings combined will define how many failures can occur for a node to be marked as ineligible for hosting a resource and after what time this restriction will be lifted. We define the defaults here, but it may be a good idea to also set these values at the resource level, depending on your experience.
As long as we are not using any fencing device in our environment (and here we are not) we need to:
  • disable stonith
[root@vprotect1 ~]# pcs property set stonith-enabled=false && crm_verify -L
The second part of this command verifies the running configuration. These commands normally do not return any output.
Resource creation
Finally, we have our cluster configured, so it's time to proceed to
  • resource creation
First, we will create a resource that represents our floating IP 10.40.1.100. Adjust your IP and cidr_netmask, and you're good to go.
IMPORTANT: From this moment on we need to use this IP when connecting to our vProtect server.
[root@vprotect1 ~]# pcs resource create "Failover_IP" ocf:heartbeat:IPaddr2 ip=10.40.1.100 cidr_netmask=22 op monitor interval=30s
Immediately, we should see our IP is up and running on one of the nodes (most likely on the one we issued this command on). You can check this with ip a:
[..]
2: ens160: mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:a6:9f:c6 brd ff:ff:ff:ff:ff:ff
inet 10.40.1.50/22 brd 10.40.3.255 scope global ens160
valid_lft forever preferred_lft forever
inet 10.40.1.100/22 brd 10.40.3.255 scope global secondary ens160
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fea6:9fc6/64 scope link
valid_lft forever preferred_lft forever
As you can see, our floating IP 10.40.1.100 has been successfully assigned as the second IP of interface ens160. This is what we wanted!
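To find out which node currently holds the floating IP without reading the full listing, a small filter can help. This is a sketch that assumes the one-line-per-address format printed by ip -o -4 addr show; the helper name has_ipv4 is hypothetical.

```shell
# Succeed if the given IPv4 address appears in `ip -o -4 addr show` output
# read from stdin. In that format, field 4 is "address/prefix".
has_ipv4() (
    wanted="$1"
    awk -v ip="$wanted" '
        { split($4, a, "/"); if (a[1] == ip) found = 1 }
        END { exit(found ? 0 : 1) }
    '
)

# Example (run on either node):
#   ip -o -4 addr show | has_ipv4 10.40.1.100 && echo "floating IP is on this node"
```

The comparison is exact, so 10.40.1.10 does not match 10.40.1.100.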
We should also check if the vProtect web interface is up and running. We can do this by opening a web browser and going to https://10.40.1.100. At this point, the vProtect web interface should load.
The next step is to
  • define a resource responsible for monitoring network connectivity
[root@vprotect1 ~]# pcs resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 host_list=10.40.0.1 clone
[root@vprotect1 ~]# pcs constraint location Failover_IP rule score=-INFINITY pingd lt 1 or not_defined pingd
Note that you need to use your gateway IP in the host_list parameter
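If you are not sure which address to put in host_list, the default gateway can be read from the routing table. The sketch below assumes the usual "default via <gateway> dev <interface>" line printed by ip route; default_gateway is a helper name we made up for this example.

```shell
# Print the gateway of the first default route found in `ip route` output
# read from stdin. Exit status 1 means no default route was present.
default_gateway() (
    awk '
        $1 == "default" && $2 == "via" { print $3; found = 1; exit }
        END { exit(found ? 0 : 1) }
    '
)

# Example:
#   ip route show | default_gateway
```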
Finally, we have to define a set of cluster resources responsible for other services crucial for vProtect operations, such as vProtect Node and the vProtect server itself. We will logically link these services with our floating IP. Whenever the floating IP disappears from our server, these services will be stopped. We also have to define the proper order for services to start and stop, as for example starting the vProtect-server without a running database makes little sense.
  • Resource creation
[root@vprotect1 ~]# pcs resource create "vProtect-node" systemd:vprotect-node op monitor timeout=300s on-fail="stop" --group vProtect-group
[root@vprotect1 ~]# pcs resource create "vProtect-server" service:vprotect-server op start on-fail="stop" timeout="300s" op stop timeout="300s" on-fail="stop" op monitor timeout="300s" on-fail="stop" --group vProtect-group
It is OK for these commands not to return any output.
  • Resource colocation
[root@vprotect1 ~]# pcs constraint colocation add Failover_IP with vProtect-group
Finally, we can set which server is preferred for running our services
  • Set node preference
[root@vprotect1 ~]# pcs constraint location Failover_IP prefers vprotect1=INFINITY
[root@vprotect1 ~]# pcs constraint location vProtect-group prefers vprotect1=INFINITY
We have made it to the end. At this point, our pacemaker HA cluster is functional.
However, there are still two things we need to consider, that is:
  1. Creating DB replication
  2. Setting up DRBD for /vprotect_data (optionally with VDO)

Setting up VDO+DRBD

In this section, we will prepare our deduplicated and replicated filesystem mounted in /vprotect_data.
Using a deduplicated FS is optional but highly recommended. If you don't intend to use it, skip the part regarding VDO configuration.
Note: If you are altering existing vProtect configuration it is very important to preserve the /vprotect_data contents and transfer them to the new filesystem. You may also need to re-create your backup_destination if you previously had one in this directory. Setting up VDO and DRBD will cause all data to be wiped from the configured volume.
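One way to preserve the existing contents is to archive them before the volume is wiped and unpack them once the new filesystem is mounted. This is only a sketch: the function names and the archive location are our assumptions, and for a large backup destination a tool such as rsync to separate storage is probably a better fit than a single tarball.

```shell
# Archive the contents of a directory into a gzip-compressed tarball.
# Arguments: source directory, archive path (must be outside the source).
backup_dir() (
    src="$1"
    archive="$2"
    tar -czf "$archive" -C "$src" .
)

# Unpack such an archive into an existing directory, keeping permissions.
# Arguments: archive path, destination directory.
restore_dir() (
    archive="$1"
    dest="$2"
    tar -xzpf "$archive" -C "$dest"
)

# Example (paths are illustrative):
#   backup_dir /vprotect_data /root/vprotect_data.tar.gz
#   ... set up DRBD/VDO and mount the new filesystem ...
#   restore_dir /root/vprotect_data.tar.gz /vprotect_data
```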
Installation is split into the steps below that you need to follow to get the job done.
  • Stop the vprotect server and node
# systemctl stop vprotect-server vprotect-node
No output means everything went OK.
  • On both nodes install the required repositories and packages
# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm
Retrieving https://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm
Preparing... ################################# [100%]
Updating / installing...
1:elrepo-release-7.0-4.el7.elrepo ################################# [100%]
The next command can produce quite a few lines, so the output is truncated here. The idea is simple: install the DRBD packages:
[root@vprotect1 ~]# yum install -y kmod-drbd84 drbd84-utils
Installed:
drbd84-utils.x86_64 0:9.6.0-1.el7.elrepo kmod-drbd84.x86_64 0:8.4.11-1.1.el7_6.elrepo
If you have not disabled SELinux and the firewall, remember to
  • configure them on both nodes
    # semanage permissive -a drbd_t
    # firewall-cmd --add-port=7788/tcp --permanent
    success
    # firewall-cmd --complete-reload
    success
    Don't forget to repeat these steps on the second node
Now that we have the necessary software installed, we must prepare an identical size block device on both nodes. A block device can be a hard drive, a hard drive partition, software RAID, LVM Volume, etc. In this scenario, we are going to use a hard drive connected as /dev/sdb.
To add a DRBD resource we create the file /etc/drbd.d/vprotect.res with the content below. Be sure to change the "address" so that it reflects your network configuration.
Also, the node names (vprotect1 and vprotect2) must match your uname -n output.
resource replicate {
    protocol C;
    on vprotect1 {
        device /dev/drbd0;
        disk /dev/sdb;
        address 10.40.1.50:7788;
        meta-disk internal;
    }
    on vprotect2 {
        device /dev/drbd0;
        disk /dev/sdb;
        address 10.40.1.52:7788;
        meta-disk internal;
    }
}
We now have config in place and can create and bring our resource online.
  • On both nodes, run
    # drbdadm create-md replicate
    initializing activity log
    initializing bitmap (4800 KB) to all zero
    Writing meta data...
    New drbd meta data block successfully created.
    then bring the volume online
    # drbdadm up replicate
    You can verify if the device is up & running by issuing
    # lsblk
    NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
    sda 8:0 0 16G 0 disk
    ├─sda1 8:1 0 1G 0 part /boot
    └─sda2 8:2 0 15G 0 part
    ├─vg_vprotect-lv_root 253:0 0 13.4G 0 lvm /
    └─vg_vprotect-lv_swap 253:1 0 1.6G 0 lvm [SWAP]
    sdb 8:16 0 150G 0 disk
    └─drbd0 147:0 0 150G 1 disk
    However, if we check
    [root@vprotect1 ~]# drbdsetup status replicate
    replicate role:Secondary
    disk:Inconsistent
    peer role:Secondary
    replication:Established peer-disk:Inconsistent
    we will notice we need to start synchronization before we can use our volume.
  • On the first server, run
    [root@vprotect1 ~]# drbdadm primary --force replicate
    [root@vprotect1 ~]# drbdsetup status replicate
    replicate role:Primary
    disk:UpToDate
    peer role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:0.22
    This way we have successfully started the process of replication between servers with vprotect1 as the synchronization source.
    If you don't want to create a VDO device, then create and mount your filesystem:
    [root@vprotect1 ~]# mkfs.xfs -K /dev/drbd0
    [root@vprotect1 ~]# mount /dev/drbd0 /vprotect_data/ && chown -R vprotect:vprotect /vprotect_data
  • Create VDO volume (optional)
    By issuing the command below we will create a VDO volume called vdo_data and put it on top of our DRBD volume. Afterwards, we format it with XFS and mount it in /vprotect_data.
    [root@vprotect1 ~]# vdo create --name=vdo_data --device=/dev/drbd0 --vdoLogicalSize=400G --compression=enabled --deduplication=enabled
    Creating VDO vdo_data
    Starting VDO vdo_data
    Starting compression on VDO vdo_data
    VDO instance 0 volume is ready at /dev/mapper/vdo_data
    [root@vprotect1 ~]# mkfs.xfs -K /dev/mapper/vdo_data
    meta-data=/dev/mapper/vdo_data isize=512 agcount=4, agsize=26214400 blks
    = sectsz=4096 attr=2, projid32bit=1
    = crc=1 finobt=0, sparse=0
    data = bsize=4096 blocks=104857600, imaxpct=25
    = sunit=0 swidth=0 blks
    naming =version 2 bsize=4096 ascii-ci=0 ftype=1
    log =internal log bsize=4096 blocks=51200, version=2
    = sectsz=4096 sunit=1 blks, lazy-count=1
    realtime =none extsz=4096 blocks=0, rtextents=0
    [root@vprotect1 ~]# mount /dev/mapper/vdo_data /vprotect_data/ && chown -R vprotect:vprotect /vprotect_data
  • Copy the VDO config to the second node
[root@vprotect1 ~]# scp /etc/vdoconf.yml root@vprotect2:/etc/vdoconf.yml
  • Disable VDO automatic startup
    As this resource will be managed by the cluster, we need to disable auto startup of this service on both nodes.
    # systemctl disable vdo
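Before handing the volume over to the cluster, it may be worth confirming that the initial DRBD synchronization started earlier has finished. The sketch below parses drbdsetup status output in the format shown above (tokens such as peer-disk:Inconsistent); the function names are ours, and the assumption is that a fully synchronized peer is reported as peer-disk:UpToDate.

```shell
# Print the value of the first "peer-disk:<state>" token found on stdin.
drbd_peer_disk_state() (
    tr ' \t' '\n\n' | sed -n 's/^peer-disk://p' | head -n 1
)

# Succeed only if the peer disk is reported as UpToDate.
drbd_sync_finished() (
    [ "$(drbd_peer_disk_state)" = "UpToDate" ]
)

# Example: poll every 30 seconds until the peer has caught up.
#   until drbdsetup status replicate | drbd_sync_finished; do sleep 30; done
```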

Final cluster settings

At this point, we have three components set up. To fully utilize our HA cluster and eliminate the need for manual intervention we should add the resources and settings below to our cluster.
Issue these commands on one node only as they will propagate to the cluster settings.
[root@vprotect1 ~]# pcs cluster cib drbd_cfg
[root@vprotect1 ~]# pcs -f drbd_cfg resource create replicate ocf:linbit:drbd \
drbd_resource=replicate op monitor interval=10s --group fs_group
[root@vprotect1 ~]# pcs -f drbd_cfg resource master replicateClone replicate \
master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
notify=true --group fs_group
[root@vprotect1 ~]# pcs -f drbd_cfg resource create vdo_resource ocf:heartbeat:vdo-vol volume=vdo_data --group fs_group
[root@vprotect1 ~]# pcs -f drbd_cfg resource create fs_resource ocf:heartbeat:Filesystem device=/dev/mapper/vdo_data directory=/vprotect_data fstype=xfs --group fs_group
[root@vprotect1 ~]# pcs cluster cib-push drbd_cfg --config
[root@vprotect1 ~]# pcs constraint colocation add vdo_resource with replicateClone
[root@vprotect1 ~]# pcs constraint order start vdo_resource then fs_resource
[root@vprotect1 ~]# pcs constraint order start replicateClone then vdo_resource
[root@vprotect1 ~]# pcs constraint colocation add vProtect-group with fs_group
[root@vprotect1 ~]# pcs constraint colocation add vdo_resource with replicateClone INFINITY with-rsc-role=Master
[root@vprotect1 ~]# pcs constraint order promote replicateClone then start fs_group
Here we have created a temporary file drbd_cfg and inside this file we have added our drbd_resource called replicate, plus a Master/Slave set for this resource.
Afterwards, we have the definition of the vdo_resource and fs_resource in one fs_group followed by an update of the cluster configuration.
As a second step, we have put in place several resource colocations and constraints which allow us to control the order and existence of newly created resources.
We still need to
  • Make sure that our node is pointed to a localhost address. Check the Nodes UI section.
If the node's IP is different from 127.0.0.1, delete the node and re-register it using
[root@vprotect1 ~]# vprotect node -e <Node_Name> admin http://127.0.0.1:8080/api
  • copy our license and node information from the first node to the second node:
[root@vprotect1 ~]# scp -pr /opt/vprotect/.session.properties root@vprotect2:/opt/vprotect/
[root@vprotect1 ~]# scp -pr /opt/vprotect/license.key root@vprotect2:/opt/vprotect/

MariaDB replication

In this section, we will cover how to set up master<->master MariaDB replication.
  • On both nodes, if you have the firewall enabled, allow communication via port 3306
# firewall-cmd --add-port=3306/tcp --permanent
# firewall-cmd --complete-reload
Steps to run on the first server, vprotect1: 10.40.1.50
This server will be the source of DB replication.
  • Stop the vprotect server, node and database
[root@vprotect1 ~]# systemctl stop vprotect-server vprotect-node mariadb
  • Edit the config file, enable binary logging and start MariaDB again. The config file location may vary depending on your distribution; most likely it is /etc/my.cnf or /etc/my.cnf.d/server.cnf
    In the [mysqld] section, add the lines:
[root@vprotect1 ~]# vi /etc/my.cnf.d/server.cnf
log-bin
server_id=1
replicate-do-db=vprotect
[root@vprotect1 ~]# systemctl start mariadb
  • Now log in to your MariaDB instance, create a user for replication and assign the appropriate rights to it.
    For the purpose of this task, we will set the username to 'replicator' and the password to 'R3pLic4ti0N'
[root@vprotect1 ~]# mysql -u root -p
Enter password:
[..]
MariaDB [(none)]> create user 'replicator'@'%' identified by 'R3pLic4ti0N';
Query OK, 0 rows affected (0.026 sec)
MariaDB [(none)]> grant replication slave on *.* to 'replicator'@'%';
Query OK, 0 rows affected (0.001 sec)
MariaDB [(none)]> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.001 sec)
Don't log out just yet, we need to check the master status and
  • write down the log file name and position, as it is required for proper slave configuration.
MariaDB [(none)]> show master status;
+----------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+----------------------+----------+--------------+------------------+
| vprotect1-bin.000007 | 46109 | | |
+----------------------+----------+--------------+------------------+
  • Dump the vprotect database and copy it onto the second server (vprotect2).
[root@vprotect1 ~]# mysqldump -u root -p vprotect > /tmp/vprotect.sql
[root@vprotect1 ~]# scp /tmp/vprotect.sql root@vprotect2:/tmp/
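Instead of copying the log file name and position by hand, they can be turned into the statement needed on the other server. This is a sketch: change_master_sql is a hypothetical helper, it expects the tab-separated line printed by mysql -N -B -e 'SHOW MASTER STATUS' on stdin, and it deliberately leaves the password as a placeholder to be filled in manually.

```shell
# Read one tab-separated "File<TAB>Position..." line from stdin and print a
# matching CHANGE MASTER TO statement.
# Arguments: master host IP, replication user name.
change_master_sql() (
    master_host="$1"
    repl_user="$2"
    IFS="$(printf '\t')" read -r log_file log_pos _rest || true
    # Refuse to print a statement if either value is missing.
    [ -n "$log_file" ] && [ -n "$log_pos" ] || exit 1
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='%s', MASTER_PASSWORD='<password>', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$master_host" "$repl_user" "$log_file" "$log_pos"
)

# Example (on vprotect1, producing the statement to run on vprotect2):
#   mysql -u root -p -N -B -e 'SHOW MASTER STATUS' | change_master_sql 10.40.1.50 replicator
```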
Steps to run on the 2nd server, vprotect2: 10.40.1.52
For the reader's convenience, I have only highlighted the differences in configuration between vprotect1 and vprotect2, and omitted the output of some commands if they are the same as on the previous node.
  • Stop the vprotect server, node and database
  • Edit the MariaDB config file. Assign a different server id, for example: 2. Then start MariaDB.
[root@vprotect2 ~]# vi /etc/my.cnf.d/server.cnf
log-bin
server_id=2
replicate-do-db=vprotect
[root@vprotect2 ~]# systemctl start mariadb
  • Load the database dump copied from vprotect1.
[root@vprotect2 ~]# mysql -u root -p vprotect < /tmp/vprotect.sql
At this point, we have two identical databases on our two servers.
  • Log in to the MariaDB instance, create a replication user with a password. Use the same user as on vprotect1. Grant the necessary permissions.
  • Set the master host. You must use the master_log_file and master_log_pos written down earlier. Change the IP of the master host to match your network configuration.
MariaDB [(none)]> STOP SLAVE;
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST = '10.40.1.50', MASTER_USER = 'replicator',MASTER_PASSWORD='R3pLic4ti0N',MASTER_LOG_FILE = 'vprotect1-bin.000007',MASTER_LOG_POS=46109;
Query OK, 0 rows affected (0.004 sec)
  • Start the slave, check the master status and write down the file name and position.
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.001 sec)
MariaDB [(none)]> SHOW MASTER STATUS;
+----------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+----------------------+----------+--------------+------------------+
| vprotect2-bin.000002 | 501051 | | |
+----------------------+----------+--------------+------------------+
1 row in set (0.000 sec)
Go back to the first server (vprotect1)
  • On vprotect1, stop the slave then change the master host using the parameters noted down in the previous step. Also, change the master host IP to match your network configuration.
MariaDB [(none)]> stop slave;
MariaDB [(none)]> change master to master_host='10.40.1.52', master_user='replicator', master_password='R3pLic4ti0N',MASTER_LOG_FILE = 'vprotect2-bin.000002', master_log_pos=501051;
Query OK, 0 rows affected (0.004 sec)
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.001 sec)
At this point, you have successfully configured MariaDB master<->master replication.
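To confirm that replication is actually running in both directions, the slave threads can be checked on each server. The helper below is a sketch with a hypothetical name; it reads the vertical output of SHOW SLAVE STATUS\G from stdin and relies on the standard Slave_IO_Running and Slave_SQL_Running fields.

```shell
# Succeed only if both replication threads report "Yes" in the output of
# `SHOW SLAVE STATUS\G` read from stdin.
replication_healthy() (
    awk -F': *' '
        { gsub(/^[ \t]+/, "", $1); gsub(/[ \t\r]+$/, "", $2) }
        $1 == "Slave_IO_Running"  { io  = $2 }
        $1 == "Slave_SQL_Running" { sql = $2 }
        END { exit((io == "Yes" && sql == "Yes") ? 0 : 1) }
    '
)

# Example (run on both vprotect1 and vprotect2):
#   mysql -u root -p -e 'SHOW SLAVE STATUS\G' | replication_healthy && echo "replication OK"
```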

Testing the setup

Automatic
The fastest way to test our setup is to invoke
# pcs node standby vprotect1
to put vprotect1 into standby mode, which prevents it from hosting any cluster resources.
After a while, you should see your resources up and running on vprotect2.
Note that if you perform a normal OS shutdown (not a forced one), Pacemaker will wait for a long time for the node to come back online, which in fact prevents the shutdown from completing. As a result, resources will not switch correctly to the other node.
Manual
If you want to dive a little bit deeper, we have prepared instructions on how to manually move a filesystem resource from the first node to the second.
  1. Stop vprotect services.
    [root@vprotect1 ~]# systemctl stop vprotect-server && systemctl stop vprotect-node
  2. Unmount the FS used by DRBD/VDO on the primary server (here vprotect1).
    [root@vprotect1 ~]# drbdadm role replicate
    Primary/Secondary
    [root@vprotect1 ~]# umount /vprotect_data/
  3. If you are using a VDO device, stop it.
    [root@vprotect1 ~]# vdo stop -n vdo_data
    Stopping VDO vdo_data
  4. Demote the primary replication server (still vprotect1) to secondary server.
    [root@vprotect1 ~]# drbdadm secondary replicate
On the second server
  1. Promote the second server (here vprotect2) to the primary DRBD role.
    [root@vprotect2 ~]# drbdadm primary replicate
  2. Start the VDO.
    [root@vprotect2 ~]# vdo start -n vdo_data
    Starting VDO vdo_data
    Starting compression on VDO vdo_data
    VDO instance 2 volume is ready at /dev/mapper/vdo_data
  3. Mount the filesystem on the second server.
    [root@vprotect2 ~]# mount /dev/mapper/vdo_data /vprotect_data/
Now you have your replicated volume mounted on the second node.
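The manual steps above can be collected into a small script so that they always run in the same order. This is a sketch, not a supported tool: the function names and the DRY_RUN and USE_VDO variables are our own, and by default the script only prints the commands instead of running them.

```shell
# Print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# "demote" is meant for the current primary, "promote" for the node taking
# over. Set USE_VDO=0 to skip the VDO steps when no VDO device is used.
manual_failover() {
    case "$1" in
        demote)
            run_step systemctl stop vprotect-server || return 1
            run_step systemctl stop vprotect-node || return 1
            run_step umount /vprotect_data/ || return 1
            if [ "${USE_VDO:-1}" = "1" ]; then
                run_step vdo stop -n vdo_data || return 1
            fi
            run_step drbdadm secondary replicate
            ;;
        promote)
            run_step drbdadm primary replicate || return 1
            if [ "${USE_VDO:-1}" = "1" ]; then
                run_step vdo start -n vdo_data || return 1
                run_step mount /dev/mapper/vdo_data /vprotect_data/
            else
                run_step mount /dev/drbd0 /vprotect_data/
            fi
            ;;
        *)
            echo "usage: manual_failover demote|promote" >&2
            return 2
            ;;
    esac
}
```

Run manual_failover demote on vprotect1 first, then manual_failover promote on vprotect2. Review the printed commands before setting DRY_RUN=0.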