2 Node Cluster
In this scenario, we are going to set up two Storware Backup & Recovery servers in High Availability, Active/Passive mode. This is possible using tools such as Pacemaker, Corosync, and DRBD, so at least a basic understanding of these is highly desirable. This how-to is intended for RPM-based systems such as Red Hat / CentOS. If you run Storware Backup & Recovery on a different OS, you may need to refer to your distribution's documentation.
Our environment consists of the following elements:
storware1 - first Storware Backup & Recovery server + Storware Backup & Recovery node, IP: 10.40.1.50
storware2 - second Storware Backup & Recovery server + Storware Backup & Recovery node, IP: 10.40.1.52
Cluster IP: 10.40.1.100 - We will use this IP to connect to our active Storware Backup & Recovery service. This IP will float between our servers and will point to an active instance.
DRBD (optionally with VDO) for data replication and deduplication between nodes.
MariaDB master <-> master replication
Stop and disable the Storware Backup & Recovery server, node and database as the cluster will manage these resources.
Use yum to check if you have any updates pending
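For example, assuming the default Storware/vProtect service names:

```
# stop the services and prevent them from starting at boot
# (unit names assume a default installation)
systemctl stop vprotect-server vprotect-node mariadb
systemctl disable vprotect-server vprotect-node mariadb

# check for pending updates
yum check-update
```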
It is a good idea to check /etc/hosts, especially if you installed Storware Backup & Recovery using the All in one quick installation method, as you might find an entry such as:
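```
# example only – the hostname will match your own server's name
127.0.0.1   storware1
```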
Delete it as this prevents the cluster from functioning properly (your nodes will not "see" each other).
Now we can proceed with installation of the required packages.
On both servers run
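A possible command, assuming the High Availability repository is available on your system:

```
yum install -y pcs pacemaker corosync fence-agents-all
```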
Add a firewall rule to allow HA traffic - TCP ports 2224, 3121, and 21064, and UDP port 5405 (both servers)
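For example, with firewalld the predefined high-availability service covers these ports:

```
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload
```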
While testing, depending on your environment, you may encounter problems related to network traffic, permissions, etc. While it might be convenient to temporarily disable the firewall and SELinux, we do not recommend disabling these mechanisms in a production environment, as doing so creates significant security issues. If you choose to disable the firewall, bear in mind that Storware will no longer be available on ports 80/443. Instead, connect to ports 8080/8181 respectively.
Enable and start PCS daemon
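For example:

```
systemctl enable --now pcsd
```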
Cluster configuration
The earlier installation of the pcs package automatically created the user hacluster with no password set. While this may be fine for running locally, we will require a password for this account to perform the rest of the configuration, so let's
configure the same password on both nodes
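For example:

```
passwd hacluster
```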
Corosync configuration
On node 1, issue a command to authenticate as a hacluster user:
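A sketch of the command; the exact syntax depends on your pcs version:

```
# pcs 0.9 (CentOS/RHEL 7)
pcs cluster auth storware1 storware2 -u hacluster
# pcs 0.10+ (CentOS/RHEL 8) uses "pcs host auth" instead:
# pcs host auth storware1 storware2 -u hacluster
```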
Generate and synchronize the corosync configuration
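A sketch, assuming the cluster name storware_cluster (any name will do); again, the syntax differs slightly between pcs versions:

```
# pcs 0.9 (CentOS/RHEL 7)
pcs cluster setup --name storware_cluster storware1 storware2
# pcs 0.10+ (CentOS/RHEL 8):
# pcs cluster setup storware_cluster storware1 storware2
```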
Take a look at your output, which should look similar to below:
Enable and start your new cluster
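For example:

```
pcs cluster start --all
pcs cluster enable --all
```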
OK! We have our cluster enabled. We have not created any resources (such as a floating IP) yet, but before we proceed we still have a few settings to modify.
Because we are using only two nodes, we need to
disable default quorum policy
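For example:

```
pcs property set no-quorum-policy=ignore
```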
(this command should not return any output)
We should also
define default failure settings
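A sketch; the values shown are examples, adjust them to your needs:

```
pcs resource defaults migration-threshold=5
pcs resource defaults failure-timeout=60s
```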
These two settings combined will define how many failures can occur for a node to be marked as ineligible for hosting a resource and after what time this restriction will be lifted. We define the defaults here, but it may be a good idea to also set these values at the resource level, depending on your experience.
As we are not using any fencing device in this environment, we need to:
disable stonith
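For example:

```
pcs property set stonith-enabled=false
crm_verify -L
```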
The second command verifies the running configuration. These commands normally do not return any output.
Resource creation
Finally, we have our cluster configured, so it's time to proceed to
resource creation
First, we will create a resource that represents our floating IP 10.40.1.100. Adjust your IP and cidr_netmask, and you're good to go.
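A sketch; the resource name ClusterIP is an example:

```
pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=10.40.1.100 cidr_netmask=24 op monitor interval=30s
```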
IMPORTANT: From this moment on, we need to use this IP when connecting to our Storware Backup & Recovery server.
Immediately, we should see that our IP is up and running on one of the nodes (most likely the one we issued this command from).
As you can see, our floating IP 10.40.1.100 has been successfully assigned as the second IP of interface ens160. This is what we wanted!
We should also check if the Storware Backup & Recovery web interface is up and running. We can do this by opening the web browser and typing in https://10.40.1.100.
The next step is to
define a resource responsible for monitoring network connectivity
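A sketch; 10.40.1.1 stands in for your gateway and the parameter values are examples:

```
pcs resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 host_list=10.40.1.1 clone
```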
Note that you need to use your gateway IP in the host_list parameter
Finally, we have to define a set of cluster resources responsible for the other services crucial for Storware, such as the Storware node and the Storware server itself. We will logically link these services with our floating IP. Whenever the floating IP disappears from our server, these services will be stopped. We also have to define the proper order for services to start and stop; for example, starting the Storware server without a running database makes little sense.
Resource creation
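A possible sketch using systemd-managed resources; the resource names and unit names assume a default installation:

```
pcs resource create vprotect_db systemd:mariadb op monitor interval=30s
pcs resource create vprotect_node systemd:vprotect-node op monitor interval=30s
pcs resource create vprotect_server systemd:vprotect-server op monitor interval=30s
```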
It is OK for these commands not to return any output.
Resource colocation
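A sketch of the colocation and ordering constraints, assuming the resource names used above:

```
# keep the services on the node that holds the floating IP
pcs constraint colocation add vprotect_db with ClusterIP INFINITY
pcs constraint colocation add vprotect_server with ClusterIP INFINITY
pcs constraint colocation add vprotect_node with ClusterIP INFINITY

# start order: IP -> database -> server -> node
pcs constraint order ClusterIP then vprotect_db
pcs constraint order vprotect_db then vprotect_server
pcs constraint order vprotect_server then vprotect_node
```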
Finally, we can set which server is preferred for running our services.
Set node preference
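For example, to prefer storware1 (the score is an example):

```
pcs constraint location ClusterIP prefers storware1=100
```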
We have made it to the end. At this point, our pacemaker HA cluster is functional.
However, there are still two things we need to set up:
Creating DB replication
Setting up DRBD for /vprotect_data (optionally with VDO)
In this section, we will prepare our deduplicated and replicated filesystem mounted in /vprotect_data.
Using a deduplicated FS is optional but highly recommended. If you don't intend to use it, skip the part regarding VDO configuration.
Note: If you are altering an existing Storware Backup & Recovery configuration, it is very important to preserve the /vprotect_data contents and transfer them to the new filesystem. You may also need to re-create your backup_destination if you previously had one in this directory. Setting up VDO and DRBD will cause all data to be wiped from the configured volume.
Installation is split into the steps below that you need to follow to get the job done.
Stop the Storware server and node
No output means everything went OK.
On both nodes, install the required repositories and packages.
The next command can produce quite a few lines, so I've truncated the output; the idea is simple: install the DRBD packages.
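A sketch assuming CentOS/RHEL 7 and the ELRepo repository; package names and URLs differ on other releases:

```
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
yum install -y https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
yum install -y drbd90-utils kmod-drbd90
```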
If you have not disabled SELinux and the firewall, remember to
configure them on both nodes
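A sketch; 7789 is the DRBD port used in the resource file further below:

```
# allow DRBD to work under SELinux (requires policycoreutils-python)
semanage permissive -a drbd_t

# open the DRBD replication port
firewall-cmd --permanent --add-port=7789/tcp
firewall-cmd --reload
```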
Don't forget to repeat these steps on the second node
Now that we have the necessary software installed, we must prepare a block device of identical size on both nodes. A block device can be a hard drive, a hard drive partition, software RAID, an LVM volume, etc. In this scenario, we are going to use a hard drive connected as /dev/sdb.
To add a DRBD resource, we create the file /etc/drbd.d/vprotect.res with the content below. Be sure to change the "address" entries so that they reflect your network configuration.
Also, the node names (storware1 and storware2) must match your uname -n output.
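A sketch of /etc/drbd.d/vprotect.res; the resource name vprotect, the device /dev/sdb and the port 7789 are assumptions to adjust to your environment:

```
resource vprotect {
  device    /dev/drbd0;
  disk      /dev/sdb;
  meta-disk internal;

  on storware1 {
    address 10.40.1.50:7789;
  }
  on storware2 {
    address 10.40.1.52:7789;
  }
}
```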
We now have the config in place and can create our resource and bring it online.
On both nodes, run
then bring the volume online
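For example, assuming the resource name vprotect from the file above:

```
# on both nodes: initialize the DRBD metadata
drbdadm create-md vprotect

# then bring the volume online (also on both nodes)
drbdadm up vprotect
```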
You can verify if the device is up & running by issuing
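For example:

```
lsblk
# /dev/drbd0 should now be listed
```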
However, if we check the synchronization status, we will notice that we need to start synchronization before we can use our volume.
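One way to check the synchronization state (the exact command depends on your DRBD version):

```
drbdadm status vprotect
# or, with the older 8.4 kernel module:
cat /proc/drbd
```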
On the first server, run
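For example:

```
drbdadm primary --force vprotect
```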
This way we have successfully started the replication process between the servers, with storware1 as the synchronization source.
If you don't want to create a VDO device, then create and mount your filesystem:
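A sketch, run on the node that currently holds the DRBD primary role:

```
mkfs.xfs /dev/drbd0
mkdir -p /vprotect_data
mount /dev/drbd0 /vprotect_data
```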
Create VDO volume (optional)
By issuing the command below, we will create a VDO volume called vdo_data and layer it on top of our DRBD volume. Afterwards, we format it with XFS and mount it in /vprotect_data.
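A sketch, run on the current DRBD primary; the VDO logical size and other options are left at their defaults here:

```
vdo create --name=vdo_data --device=/dev/drbd0
mkfs.xfs -K /dev/mapper/vdo_data
mkdir -p /vprotect_data
mount /dev/mapper/vdo_data /vprotect_data
```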
Copy the VDO config to the second node
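For example:

```
scp /etc/vdoconf.yml storware2:/etc/vdoconf.yml
```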
Disable VDO automatic startup
As this resource will be managed by the cluster, we need to disable auto startup of this service on both nodes.
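For example, on both nodes:

```
systemctl disable vdo
```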
At this point, we have three components set up. To fully utilize our HA cluster and eliminate the need for manual intervention, we should add the resources and settings below to our cluster.
Issue these commands on one node only, as the settings will propagate across the cluster.
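A possible sketch matching the description below; the resource and group names, the VDO agent and the pcs syntax (0.9-style master resources) are assumptions that may need adjusting to your environment:

```
# work on an offline copy of the cluster configuration
pcs cluster cib drbd_cfg

# DRBD resource "replicate" plus a Master/Slave set for it
# (drbd_resource must match the name used in /etc/drbd.d/vprotect.res)
pcs -f drbd_cfg resource create replicate ocf:linbit:drbd drbd_resource=vprotect op monitor interval=10s
pcs -f drbd_cfg resource master replicate_clone replicate \
    master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

# VDO volume and filesystem kept together in one group
pcs -f drbd_cfg resource create vdo_resource ocf:heartbeat:vdo-vol volume=vdo_data --group fs_group
pcs -f drbd_cfg resource create fs_resource ocf:heartbeat:Filesystem \
    device=/dev/mapper/vdo_data directory=/vprotect_data fstype=xfs --group fs_group

# push the updated configuration to the cluster
pcs cluster cib-push drbd_cfg

# second step: colocations and ordering
pcs constraint colocation add fs_group with master replicate_clone INFINITY
pcs constraint order promote replicate_clone then start fs_group
pcs constraint colocation add fs_group with ClusterIP INFINITY
pcs constraint order fs_group then vprotect_server
```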
Here we have created a temporary file drbd_cfg and inside this file we have added our drbd_resource called replicate, plus a Master/Slave set for this resource.
Afterwards, we have the definition of the vdo_resource and fs_resource in one fs_group followed by an update of the cluster configuration.
As a second step, we have put in place several resource colocations and constraints which allow us to control the order and existence of newly created resources.
We still need to:
Make sure that our node is pointed to a localhost address. Check the Nodes UI section.
If the node's IP is different than 127.0.0.1, delete the node and re-register it using
copy our license and node information from the first node to the second node:
In this section, we will cover how to set up master<->master MariaDB replication.
On both nodes, if you have the firewall enabled, allow communication via port 3306
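For example:

```
firewall-cmd --permanent --add-port=3306/tcp
firewall-cmd --reload
```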
Steps to run on the first storware1 node: 10.40.1.50
This server will be the source of DB replication.
Stop the Storware server, node and database
Edit the config file, enable binary logging and start MariaDB again. Depending on your distribution, the config file location may vary, most likely it is /etc/my.cnf or /etc/my.cnf.d/server.cnf
In the [mysqld] section, add the lines:
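A sketch; at minimum, binary logging and a unique server id are needed, the remaining options are examples:

```
server_id=1
log_bin=mysql-bin
binlog_format=mixed
replicate-do-db=vprotect
```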
Now log in to MariaDB, create a user for replication and assign the appropriate rights to it.
For the purpose of this task, we will set the username to 'replicator' and the password to 'R3pLic4ti0N'
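For example, in the MariaDB shell:

```
CREATE USER 'replicator'@'%' IDENTIFIED BY 'R3pLic4ti0N';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';
FLUSH PRIVILEGES;
```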
Don't log out just yet, we need to check the master status and
write down the log file name and position, as it is required for proper slave configuration.
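For example:

```
SHOW MASTER STATUS;
```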
Dump the vprotect database and copy it onto the second server (storware2).
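A sketch; credentials and paths are examples:

```
mysqldump -u root -p vprotect > /tmp/vprotect.sql
scp /tmp/vprotect.sql storware2:/tmp/
```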
Steps to run on the 2nd server, storware2: 10.40.1.52
For the reader's convenience, I have only highlighted the differences in configuration between storware1 and storware2, and omitted the output of some commands if they are the same as on the previous node.
Stop the Storware server, node and database
Edit the MariaDB config file. Assign a different server id, for example: 2. Then start MariaDB.
Load the database dump copied from storware1.
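For example:

```
mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS vprotect"
mysql -u root -p vprotect < /tmp/vprotect.sql
```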
At this point, we have two identical databases on our two servers.
Log in to the MariaDB instance, create a replication user with a password. Use the same user as on storware1. Grant the necessary permissions.
Set the master host. You must use the master_log_file and master_log_pos values written down earlier. Change the IP of the master host to match your network configuration.
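A sketch; replace the placeholders with the file name and position noted on storware1:

```
CHANGE MASTER TO
  MASTER_HOST='10.40.1.50',
  MASTER_USER='replicator',
  MASTER_PASSWORD='R3pLic4ti0N',
  MASTER_LOG_FILE='<file noted on storware1>',
  MASTER_LOG_POS=<position noted on storware1>;
```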
Start the slave, check the master status and write down the file name and position.
Go back to the first server (storware1)
On storware1, stop the slave, then change the master host using the parameters noted down in the previous step. Also, change the master host IP to match your network configuration.
At this point, you have successfully configured MariaDB master<->master replication.
Automatic
The fastest way to test our setup is to put storware1 into standby mode, which prevents it from hosting any cluster resources.
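For example (older pcs versions use pcs cluster standby instead):

```
pcs node standby storware1

# when you are done testing, bring the node back:
pcs node unstandby storware1
```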
After a while, you should see your resources up and running on storware2.
Note that if you perform a normal OS shutdown (not a forced one), Pacemaker will wait a long time for the node to come back online, which will prevent the shutdown from completing. As a result, resources will not switch correctly to the other node.
Manual
If you want to dive a little bit deeper, we have prepared instructions on how to manually move a filesystem resource from the first node to the second.
Stop vprotect services.
Unmount the FS used by DRBD/VDO on the primary server (here storware1).
If you are using a VDO device, stop it.
Demote the primary replication server (still storware1) to secondary server.
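Putting the storware1 side together, a sketch assuming the names used earlier:

```
# on storware1
systemctl stop vprotect-server vprotect-node
umount /vprotect_data
vdo stop --name=vdo_data        # only if you use VDO
drbdadm secondary vprotect
```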
On the second server
Promote the second server (here storware2) to the primary DRBD role.
Start the VDO.
Mount the filesystem on the second server.
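And the storware2 side, as a sketch:

```
# on storware2
drbdadm primary vprotect
vdo start --name=vdo_data       # only if you use VDO
mount /dev/mapper/vdo_data /vprotect_data
# or, without VDO:
# mount /dev/drbd0 /vprotect_data
```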
Now you have your replicated volume mounted on the second node.