1. Overview

Under normal operation, if the primary node fails, a secondary node will be elected leader. An operator would typically resolve any issues with the initial (failed) primary, bring it back online, and it would rejoin the cluster as a secondary node.

However, if the initial primary node fails permanently and you add a new node to the cluster to preserve the minimum node count, you must reconfigure the replica set. This procedure shows you how to recover your HA cluster from a permanently failed primary node.

2. Error messages

There are no specific error messages; however, you cannot add new nodes to the HA cluster.

3. Steps to confirm the issue

To confirm the issue, open /var/lib/twistlock/scripts/twistlock.cfg on your active Console host. Under the high availability settings, HIGH_AVAILABILITY_STATE reads SECONDARY instead of PRIMARY.
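
For example, you can check the setting directly with grep (the path assumes a default installation); on a node exhibiting the issue, the output shows SECONDARY:

  $ sudo grep HIGH_AVAILABILITY_STATE /var/lib/twistlock/scripts/twistlock.cfg
  HIGH_AVAILABILITY_STATE=SECONDARY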

4. Troubleshooting steps

Follow the steps below:

Prerequisites

  • A high availability set with a failed primary node and at least one secondary Console online.

  • One or more new nodes to add to bring the high availability set back to three nodes.

  • You have made a manual backup of your current configuration, and stored it in a location outside of your HA cluster.

Procedure

  1. Uninstall Prisma Cloud from all secondary nodes. Leave the newly elected primary node alone.

    $ sudo ./twistlock.sh -u
  2. Connect to the primary node and reinstall Prisma Cloud to clear the HA state from the database.

    1. Navigate to the directory where you originally ran the curl command to set up the node as a secondary. You should find a number of Prisma Cloud files, including twistlock.cfg and twistlock.sh.
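
      For example (a sketch; <INSTALL_DIR> is a placeholder for the directory where you originally ran the installer):

      $ cd <INSTALL_DIR>
      $ ls twistlock.cfg twistlock.sh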

    2. Open twistlock.cfg for editing.

    3. In the section for High availability settings, set HIGH_AVAILABILITY_ENABLED to true and HIGH_AVAILABILITY_STATE to PRIMARY.

      #############################################
      #      High availability settings
      #############################################
      HIGH_AVAILABILITY_ENABLED=true
      HIGH_AVAILABILITY_STATE=PRIMARY
      HIGH_AVAILABILITY_PORT=8086
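
      You can also apply these values non-interactively; the following sed one-liner is a sketch that assumes both keys already exist in twistlock.cfg:

      $ sudo sed -i \
        -e 's/^HIGH_AVAILABILITY_ENABLED=.*/HIGH_AVAILABILITY_ENABLED=true/' \
        -e 's/^HIGH_AVAILABILITY_STATE=.*/HIGH_AVAILABILITY_STATE=PRIMARY/' \
        twistlock.cfg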
    4. Reinstall Prisma Cloud Console.

      $ sudo ./twistlock.sh -ys onebox
  3. Install a secondary Console on any original or new secondary nodes to bring the total number of nodes in the replica set back to three.

    1. Open the primary Console. In a browser, go to <CONSOLE_PRIMARY_NODE>:8443, and log in as admin.

    2. Go to Manage > System > High Availability.

    3. Copy the script to install a secondary Console.

    4. On a different host, designated as a secondary, run the script you just copied. This installs Console on the secondary host.

      In your primary Console, you will see the secondary host waiting to join the cluster in the REQUEST_JOIN state.

      This is the only step required to install Console on a secondary node: the script installs Console on the host and joins it to the cluster. Do not run the procedure documented in the Install Prisma Cloud article on any secondary cluster nodes.
    5. Repeat sub-steps 1 through 4 to install secondary Consoles on all remaining secondary nodes.

  4. Connect to Prisma Cloud’s MongoDB database on the primary Console to update the nodes in the replica set.

    1. Connect to your primary node.

    2. Run the following command to connect to the database. It assumes HIGH_AVAILABILITY_PORT is set to 8086 in twistlock.cfg.

      $ docker exec -ti twistlock_console mongo \
        --ssl \
        --sslAllowInvalidHostnames \
        --sslCAFile /var/lib/twistlock/certificates/ca.pem \
        --sslPEMKeyFile /var/lib/twistlock/certificates/client.pem \
        --sslPEMKeyPassword $(cat /var/lib/twistlock/certificates/service-parameter) \
        --port 8086
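
      Optionally, confirm that this member currently holds the primary role before making changes; db.isMaster() is a standard mongo shell helper, and ismaster should be true on the primary:

      db.isMaster().ismaster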
    3. Run the following mongo commands to get the hostname of the failed primary node:

      var conf = rs.conf();
      conf.members[0].host;
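
      The returned value is the host and replication port that the replica set still associates with the failed node, for example (hypothetical hostname):

      old-console-01:8086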
    4. Run the following mongo commands, replacing <CURRENT_PRIMARY_IPADDR> with your current primary node’s IP address. The member whose host matches <PREVIOUS_PRIMARY_HOSTNAME> (the output from the previous step) is the one to update with the current primary’s address and replication port:

      conf.members[0]._id = 0;
      conf.members[0].host = "<CURRENT_PRIMARY_IPADDR>:8086";
      rs.reconfig(conf);
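
      After the reconfiguration, you can optionally verify the member list before exiting the shell; rs.conf() and rs.status() are standard mongo shell helpers:

      rs.conf().members
      rs.status().members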
  5. Connect to your Console, and go to Manage > System > High Availability. You should now see all hosts in the replica set.

  6. When you see the REQUEST_JOIN state for both secondary Consoles, click APPLY. After a few moments, the state changes to SECONDARY.

4.1. Additional information

4.2. Prisma Cloud version

Applies to all versions.

5. Cautionary notes

These changes are made directly in the MongoDB database and require extreme caution. Contact support if you have any questions.