Hello!
I recently posted this. I know, utopia! Two of my favorite things in one place. Woohoo! What could go wrong? Nothing, right?
Not Bob’s experience?
Bob (not his real name) read my post and thought:
Yes! Let’s save some cash and ditch ALB! To be clear, ALB is fine, but it seems a bit excessive if you don’t need to run it, as cloud is always a ‘pay for what you use’ model.
Only he ran into some issues… nothing too tricky, but of course, if Bob had them, you might have the same kind of problem. This post is a ‘things to check’ type of post.
Problem 1: HA failover not working
Bob (and his cloud architects) had some odd symptoms with their setup where HA was not working as they expected. We asked them to do this:
Perform a failover and then take a copy of the cloud-ha-daemon.log and upload this to the support case.
They said:
I have just uploaded the HA failover log from the secondary node. The secondary node is logging the following error:
The log is for the customer environment, so it is not something I can post as it has details of their environment. However, there is a key line from the log that helped us see a problem.
Error: Azure API ‘create_or_updateNetwork_interface’ failed, params… has permission to perform x, but does not have permission to do y.
Looks like a smoking gun to me! At least a colleague of mine said so. Thanks Srinivasarao!
From the logs, it looks like there is an issue with the permissions for execution of the API calls. Did the customer provide the necessary Azure role assignments?
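If you want to check your own cloud-ha-daemon.log for the same kind of failure, a few lines of Python will do it. This is only a minimal sketch. The pattern is my assumption, based purely on the redacted line quoted above, not on a documented log format, and the sample lines are made up for illustration.

```python
import re

# Assumed pattern: "Azure API <name> failed", with or without quotes around the name.
# It is based only on the redacted excerpt above, not on a documented log format.
API_FAILURE = re.compile(r"Azure API\s+\W?(?P<api>\w+)\W?\s+failed", re.IGNORECASE)


def find_api_failures(lines):
    """Return (line_number, api_name, text) for every line that looks like a failed Azure API call."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = API_FAILURE.search(line)
        if match:
            hits.append((number, match.group("api"), line.strip()))
    return hits


# Made-up sample lines; a real log will contain far more detail.
sample = [
    "HA state changed",
    "Error: Azure API 'create_or_updateNetwork_interface' failed, params ...",
]

for number, api, text in find_api_failures(sample):
    print(f"line {number}: {api}: {text}")
```

To run it against a real file, pass the open file object instead of the sample list, for example `find_api_failures(open(path))` with `path` pointing at your copy of the log.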
To deploy a NetScaler VPX HA pair using the secondary IP configurations in Azure, follow these steps: Step 3 is the key in this case.
Deploy two NetScaler VPX instances in Azure, each with three network interfaces in the same resource group and VNet.
Assign a managed identity to both the NetScaler VPX instances.
Assign Azure secondary IP configurations to the client and server network interfaces of the primary node.
Configure the VIP and SNIP on the primary node using the Azure private IP address from the secondary IP configuration.
Configure the primary private IP addresses of the server network interfaces on both the primary and secondary NetScaler instances to be the VIP on the primary node.
Configure HA on both nodes.
For those permissions:
Reader and Network Contributor: Grants read-only access to Azure resources and the permissions required to manage networking resources.
Contributor: Provides full access to Azure resources.
Once Bob’s cloud platform team granted the NetScaler managed identity access to the Azure virtual network that the NetScalers are using, HA failover worked as expected. The secondary IPs moved over to the secondary NetScaler node, and all errors cleared from cloud-ha-daemon.log.
Problem 1: Solution
The virtual network that the NetScalers were using did not have the permissions they needed. The team had changed the permissions for the VMs but not for the NICs, which sat in a different resource group and so were separate things as far as Azure was concerned. When this was resolved, HA worked and the HA log was clear.
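The trap here is that a role granted on one resource group says nothing about another. The sketch below shows how that gap could be spotted. It assumes you already have the managed identity's role assignments as (role, scope) pairs, and that either Contributor alone or Reader plus Network Contributor is enough, which is my reading of the role list above. The subscription ID and resource names are hypothetical.

```python
def scope_covers(scope, resource_id):
    """Azure role assignments are inherited downwards, so a scope covers every resource ID beneath it."""
    scope = scope.rstrip("/").lower()
    resource_id = resource_id.lower()
    return resource_id == scope or resource_id.startswith(scope + "/")


def missing_access(resource_ids, assignments):
    """Return the resource IDs on which the identity holds neither accepted role set."""
    # Assumption: either of these role sets is sufficient.
    accepted = [{"Contributor"}, {"Reader", "Network Contributor"}]
    gaps = []
    for resource_id in resource_ids:
        held = {role for role, scope in assignments if scope_covers(scope, resource_id)}
        if not any(required <= held for required in accepted):
            gaps.append(resource_id)
    return gaps


# Hypothetical IDs: the VM and its NIC live in different resource groups.
sub = "/subscriptions/00000000-0000-0000-0000-000000000000"
resources = [
    f"{sub}/resourceGroups/rg-vms/providers/Microsoft.Compute/virtualMachines/vpx-1",
    f"{sub}/resourceGroups/rg-network/providers/Microsoft.Network/networkInterfaces/vpx-1-client",
]
assignments = [("Contributor", f"{sub}/resourceGroups/rg-vms")]

for resource_id in missing_access(resources, assignments):
    print("no usable role on:", resource_id)
```

With the sample data, only the NIC is reported, which is the same shape as Bob's problem: the VM side was fine, the network side was not.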
Problem 2: Secondary IPs
Bob said:
I have created 20 x VIPs on the primary NetScaler node, and our cloud architect has commented that he expects to see these IPs listed as secondary IPs on the Azure VM. We have determined that every NetScaler VIP must have an associated secondary IP address attached to the NetScaler Azure VM.
But the problem we are facing is that when we create a new NetScaler VIP either via the CLI or GUI, the associated secondary IP address doesn't get created on the NetScaler Azure VM. We were expecting this to be an automated process.
So the process we have to follow is:
1. Create a VIP on the NetScaler via CLI
2. Create the secondary IP address on the NetScaler Azure VM
As we need to provision many thousands of VIPs, the extra steps required will be problematic for us, as different teams manage the NetScaler and the Azure VMs.
Problem 2: Solution 1
The best practice is for customers to explicitly configure the secondary IP addresses that are used for VIP or SNIP assignments on the NetScaler. During HA failover events, NetScaler only migrates the already-configured secondary private and associated public IP addresses. It does not create or replicate any additional configuration automatically.
Currently, there is no built-in mechanism to detect or reconcile differences between the IP addresses configured on the NetScaler and those assigned on the cloud platform. This limitation applies across platforms, including Azure, AWS, and GCP.
If the customer wishes to automate the configuration or detection of missing secondary IP addresses, the following approaches can be considered:
1. Azure CLI using a Service Principal
2. Azure CLI using az login (browser-based interactive login)
3. Python SDK using a Service Principal
4. Python SDK using a Managed Identity (when run from within an Azure VM)
We recommend approach 4, where the customer runs the tool directly from the NetScaler instance, assuming the required Reader and Network Contributor roles are already assigned. This approach allows the script to detect any missing IP addresses and add them as secondary IPs to the appropriate network interfaces.
A Python tool is available for this purpose and should be executed only from the primary NetScaler instance. First, copy the script to the /var/ directory on the instance. Then, run the following command from the shell:
python3 /var/netscaler_sync.py --netscaler-ip <IP> --username nsroot --password '<password>'
The Python script is something I can share, if required.
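To give a feel for the detection step, the core of it is a set comparison between the IPs the NetScaler owns and the IP configurations present on the Azure NICs. The sketch below is not netscaler_sync.py and does not talk to either system. It assumes you have already pulled both lists (for example from the NetScaler configuration and from the NIC's IP configurations), and the addresses are invented.

```python
import ipaddress


def missing_secondary_ips(netscaler_ips, azure_nic_ips):
    """Return the NetScaler-owned IPs that have no matching IP configuration on the Azure NICs."""
    wanted = {ipaddress.ip_address(ip) for ip in netscaler_ips}
    present = {ipaddress.ip_address(ip) for ip in azure_nic_ips}
    # Sort numerically so the result is stable and easy to read.
    return [str(ip) for ip in sorted(wanted - present)]


# Invented example data.
netscaler_vips = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
azure_ip_configs = ["10.0.1.4", "10.0.1.10"]

for ip in missing_secondary_ips(netscaler_vips, azure_ip_configs):
    print("needs a secondary IP configuration in Azure:", ip)
```

Here 10.0.1.11 and 10.0.1.12 are reported. Adding them to the NIC is the part that needs the Network Contributor role mentioned above.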
Problem 2: Solution 2
My colleague Marcelo came up with another possible solution for this. Simply reserve an address range for the NetScaler that is not part of an existing subnet. Then, on the NetScaler, configure the VIPs within this range and connect the NetScaler via BGP to an Azure Route Server to inject the VIP /32 addresses. That way, Azure will send packets destined for the VIPs to the NetScaler without you having to configure secondary IPs on the NetScaler VMs in Azure Resource Manager.
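Before going down this route, it is worth sanity-checking the address plan: the reserved range must not overlap any existing subnet, and every VIP must sit inside it. Here is a small sketch of that check using Python's standard ipaddress module. The function name, ranges, and VIPs are all made up for illustration, and it only validates the plan. It does not configure BGP or the Route Server.

```python
import ipaddress


def check_vip_plan(reserved_cidr, existing_subnets, vips):
    """Validate a reserved VIP range and list the /32 routes that would be advertised."""
    reserved = ipaddress.ip_network(reserved_cidr)
    # Existing subnets that collide with the reserved range.
    overlaps = [s for s in existing_subnets if reserved.overlaps(ipaddress.ip_network(s))]
    # VIPs that were placed outside the reserved range by mistake.
    outside = [v for v in vips if ipaddress.ip_address(v) not in reserved]
    # One host route per VIP that is inside the range.
    host_routes = [f"{v}/32" for v in vips if ipaddress.ip_address(v) in reserved]
    return overlaps, outside, host_routes


# Made-up example values.
overlaps, outside, host_routes = check_vip_plan(
    "10.50.0.0/24",
    ["10.0.1.0/24", "10.0.2.0/24"],
    ["10.50.0.10", "10.50.0.11", "10.0.1.20"],
)
print("overlapping subnets:", overlaps)
print("VIPs outside the reserved range:", outside)
print("routes to advertise:", host_routes)
```

In this example the range is clean, one VIP (10.0.1.20) is flagged as being outside it, and the other two become /32 routes.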
More details here:
Summary
Two different problems and three solutions! These are both quite easy to fix if you know where to look or who to ask.
Have a good one.