Picture the scene: you get a delivery!
The boss has signed off on some new NetScaler appliances! The thing is, your operation runs 24x7. Downtime isn’t an option.
Problem statement: I just got my new MPX NetScalers. How can I migrate off that EOL appliance to a new one quickly and with little to no downtime? Something like an MPX8015 to an MPX9120? The current setup has pairs of NetScalers in High Availability, and that is what makes the process possible.
Solution: I can use High Availability, and I will use some commands to limit HA config sync and command propagation. A call out to Steven Wright for the process!
Prerequisites:
Ensure both nodes are on the exact same release. Likely the latest 14.1 or 13.1 build.
nsroot access to the appliances.
Access to the network team (later on)
A config editor
Assumptions
This is an MPX process; VPX has other options (such as snapshots).
SDX is a bit more involved, as there are more elements.
The process assumes physical access to the devices, i.e. you are in the DC.
Before any migration steps
A. Take a screenshot of your vServers and their services so that you have a reference for what was online before you begin any work.
B. Test HA failover before you change anything, to confirm that it works and that no existing problem has lain dormant. HA issues might be linked to GARP not working, or never having worked.
C. Ensure the firmware of the existing NetScalers and the replacement NetScalers match, upgrading the existing NetScalers before the migration if needed. The purpose of this step is to isolate any issue to either a firmware upgrade or a hardware change, rather than making two changes at the same time.
The test plan for the firmware should cover all the key use cases that the users rely on. As an admin it is easy to look at the system and think it is fine.
Migration Steps:
1. Do a complete backup of the primary node. This is just insurance, remember your boss….
2. Define the current primary as 'Stay Primary' and disable HA sync and HA propagation in the HA setup.
Repeat this on the secondary node, 'Stay Secondary' with HA sync and HA propagation off also. In the GUI it looks like this:
The STAYSECONDARY setting is arguably not required, but disabling HA sync and propagation on the secondary is needed.
At this point you are in a change control freeze, as any HA failover will require manual intervention (take the nodes out of Stay Primary/Stay Secondary mode and force them over). This is by design while we make changes. It also means we are making no new changes to the service: no new VIPs or other changes will be added to the two appliances.
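As a rough sketch of step 2, the small helper below prints the CLI lines you might paste into each node. The helper name `ha_freeze_commands` is made up for illustration. The `set ha node`, `save ns config` and `show ha node` commands are standard NetScaler CLI, but check the exact option names against the command reference for your build before relying on them.

```python
def ha_freeze_commands(role):
    """Return CLI lines that pin a node to its role and stop HA sync/propagation.

    role is "primary" or "secondary" (hypothetical labels for this sketch).
    """
    status = {"primary": "STAYPRIMARY", "secondary": "STAYSECONDARY"}[role]
    return [
        # Pin the role and switch off config sync and command propagation.
        f"set ha node -haStatus {status} -haSync DISABLED -haProp DISABLED",
        # Persist the change so a reboot does not undo the freeze.
        "save ns config",
        # Eyeball the result before moving on.
        "show ha node",
    ]


if __name__ == "__main__":
    for role in ("primary", "secondary"):
        print(f"# on the {role} node")
        print("\n".join(ha_freeze_commands(role)))
```

Generating the lines rather than typing them by hand is only a convenience; the point is that both nodes get the same sync and propagation settings and only the Stay mode differs.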
3. Prepare a new (MPX9100) Secondary away from the production setup. This would likely be in a pre-production environment. We will be using the live secondary config, so it cannot be in the live network at the same time. A crossover cable into a laptop works.
The usual "on NetScaler" steps are here: https://docs.netscaler.com/en-us/netscaler-hardware-platforms/mpx/migrating-configuration-of-existing-appliance-to-another-appliance.html
The ports on the 9100 are 25/x, so the config will need some changes to clean it up from the 10/x on the MPX8015.
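The port clean-up can be scripted rather than done by hand in the config editor. This is a minimal sketch, assuming you have a copy of the config as text and that you supply your own old-to-new port map; the function name `remap_interfaces` and the example mapping are illustrative only, and the output still needs a manual review before it goes anywhere near an appliance.

```python
import re

# Match interface names such as 10/1, but not fragments of IP addresses
# or CIDR prefixes such as 10.10.10.10/24.
IFACE = re.compile(r"(?<![\w./])(\d+/\d+)(?![\w./])")


def remap_interfaces(config_text, port_map):
    """Replace old interface names with new ones according to port_map."""
    def swap(match):
        name = match.group(1)
        # Leave anything that is not in the map untouched.
        return port_map.get(name, name)

    return IFACE.sub(swap, config_text)


if __name__ == "__main__":
    # Hypothetical mapping: which 25/x port replaces which 10/x port is
    # a decision for your own cabling plan.
    port_map = {"10/1": "25/1", "10/2": "25/2"}
    sample = "add channel LA/1 -ifnum 10/1 10/2\nbind vlan 100 -ifnum 10/1\n"
    print(remap_interfaces(sample, port_map))
```

An explicit map is used instead of a blanket "10/ becomes 25/" rule because the port counts and numbering on the two platforms may not line up one to one.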
4. Take a deep breath. With the new Secondary set up, under change control, swap over the connections from the live secondary. Nothing should happen as production services remain live on the primary. Another way to do this would be to get on the switch and ‘shut/no shut’ the ports. This could be better if there are a bunch of them. Thanks Simeon for this tip.
5. Run "show HA node" on the primary. It should list the new secondary (the MAC address should be different).
At this point we have a mixed HA setup of MPX8015 (primary) and MPX9100 (secondary), and HA sync & propagation is still off.
Wake up the network team and have the router lady/gentleman on standby for the next phase. If the router has existing issues processing GARP packets, they will show up at every HA failover, and we will want the network team to flush the router's ARP cache if that happens. Ideally, we will know this isn't the case before we begin, but it never hurts to have a belt and braces approach with people on standby. In the unlikely event this does happen, introducing a VMAC may be a wise future step. This allows the MAC to float between the nodes with the config.
6. Re-assign both nodes to participate in HA, by turning off Stay Secondary/Primary.
Run "force HA failover" on the CLI and test... etc
Run with this mixed setup for some time to be sure all is well with the new MPX9100. Steven suggested an extended test at this point.
Compare vServers and services to those captured before, and ensure that the service states match. What is down and what is up?
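If a screenshot comparison feels error-prone, the before/after check can be sketched in a few lines. This assumes you have noted the states down as plain text with one "name state" pair per line; that format is an assumption for this example, not the native output of any NetScaler command, and the function names are illustrative.

```python
def parse_states(text):
    """Parse lines of 'name state' into a dict, ignoring blank or short lines."""
    states = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            states[parts[0]] = parts[1].upper()
    return states


def diff_states(before, after):
    """Return {name: (old_state, new_state)} for every entry that changed."""
    changes = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name, "MISSING")
        new = after.get(name, "MISSING")
        if old != new:
            changes[name] = (old, new)
    return changes


if __name__ == "__main__":
    # Hypothetical vServer names, purely for illustration.
    before = parse_states("vs_web UP\nvs_api UP\nvs_old DOWN\n")
    after = parse_states("vs_web UP\nvs_api DOWN\nvs_old DOWN\n")
    for name, (old, new) in diff_states(before, after).items():
        print(f"{name}: {old} -> {new}")
```

Anything that was already down before the work started drops out of the diff, which keeps the focus on what the migration actually changed.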
7. Now we have live service running on the MPX9100, and all is well.
8. Repeat the process to add the second new (MPX9100) node as the secondary, taking the config that was applied at step 3.
At this stage we would have two new nodes on the MPX9100 with HA sync and command propagation still disabled.
9. Enable HA sync and command propagation. Run the CLI command "force HA sync" and confirm that it succeeds. If you see synchronization errors, reset the RPC node passwords.
I don’t like your steps?
Robert, a colleague in the US, stumbled across this gem for another customer: https://github.com/netscaler/console-netscaleradc-config-migration
I have not tried it yet, but will stick it on the to-do list, as it looks like a good one.
What else?
Sometimes having advice from people who have experienced issues with upgrades is very helpful. One of my favourite upgrade issues was when Steven Wright came to upgrade an HA pair and, just before rebooting a very old secondary node to validate it actually would reboot before starting work, discovered that the flash card (holding the configuration) had been borrowed.
Happy days!
Summary
A simple process that could take 10 minutes if you are NetScaler proficient or have done it a few times. :-)