VPN demystified with HA options – part 1 – Single IP HA

Recently one of my customers asked me how we could improve his current VPN setup to a 3rd-party environment so that it survives a maintenance window. On the customer’s side we have an Aviatrix spoke GW with the HA option enabled. The 3rd party is just an Azure VNG.

Before we jump into optimization, let’s talk about VPN itself (from the Aviatrix point of view). I needed to consolidate my knowledge, so let me share a few tips I picked up along the way.

Few facts about VPN

VPN is a broad topic and there are many great articles about it. I assume you have basic knowledge, so apologies if I don’t explain everything in detail. This post is also not about encryption algorithms, IKE negotiations or IPsec itself. It is more about the way you do it on the Aviatrix platform.

There are 2 Types of VPNs:

  • Policy-Based
  • Route-Based

Here is a very good comparison of Policy-Based vs Route-Based VPN by Design Expert Shahzad Ali of Aviatrix (worth reading, as there is no point in writing about it one more time): LINK to AVIATRIX community

There are 2 places to configure this on Aviatrix Controller:

Multi-Cloud Transit -> Setup -> external connections
  • If an Aviatrix HA GW is deployed, both gateways will actively try to establish connections to the same remote GW
Site2Cloud -> Setup
  • Supports Policy-Based and Route-Based
  • For Route-Based VPN, a VTI tunnel is created
  • Doesn’t support BGP
  • Gives you an option to establish a backup tunnel from the HAGW
  • Supports MAPPED NAT, a very useful feature when we have overlapping IPs on both ends. It lets us define NAT in a subnet-to-subnet manner, which simplifies the configuration.
  • The CIDR of the remote subnet is not automatically propagated into Active-Mesh (towards the Transits, if the tunnel is terminated on a SPOKE). We need to use “Customize Spoke Advertised VPC CIDRs” to inject it into the Aviatrix infrastructure, or use “Auto Advertise Spoke Site2Cloud CIDRs” (mapped tunnels only!)
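To make the MAPPED NAT bullet above more concrete, here is a minimal Terraform sketch of a mapped Site2Cloud connection. It is not taken from the customer setup: the resource name, gateway name, VPC ID, peer IP and all CIDRs are made-up placeholders, and I am assuming the `remote_subnet_virtual` / `local_subnet_virtual` attributes of `aviatrix_site2cloud` behave as described in the provider documentation for your provider version.

```hcl
# Sketch only: every name, IP and CIDR below is a placeholder (assumption),
# not a value from the setup described in this post.
resource "aviatrix_site2cloud" "mapped_example" {
  vpc_id                     = "example-vnet-id"
  connection_name            = "mapped-conn"
  connection_type            = "mapped"
  remote_gateway_type        = "generic"
  tunnel_type                = "route"
  primary_cloud_gateway_name = "example-gw"
  remote_gateway_ip          = "203.0.113.10"

  # Real subnets: both ends use the same range, hence the overlap.
  remote_subnet_cidr = "10.10.0.0/24"
  local_subnet_cidr  = "10.10.0.0/24"

  # Virtual subnets: what each side uses to reach the other.
  # Same prefix length as the real ones, so NAT is 1:1 subnet-to-subnet.
  remote_subnet_virtual = "192.168.20.0/24"
  local_subnet_virtual  = "192.168.10.0/24"
}
```

With something like this in place, local hosts would reach the remote side via 192.168.20.0/24 and the remote side would reach us via 192.168.10.0/24, without anybody having to renumber.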

Let’s get back to customer’s case:

Initial setup

The problem here is that there is only one VPN tunnel and no HA at all. I assume Azure handles failover in its own way (this is just an active/standby deployment with the PIP moving to the 2nd instance on failover). Because a traffic selector is used for the tunnel, we can’t establish a 2nd backup tunnel from the Aviatrix HA GW. And here comes the problem: what do we do during maintenance of the primary GW (an upgrade, let’s say)? We lose connectivity.

Let’s take a look at Spoke’s GW route tables:

As there is only 1 tunnel, established from the primary GW, we see the VTI interface (tun-14E5DC4F-0) as the next hop with metric 100. The HA GW, on the other hand, points to the primary spoke GW. Makes sense, right?

Let’s try to make it better…

Option 1 - Single IP HA

This feature was introduced some time ago (Controller version 6.4) but, from what I can see, it is not used that often. The idea behind it is to migrate the Public IP of one GW to the standby one and re-establish the VPN. Simple, right? The feature requires a standalone GW and HAGW (Transit and Spoke GWs are not supported, as it’s harder to migrate the PIP with Active-Mesh enabled on them). An interesting use case is migrating a VPN from the on-prem world, where you have HA based on an FHRP (VRRP or HSRP) and actually only 1 active tunnel. With both Aviatrix GWs we are simulating such an environment.

Key things to remember:

This is a standalone GW, so the workflow for deploying the HAGW is slightly different. You need to go to

GATEWAY -> EDIT -> Gateway for High Availability Peering

Or with Terraform (TF), by just specifying 2 additional attributes (peering_ha_subnet, peering_ha_gw_size):

resource "aviatrix_gateway" "standalone_1" {
  cloud_type         = 8 # Azure
  account_name       = var.avx_ctrl_account_azure
  gw_name            = "${local.env_prefix}-standalone-1"
  vpc_id             = module.mc_spoke_1.vpc.vpc_id
  vpc_reg            = var.azure_region
  gw_size            = "Standard_B1ms"
  subnet             = module.mc_spoke_1.vpc.public_subnets[0].cidr
  zone               = "az-1"
  # The two attributes below deploy the HAGW
  peering_ha_subnet  = module.mc_spoke_1.vpc.public_subnets[0].cidr
  peering_ha_gw_size = "Standard_B1ms"
}


With the HAGW in place, we just need to create a new Site2Cloud connection with the following attributes specified:

resource "aviatrix_site2cloud" "site2cloud_1" {
  vpc_id                     = "vpntest-spoke1:rg-av-vpntest-spoke1-308843:ecae61e1-f8df-40fc-a0cb-b2160528d87b"
  connection_name            = "conn2"
  connection_type            = "unmapped"
  remote_gateway_type        = "generic"
  tunnel_type                = "route"
  primary_cloud_gateway_name = "vpntest-standalone-1"
  remote_gateway_ip          = azurerm_public_ip.VNG-PIP.ip_address
  # HA with a single public IP that moves between the two GWs
  ha_enabled                 = true
  enable_single_ip_ha        = true
  private_route_encryption   = false
  remote_subnet_cidr         = ""
  local_subnet_cidr          = ",,,"
  backup_gateway_name        = "vpntest-standalone-1-hagw"
  # Same remote peer and PSK for the backup tunnel
  backup_remote_gateway_ip   = azurerm_public_ip.VNG-PIP.ip_address
  custom_algorithms          = false
  pre_shared_key             = "secret$123"
  backup_pre_shared_key      = "secret$123"
  enable_dead_peer_detection = true
  enable_event_triggered_ha  = true
  enable_ikev2               = false
}


As soon as the tunnel comes up, the Controller is smart enough to program all UDRs in this spoke VNET, so the remote prefix will appear there.

Of course, normally we want this prefix to be reachable from the rest of the spokes. To make that happen we need to inject it into the transit from the spoke.
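As a rough illustration of that injection, the sketch below sets the Terraform counterpart of “Customize Spoke Advertised VPC CIDRs” on a spoke gateway. Treat it as an assumption-laden example rather than the exact config used here: the gateway name, VNET ID, region, subnet and CIDRs are placeholders, and the attribute names (`included_advertised_spoke_routes`, `enable_auto_advertise_s2c_cidrs`) should be checked against your provider version. To my understanding the customized list replaces the default advertisement, which is why the VNET’s own CIDR is listed next to the remote prefix.

```hcl
# Sketch only: names, IDs and CIDRs are placeholders (assumptions).
resource "aviatrix_spoke_gateway" "spoke_example" {
  cloud_type   = 8 # Azure
  account_name = var.avx_ctrl_account_azure
  gw_name      = "example-spoke-1"
  vpc_id       = "example-vnet:example-rg:example-guid"
  vpc_reg      = "West Europe"
  gw_size      = "Standard_B1ms"
  subnet       = "10.1.0.0/28"

  # "Customize Spoke Advertised VPC CIDRs":
  # VNET CIDR (10.1.0.0/16) plus the remote Site2Cloud prefix (172.16.50.0/24).
  included_advertised_spoke_routes = "10.1.0.0/16,172.16.50.0/24"

  # Alternative for mapped tunnels only:
  # enable_auto_advertise_s2c_cidrs = true
}
```

If the spoke is deployed through a module rather than the raw resource, the same setting has to be passed through whatever input that module exposes for it.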


Let’s check how long a failover takes. The initial state is that only 1 GW has its tunnel UP

And both GWs are running

Keep in mind that you will see the public IPs swapped, but that’s not all that is happening behind the scenes. The Controller does the UDR programming for you, and the IPsec tunnel still needs to come up.


There are a few options that affect how long a failover takes. In my case these were the following:


This option is quite nice but would not meet the customer’s goal, which was maintenance. With some tuning of the parameters above I was able to lower the failover time to 150s. It is not a matter of detection; it is the slowness of the Azure API itself. For AWS I was surprised to see a time of around 30s, which is acceptable in my opinion. In Azure I think we could upgrade the GW faster than doing a failover 😊.

It serves its main purpose though: HA. The second GW is deployed in a different AZ, so we have our resiliency there, and the remote end points only to a single IP, which is what we wanted.

Follow-up – coming soon
The next option will be BGP with 2 different scenarios (Active/Passive vs Active/Active with VNG)
