Deploy Azure API Management in an internal network

TL;DR

Remember to associate a user defined route (UDR) with your API Management (APIM) subnet and enable service endpoints on it when deploying in an internal network in a Virtual WAN or other forced routing scenario. The route is necessary for the management traffic for your APIM.

AI DISCLAIMER: This post was not written by an AI, but I used ChatGPT to generate the featured image. Why not AI? I want to write the blog posts myself to learn as much as possible. There is enough slop out there, and I don’t want to push slop myself.

Background

I have been testing API Management for an internal project lately, and I have honestly been struggling with deployment. Every time, my terraform apply has served me an error message about not being able to reach the management endpoint.

Error: retrieving Policy for Service (Subscription: "<subscription-id>"
│ Resource Group Name: "rg-apim-resource-group"
│ Service Name: "<apim-service-name>"): unexpected status 422 (422 Unprocessable Entity) with error: ManagementApiRequestFailed: 
| Failed to connect to management endpoint <apim-service-name>.management.azure-api.net:3443 for a service deployed in a Virtual Network.
| Make sure to follow guidance at https://aka.ms/apim-vnet-common-issues for Inbound connectivity to Management endpoint.
| Check 'ApiManagement Control Plane - inbound' connectivity at https://aka.ms/apimnetworkstatus.

Somehow I thought this was an issue with DNS resolution or missing NSG rules, but I was actually ignoring a very important step in the network preparation.

The environment

I have a Virtual WAN (vWAN) with an Azure Firewall in the central hub. My APIM is deployed in a spoke virtual network. Virtual WAN simplifies much of the manual routing we need to do when deploying our landing zones. There is usually no need for User Defined Routes (UDR), because vWAN automatically creates routes on the virtual network. The Azure Firewall is managed by a different team in my organization.

Troubleshooting steps

The initial deployment timed out, but the APIM was still created. However, on the next terraform plan I got the error message about access to the management endpoint.

In preparation I skimmed the network prerequisites, and I had already added the NSG rules to my code. For reference, you can find the rules here.

They look like this:

security_rules = [
    {
      access                     = "Allow"
      description                = "Allows inbound from ApiManagement tag"
      destination_address_prefix = local.subnet_address_prefix
      destination_port_range     = "3443"
      direction                  = "Inbound"
      name                       = "Allow-incoming-traffic-from-apimanagement"
      priority                   = 300
      protocol                   = "Tcp"
      source_address_prefix      = "ApiManagement.NorwayEast"
      source_port_range          = "*"
    },
    {
      access                     = "Allow"
      description                = "Allows inbound from AzureLoadbalancer tag"
      destination_address_prefix = local.subnet_address_prefix
      destination_port_range     = "6390"
      direction                  = "Inbound"
      name                       = "Allow-incoming-traffic-from-azureloadbalancer"
      priority                   = 310
      protocol                   = "Tcp"
      source_address_prefix      = "AzureLoadBalancer"
      source_port_range          = "*"
    },
    {
      access                     = "Allow"
      description                = "Allows outbound to Storage tag"
      destination_address_prefix = "Storage.NorwayEast"
      destination_port_range     = "443"
      direction                  = "Outbound"
      name                       = "Allow-outbound-to-storage"
      priority                   = 300
      protocol                   = "Tcp"
      source_address_prefix      = local.subnet_address_prefix
      source_port_range          = "*"
    },
    {
      access                     = "Allow"
      description                = "Allows outbound to SQL tag"
      destination_address_prefix = "Sql.NorwayEast"
      destination_port_range     = "1433"
      direction                  = "Outbound"
      name                       = "Allow-outbound-to-sql"
      priority                   = 310
      protocol                   = "Tcp"
      source_address_prefix      = local.subnet_address_prefix
      source_port_range          = "*"
    },
    {
      access                     = "Allow"
      description                = "Allows outbound to AzureKeyVault tag"
      destination_address_prefix = "AzureKeyVault.NorwayEast"
      destination_port_range     = "443"
      direction                  = "Outbound"
      name                       = "Allow-outbound-to-azurekeyvault"
      priority                   = 320
      protocol                   = "Tcp"
      source_address_prefix      = local.subnet_address_prefix
      source_port_range          = "*"
    },
    {
      access                     = "Allow"
      description                = "Allows outbound to AzureMonitor tag"
      destination_address_prefix = "AzureMonitor"
      destination_port_ranges    = ["443", "1886"]
      direction                  = "Outbound"
      name                       = "Allow-outbound-to-azuremonitor"
      priority                   = 330
      protocol                   = "Tcp"
      source_address_prefix      = local.subnet_address_prefix
      source_port_range          = "*"
    }
  ]

The required NSG rule for ApiManagement traffic is already added, hence this was not the issue here 🤔
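For completeness, here is a minimal sketch of how a rule list like the one above could be turned into an NSG and attached to the APIM subnet. It assumes the list is stored in a local named local.apim_security_rules and that the NSG is called nsg-apim; both names are hypothetical and not from my actual code, so tailor it to your own module structure.

```hcl
# Sketch only: local.apim_security_rules and "nsg-apim" are assumed names.
resource "azurerm_network_security_group" "apim" {
  name                = "nsg-apim"
  location            = azurerm_resource_group.apim_rg.location
  resource_group_name = azurerm_resource_group.apim_rg.name

  # One security_rule block is generated per object in the list
  dynamic "security_rule" {
    for_each = local.apim_security_rules
    content {
      name                       = security_rule.value.name
      description                = security_rule.value.description
      priority                   = security_rule.value.priority
      direction                  = security_rule.value.direction
      access                     = security_rule.value.access
      protocol                   = security_rule.value.protocol
      source_port_range          = security_rule.value.source_port_range
      source_address_prefix      = security_rule.value.source_address_prefix
      destination_address_prefix = security_rule.value.destination_address_prefix

      # Each rule sets either a single port or a list of ports; the other stays null
      destination_port_range  = try(security_rule.value.destination_port_range, null)
      destination_port_ranges = try(security_rule.value.destination_port_ranges, null)
    }
  }
}

# Attach the NSG to the APIM subnet
resource "azurerm_subnet_network_security_group_association" "apim" {
  subnet_id                 = module.azurerm_subnet.subnet.id
  network_security_group_id = azurerm_network_security_group.apim.id
}
```

The try() calls are there because the AzureMonitor rule uses destination_port_ranges while the other rules use destination_port_range.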

I started backtracking and checking the network. Perhaps our CI/CD runners were not able to resolve the management endpoint? It turned out that our runners can resolve it, and this isn’t even necessary, since the management endpoint is accessed from a subset of Microsoft public IPs denoted by the ApiManagement service tag. Unfortunately, by this point I had already deleted the deployed APIM by removing it from the terraform state and cleaning up manually. This was necessary because my terraform pipeline was failing and I couldn’t fix the issue with terraform itself.

The issue had to be somewhere else.

Service Endpoints

In addition to the NSG rules, you need to add some service endpoints so APIM can access the necessary Storage, SQL, and Key Vault services. These service endpoints are easily added with the following terraform code. It is only an example, so tailor it to your own needs:

resource "azurerm_subnet" "example" {
  name                 = "example-subnet"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.0.0.0/24"]

  service_endpoints = ["Microsoft.Storage", "Microsoft.Sql", "Microsoft.KeyVault"]
}

API Management service terraform

For those interested, here is the terraform code I used to deploy the APIM:

resource "azurerm_api_management" "api_management_service" {
  name                          = "apim-service-name"
  location                      = azurerm_resource_group.apim_rg.location
  resource_group_name           = azurerm_resource_group.apim_rg.name
  public_network_access_enabled = true # Must be set to true initially for internal network mode, but can be set to false after initial deployment
  sku_name                      = "Developer_1"

  identity {
    type = "SystemAssigned"
  }

  publisher_name  = "Contoso"
  publisher_email = "admin@contoso.com"

  virtual_network_configuration {
    subnet_id = module.azurerm_subnet.subnet.id
  }

  virtual_network_type = "Internal"

  # Make sure the route table is associated with the subnet before APIM is deployed
  depends_on = [azurerm_subnet_route_table_association.apim_management_route_table]
}

The solution

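After going back and forth with online examples of other people deploying APIM in an internal network, I finally found the missing piece of the puzzle when re-reading the documentation. After reading about the NSG rules I had completely overlooked an important - and unfortunately highly relevant - part: in a forced routing scenario like mine, the APIM subnet needs a user defined route that sends traffic destined for the ApiManagement service tag straight to the internet instead of through the firewall. Big facepalm moment on my part, but what can you do? 🤷‍♂️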

By adding this configuration to my terraform code, I was able to deploy the APIM successfully:

resource "azurerm_route_table" "apim_management_route_table" {
  name                          = "rt-apim-management"
  location                      = azurerm_resource_group.apim_rg.location
  resource_group_name           = azurerm_resource_group.apim_rg.name
  bgp_route_propagation_enabled = true

  route {
    name           = "management_route"
    address_prefix = "ApiManagement.NorwayEast" # Change this if you are not deploying in Norway East, or you can use the base service tag "ApiManagement"
    next_hop_type  = "Internet"
  }
}

resource "azurerm_subnet_route_table_association" "apim_management_route_table" {
  route_table_id = azurerm_route_table.apim_management_route_table.id
  subnet_id      = module.azurerm_subnet.subnet.id
}
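As the comment on the route says, the service tag has to match the region you deploy in. A small sketch of one way to avoid hardcoding NorwayEast everywhere is shown below. The variable service_tag_region and the local names are hypothetical, and I am assuming the regional tags follow the same "<Tag>.<Region>" pattern as the ones used in this post.

```hcl
# Sketch only: variable and local names are assumptions, not part of my actual code.
variable "service_tag_region" {
  description = "Region suffix used in regional service tags, for example NorwayEast"
  type        = string
  default     = "NorwayEast"
}

locals {
  # Regional service tags built from a single region setting
  apim_service_tag     = "ApiManagement.${var.service_tag_region}"
  storage_service_tag  = "Storage.${var.service_tag_region}"
  sql_service_tag      = "Sql.${var.service_tag_region}"
  keyvault_service_tag = "AzureKeyVault.${var.service_tag_region}"
}
```

With this in place, the route could use address_prefix = local.apim_service_tag, and the NSG rules could reference the other locals instead of repeating the region name.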

In summary

  • Read the documentation properly before deploying. I should have done that 🤦‍♂️ 🤠
  • Associate a UDR with the APIM subnet before deployment.
  • Create the necessary NSG rules, including the one allowing inbound traffic to the management endpoint on port 3443.
  • Enable service endpoints on the APIM subnet.

In any case this post will serve as a form of noteToSelf. Please leave comments and criticism if you see something incorrect/insecure/inaccurate!