TL;DR
Remember to associate a user defined route and enable service endpoints on your API Management (APIM) subnet when deploying in an internal network in a Virtual WAN or other forced routing scenario. This is necessary for the management traffic for your APIM.
AI DISCLAIMER: This post was not written by an AI, but I used ChatGPT to generate the featured image. Why not AI? I want to write the blog posts myself to learn as much as possible. There is enough slop out there, and I don’t want to push slop myself.
I have been testing API Management for an internal project lately, and I have honestly been struggling with deployment. Every time, my terraform apply has served me an error message about not being able to reach the management endpoint.
Error: retrieving Policy for Service (Subscription: "<subscription-id>"
│ Resource Group Name: "rg-apim-resource-group"
│ Service Name: "<apim-service-name>"): unexpected status 422 (422 Unprocessable Entity) with error: ManagementApiRequestFailed:
│ Failed to connect to management endpoint <apim-service-name>.management.azure-api.net:3443 for a service deployed in a Virtual Network.
│ Make sure to follow guidance at https://aka.ms/apim-vnet-common-issues for Inbound connectivity to Management endpoint.
│ Check 'ApiManagement Control Plane - inbound' connectivity at https://aka.ms/apimnetworkstatus.
Somehow I thought this was an issue with DNS resolution or missing NSG rules, but I was actually ignoring a very important step in the network preparation.
I have a Virtual WAN (vWAN) with an Azure Firewall in the central hub. My APIM is deployed in a spoke virtual network. Virtual WAN simplifies much of the manual routing we need to do when deploying our landing zones. There is usually no need for User Defined Routes (UDR), because vWAN automatically creates routes on the virtual network. The Azure Firewall is managed by a different team in my organization.
The initial deployment timed out, but the APIM instance was still created. However, on the next terraform plan I got the error message about access to the management endpoint.
In preparation I skimmed the network prerequisites, and I had already added the NSG rules to my code. For reference, you can find the rules here.
They look like this:
security_rules = [
  {
    access                     = "Allow"
    description                = "Allows inbound from ApiManagement tag"
    destination_address_prefix = local.subnet_address_prefix
    destination_port_range     = "3443"
    direction                  = "Inbound"
    name                       = "Allow-incoming-traffic-from-apimanagement"
    priority                   = 300
    protocol                   = "Tcp"
    source_address_prefix      = "ApiManagement.NorwayEast"
    source_port_range          = "*"
  },
  {
    access                     = "Allow"
    description                = "Allows inbound from AzureLoadBalancer tag"
    destination_address_prefix = local.subnet_address_prefix
    destination_port_range     = "6390"
    direction                  = "Inbound"
    name                       = "Allow-incoming-traffic-from-azureloadbalancer"
    priority                   = 310
    protocol                   = "Tcp"
    source_address_prefix      = "AzureLoadBalancer"
    source_port_range          = "*"
  },
  {
    access                     = "Allow"
    description                = "Allows outbound to Storage tag"
    destination_address_prefix = "Storage.NorwayEast"
    destination_port_range     = "443"
    direction                  = "Outbound"
    name                       = "Allow-outbound-to-storage"
    priority                   = 300
    protocol                   = "Tcp"
    source_address_prefix      = local.subnet_address_prefix
    source_port_range          = "*"
  },
  {
    access                     = "Allow"
    description                = "Allows outbound to SQL tag"
    destination_address_prefix = "Sql.NorwayEast"
    destination_port_range     = "1433" # Azure SQL endpoints listen on 1433
    direction                  = "Outbound"
    name                       = "Allow-outbound-to-sql"
    priority                   = 310
    protocol                   = "Tcp"
    source_address_prefix      = local.subnet_address_prefix
    source_port_range          = "*"
  },
  {
    access                     = "Allow"
    description                = "Allows outbound to AzureKeyVault tag"
    destination_address_prefix = "AzureKeyVault.NorwayEast"
    destination_port_range     = "443"
    direction                  = "Outbound"
    name                       = "Allow-outbound-to-azurekeyvault"
    priority                   = 320
    protocol                   = "Tcp"
    source_address_prefix      = local.subnet_address_prefix
    source_port_range          = "*"
  },
  {
    access                     = "Allow"
    description                = "Allows outbound to AzureMonitor tag"
    destination_address_prefix = "AzureMonitor"
    destination_port_ranges    = ["443", "1886"]
    direction                  = "Outbound"
    name                       = "Allow-outbound-to-azuremonitor"
    priority                   = 330
    protocol                   = "Tcp"
    source_address_prefix      = local.subnet_address_prefix
    source_port_range          = "*"
  }
]
The required NSG rule for ApiManagement traffic is already added, hence this was not the issue here 🤔
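For completeness, the rules only help if they end up on an NSG that is associated with the APIM subnet. Here is a minimal sketch of how that could look without a wrapper module. It is not my actual module code: `local.security_rules` (holding the list above) and the resource name `apim_nsg` are assumptions made for the example, while the resource group and subnet references are the ones used elsewhere in this post.

```hcl
# Sketch only: local.security_rules and the "apim_nsg" names are hypothetical.
resource "azurerm_network_security_group" "apim_nsg" {
  name                = "nsg-apim"
  location            = azurerm_resource_group.apim_rg.location
  resource_group_name = azurerm_resource_group.apim_rg.name

  # Generate one security_rule block per object in the list
  dynamic "security_rule" {
    for_each = local.security_rules
    content {
      name                       = security_rule.value.name
      description                = security_rule.value.description
      priority                   = security_rule.value.priority
      direction                  = security_rule.value.direction
      access                     = security_rule.value.access
      protocol                   = security_rule.value.protocol
      source_port_range          = security_rule.value.source_port_range
      source_address_prefix      = security_rule.value.source_address_prefix
      destination_address_prefix = security_rule.value.destination_address_prefix

      # Each rule sets either a single port or a list of ports,
      # so the attribute that is missing falls back to null.
      destination_port_range  = try(security_rule.value.destination_port_range, null)
      destination_port_ranges = try(security_rule.value.destination_port_ranges, null)
    }
  }
}

# Attach the NSG to the subnet that APIM is deployed into
resource "azurerm_subnet_network_security_group_association" "apim_nsg" {
  subnet_id                 = module.azurerm_subnet.subnet.id
  network_security_group_id = azurerm_network_security_group.apim_nsg.id
}
```

The `try()` calls are there because the AzureMonitor rule uses `destination_port_ranges` while the others use `destination_port_range`.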
I started backtracking and checking the network. Perhaps our CI/CD runners were not able to resolve the management endpoint? Unfortunately I had already deleted the deployed APIM by removing it from the Terraform state and cleaning up manually. This was necessary because my Terraform pipeline was failing and I couldn’t fix the issue with Terraform itself. Our runners can resolve the management endpoint, and this isn’t even necessary, since the management endpoint is accessed from a subset of Microsoft public IPs identified by the ApiManagement service tag.
The issue had to be somewhere else.
In addition to the NSG rules, you need to add some service endpoints so APIM can access the necessary Storage, SQL, and Key Vault services. These service endpoints are easily added with the following Terraform code (this is an example, so tailor it to your own needs):
resource "azurerm_subnet" "example" {
  name                 = "example-subnet"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.0.0.0/24"]
  service_endpoints    = ["Microsoft.Storage", "Microsoft.Sql", "Microsoft.KeyVault"]
}
For those interested, here is the terraform code I used to deploy the APIM:
resource "azurerm_api_management" "api_management_service" {
  name                          = "apim-service-name"
  location                      = azurerm_resource_group.apim_rg.location
  resource_group_name           = azurerm_resource_group.apim_rg.name
  public_network_access_enabled = true # Must be set to true initially for internal network mode, but can be set to false after initial deployment
  sku_name                      = "Developer_1"

  identity {
    type = "SystemAssigned"
  }

  publisher_name  = "Contoso"
  publisher_email = "admin@contoso.com"

  virtual_network_configuration {
    subnet_id = module.azurerm_subnet.subnet.id
  }
  virtual_network_type = "Internal"

  depends_on = [azurerm_subnet_route_table_association.apim_management_route_table]
}
After going back and forth with examples online of other people deploying APIM in an internal network, I finally found the missing piece of the puzzle when re-reading the documentation. After reading about the NSG rules I had completely overlooked the important - and unfortunately highly relevant - part about forced tunneling. Big facepalm moment on my part, but what can you do? 🤷‍♂️
By adding this configuration to my terraform code, I was able to deploy the APIM successfully:
resource "azurerm_route_table" "apim_management_route_table" {
  name                          = "rt-apim-management"
  location                      = azurerm_resource_group.apim_rg.location
  resource_group_name           = azurerm_resource_group.apim_rg.name
  bgp_route_propagation_enabled = true

  route {
    name           = "management_route"
    address_prefix = "ApiManagement.NorwayEast" # Change this if you are not deploying in Norway East, or you can use the base service tag "ApiManagement"
    next_hop_type  = "Internet"
  }
}

resource "azurerm_subnet_route_table_association" "apim_management_route_table" {
  route_table_id = azurerm_route_table.apim_management_route_table.id
  subnet_id      = module.azurerm_subnet.subnet.id
}
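If you deploy to more than one region, the hard-coded service tag is easy to forget. Below is a small sketch of one way to build the tag from a variable. The variable name `service_tag_region` and the local `apim_service_tag` are made up for this example and are not part of my actual code.

```hcl
# Hypothetical variable: the region suffix as used in Azure service tags, e.g. "NorwayEast".
variable "service_tag_region" {
  type        = string
  default     = ""
  description = "Region suffix for service tags. Leave empty to use the base ApiManagement tag."
}

locals {
  # Falls back to the base "ApiManagement" tag when no region is given
  apim_service_tag = var.service_tag_region == "" ? "ApiManagement" : "ApiManagement.${var.service_tag_region}"
}

# Usage (not a full resource):
#   route block on the route table:  address_prefix        = local.apim_service_tag
#   inbound NSG rule for port 3443:  source_address_prefix = local.apim_service_tag
```

Using the same local in both the route and the inbound NSG rule keeps the two in sync, which matters because both have to cover the same management traffic.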
In any case this post will serve as a form of noteToSelf. Please leave comments and criticism if you see something incorrect/insecure/inaccurate!