Azure Firewall Kusto optimization

TL;DR

Operators on IPv4 addresses can optimize your Azure Firewall Kusto queries!

My personal collection of useful Kusto queries.

Background

Azure Firewall can generate a large amount of log data if you have enabled diagnostic settings on it. Searching that data for particular events can be time-consuming and inefficient. Enter the operators on IPv4 addresses (strictly speaking, they are KQL functions). This is a family of specialized operators that are particularly good at filtering on IPv4 addresses and IPv4 address prefixes.

Kusto

What is the Kusto Query Language? I will not write a thesis on the basics of KQL here, but Microsoft has this to say:

Kusto Query Language is a powerful tool to explore your data and discover patterns, identify anomalies and outliers, create statistical modeling, and more. The query uses schema entities that are organized in a hierarchy similar to SQL’s: databases, tables, and columns.

Basically, it is a language for querying your log data in Log Analytics Workspaces, among other services. Personally, I mostly use it to dig for log events from Azure Application Gateway or Azure Firewall. You can also use queries to check your billed data in a Log Analytics Workspace.
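
As an illustration of the billed-data use case, a minimal sketch could look like the query below. It assumes the standard Usage table of a Log Analytics Workspace, where Quantity is reported in MB. The 30-day window and the BillableGB column name are arbitrary choices for this example.

```kusto
// Sketch: billable ingestion per data type for the past 30 days
Usage
| where TimeGenerated > ago(30d)
// only count data that is actually billed
| where IsBillable == true
// Quantity is reported in MB, so divide to get GB
| summarize BillableGB = sum(Quantity) / 1000.0 by DataType
| sort by BillableGB desc
```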

The Kusto query

I am using a particular query in this instance to exemplify the time saved when using the operator. This is the somewhat complex query:

// Fetch AzureDiagnostics data
AzureDiagnostics
// for the past 30 days
| where TimeGenerated > ago(30d)
// if msg_s contains ip prefix 10.0.0 or 10.0.1 or 10.0.2
| where msg_s has "10.0.0" or msg_s has "10.0.1" or msg_s has "10.0.2"
// if category is AzureFirewallNetworkRule or AzureFirewallApplicationRule
| where Category == "AzureFirewallNetworkRule" or Category == "AzureFirewallApplicationRule"
//optionally apply filters to only look at a certain type of log data
//| where OperationName == "AzureFirewallNetworkRuleLog"
//| where OperationName == "AzureFirewallNatRuleLog"
//| where OperationName == "AzureFirewallApplicationRuleLog"
//| where OperationName == "AzureFirewallIDSLog"
//| where OperationName == "AzureFirewallThreatIntelLog"
| extend msg_original = msg_s
// normalize data so it's easier to parse later
| extend msg_s = replace(@'. Action: Deny. Reason: SNI TLS extension was missing.', @' to no_data:no_data. Action: Deny. Rule Collection: default behavior. Rule: SNI TLS extension missing', msg_s)
| extend msg_s = replace(@'No rule matched. Proceeding with default action', @'Rule Collection: default behavior. Rule: no rule matched', msg_s)
// extract web category, then remove it from further parsing
| parse msg_s with * " Web Category: " WebCategory
| extend msg_s = replace(@'(. Web Category:).*','', msg_s)
// extract RuleCollection and Rule information, then remove it from further parsing
| parse msg_s with * ". Rule Collection: " RuleCollection ". Rule: " Rule
| extend msg_s = replace(@'(. Rule Collection:).*','', msg_s)
// extract Rule Collection Group information, then remove it from further parsing
| parse msg_s with * ". Rule Collection Group: " RuleCollectionGroup
| extend msg_s = replace(@'(. Rule Collection Group:).*','', msg_s)
// extract Policy information, then remove it from further parsing
| parse msg_s with * ". Policy: " Policy
| extend msg_s = replace(@'(. Policy:).*','', msg_s)
// extract IDS fields (for now they are always at the end), then remove them from further parsing
| parse msg_s with * ". Signature: " IDSSignatureIDInt ". IDS: " IDSSignatureDescription ". Priority: " IDSPriorityInt ". Classification: " IDSClassification
| extend msg_s = replace(@'(. Signature:).*','', msg_s)
// extract NAT info, then remove it from further parsing
| parse msg_s with * " was DNAT'ed to " NatDestination
| extend msg_s = replace(@"( was DNAT'ed to ).*",". Action: DNAT", msg_s)
// extract Threat Intelligence info, then remove it from further parsing
| parse msg_s with * ". ThreatIntel: " ThreatIntel
| extend msg_s = replace(@'(. ThreatIntel:).*','', msg_s)
// extract URL, then remove it from further parsing
| extend URL = extract(@"(Url: )(.*)(\. Action)",2,msg_s)
| extend msg_s=replace(@"(Url: .*)(Action)",@"\2",msg_s)
// parse remaining "simple" fields
| parse msg_s with Protocol " request from " SourceIP " to " Target ". Action: " Action
| extend 
    SourceIP = iif(SourceIP contains ":",strcat_array(split(SourceIP,":",0),""),SourceIP),
    SourcePort = iif(SourceIP contains ":",strcat_array(split(SourceIP,":",1),""),""),
    Target = iif(Target contains ":",strcat_array(split(Target,":",0),""),Target),
    TargetPort = iif(Target contains ":",strcat_array(split(Target,":",1),""),""),
    Action = iif(Action contains ".",strcat_array(split(Action,".",0),""),Action),
    Policy = case(RuleCollection contains ":", split(RuleCollection, ":")[0] ,Policy),
    RuleCollectionGroup = case(RuleCollection contains ":", split(RuleCollection, ":")[1], RuleCollectionGroup),
    RuleCollection = case(RuleCollection contains ":", split(RuleCollection, ":")[2], RuleCollection),
    IDSSignatureID = tostring(IDSSignatureIDInt),
    IDSPriority = tostring(IDSPriorityInt)
| project TimeGenerated,Protocol,SourceIP,SourcePort,Target,TargetPort,URL,Action, NatDestination, OperationName,ThreatIntel,IDSSignatureID,IDSSignatureDescription,IDSPriority,IDSClassification,Policy,RuleCollectionGroup,RuleCollection,Rule,WebCategory

The sections are commented, so it should be fairly easy to see what each one does. If something is unclear or outright wrong, please don’t hesitate to contact me and request a correction :-)

Without the operator

Without the operator, the search takes a long time, especially if you do the filtering after parsing. This makes sense, as there are millions of records to sift through before finding the ones that I want returned. For each log event, msg_s must be compared against my query strings to see if they match. The contains operator can be swapped for the has operator to get a slightly lower execution time, because has looks for whole terms rather than arbitrary substrings. Still, both are general-purpose string operators and are not designed for finding IPv4-related content.

I have not measured exactly how long it takes, but one of the tests I ran on a customer firewall (with millions of log records daily) spun for around 2 minutes before I killed it. This is the screenshot I based the post's featured image on.
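
The point about filtering before parsing can be sketched like this. It is a shortened illustration, not the full query from earlier: the one-day window and the single parse line are assumptions made to keep the example small.

```kusto
// Sketch: narrow the data set first, parse afterwards
AzureDiagnostics
// a smaller time window means fewer records to scan (1d is an arbitrary choice)
| where TimeGenerated > ago(1d)
// filter on the raw message before any parsing takes place
| where msg_s has "10.0.0"
// only the remaining records go through the more expensive parsing
| parse msg_s with Protocol " request from " SourceIP " to " Target ". Action: " Action
| project TimeGenerated, Protocol, SourceIP, Target, Action
```

The parsing steps then only run on the records that survive the filter.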

With the operators

The operators for IPv4 searching are tailored for finding these patterns in logs. They make your searches faster, and also more precise, because they match whole IP address terms rather than arbitrary substrings. There are four different versions, each with its own mode of operation.

has_ipv4

This one will find log records where the single IPv4 address 10.0.0.1 is found in msg_s.

// Fetch AzureDiagnostics data
AzureDiagnostics
// for the past 30 days
| where TimeGenerated > ago(30d)
// if msg_s contains ip 10.0.0.1
| where has_ipv4(msg_s, "10.0.0.1")

has_ipv4_prefix

This one will find log records where an address starting with the IPv4 prefix 10.0.0. is found in msg_s.

Note that the prefix is not mistyped. You need the trailing period for the query to work. Also note that this only matches prefixes on whole-octet boundaries (the equivalents of /24, /16, and /8), not custom ones like /21, /27, or /29: 10.0.0. covers 10.0.0.0-10.0.0.255, 10.0. covers 10.0.0.0-10.0.255.255, and 10. covers 10.0.0.0-10.255.255.255.

// Fetch AzureDiagnostics data
AzureDiagnostics
// for the past 30 days
| where TimeGenerated > ago(30d)
// if msg_s contains any ip in 10.0.0.0-10.0.0.255
| where has_ipv4_prefix(msg_s, "10.0.0.")
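
For a range that does not fall on an octet boundary, one possible approach is to use has_ipv4_prefix as a coarse first filter and then check the exact range with ipv4_is_in_range. The sketch below assumes the source address follows the text "request from " in msg_s, as in the network rule logs parsed earlier. The 10.0.0.0/21 range and the SourceIP column name are only examples.

```kusto
// Sketch: find source addresses in 10.0.0.0/21 (10.0.0.0-10.0.7.255)
AzureDiagnostics
| where TimeGenerated > ago(30d)
// coarse filter: any address in 10.0.x.x
| where has_ipv4_prefix(msg_s, "10.0.")
// pull out the source address without its port
| extend SourceIP = extract(@"request from (\d{1,3}(?:\.\d{1,3}){3})", 1, msg_s)
// exact filter on the CIDR range
| where ipv4_is_in_range(SourceIP, "10.0.0.0/21")
```

Records where no source address can be extracted are dropped by the last filter, so this variant only looks at the source side of the message.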

has_any_ipv4

Sometimes you might want to return records if they contain one of several IPv4 addresses. You could make do with msg_s has "x.x.x.x" or msg_s has "x.x.x.y" or msg_s has "x.x.y.z", but a better approach is to use this operator instead.

// Fetch AzureDiagnostics data
AzureDiagnostics
// for the past 30 days
| where TimeGenerated > ago(30d)
// if msg_s contains any ipv4 in array
| where has_any_ipv4(msg_s, dynamic(["10.1.2.3", "10.4.5.6", "10.7.8.9", "10.10.11.12"]))
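
If the list of addresses is reused or grows long, it may be easier to keep it in a let statement. A minimal sketch, where the name watchlist and the summarize step are made up for illustration:

```kusto
// Sketch: keep the addresses in one place and count hits per log type
let watchlist = dynamic(["10.1.2.3", "10.4.5.6", "10.7.8.9", "10.10.11.12"]);
AzureDiagnostics
| where TimeGenerated > ago(30d)
// if msg_s contains any ipv4 in the watchlist
| where has_any_ipv4(msg_s, watchlist)
| summarize Hits = count() by OperationName
```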

has_any_ipv4_prefix

Then again, you might want to find several different prefixes. Again, you could use string comparison with msg_s has "x.x." or msg_s has "x.y.x." or msg_s has "x.y.z.", but this would be inefficient. A better approach is to use this operator instead.

// Fetch AzureDiagnostics data
AzureDiagnostics
// for the past 30 days
| where TimeGenerated > ago(30d)
// if msg_s contains any ipv4 prefix in array
| where has_any_ipv4_prefix(msg_s, dynamic(["10.1.", "10.2.3.", "10.4.5.", "10.6.7."]))
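
To compare the four variants without touching production logs, you can run them against a few made-up messages. The sketch below uses datatable with invented sample rows, and the column names in the extend step are only labels for this example.

```kusto
// Sketch: try the four functions on invented sample messages
datatable(msg_s: string) [
    "TCP request from 10.0.0.1:50123 to 10.1.2.3:443. Action: Allow",
    "TCP request from 10.0.1.7:50456 to 10.4.5.6:443. Action: Deny",
    "UDP request from 192.168.1.20:53000 to 10.2.3.4:53. Action: Allow"
]
| extend
    SingleAddress = has_ipv4(msg_s, "10.0.0.1"),
    SinglePrefix = has_ipv4_prefix(msg_s, "10.0.0."),
    AnyAddress = has_any_ipv4(msg_s, dynamic(["10.1.2.3", "10.4.5.6"])),
    AnyPrefix = has_any_ipv4_prefix(msg_s, dynamic(["10.0.", "192.168."]))
```

Each new column shows, per row, whether that variant considers the message a match.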

Useful Kusto queries

I have been writing queries here and there, but for the most part I just copy and paste from wherever. After looking for “that one good query I remember using one time”, I realized I needed to store them somewhere outside of Azure. Because of this, I started saving the queries in a public Markdown file that everyone can read. You can even propose changes to it in your own branch if you want, and create a PR for me to review.

You can find them in my public GitHub repository.

In summary

You can save a lot of execution time by using some clever operators. The specialized IPv4 operators are among these.