
Azure Firewall Prescaling: Monitoring, Scenarios, and Cost Management - Part 2

Hasan Gural

Hello Friends,

In Part 1 of this series, we covered how Azure Firewall autoscaling works by default, what prescaling is, how to configure it through the portal, PowerShell, and Bicep, and what the billing and limitations look like. In this second part, I want to go deeper on the operational side: how prescaling actually behaves under load, what the Observed Capacity Metric tells you and how to use it for planning and validation, and the real-world scenarios where prescaling genuinely makes the difference between smooth traffic ramps and avoidable performance degradation.

The goal here is practical. By the end of this article you should be able to look at your own Observed Capacity data, understand what it is telling you, identify the workload patterns where prescaling makes the biggest difference, and build a monitoring and alerting strategy around it.

How Capacity Units Work

Before configuring prescaling, it helps to have a clear mental model of what a capacity unit actually represents. A single capacity unit is a combination of the following:

  • 2.22 Gbps of throughput
  • 2,500 new connections per second
  • 1 connection table (tracking active sessions)

Azure Firewall allocates multiple capacity units behind the scenes as it scales. The firewall's raw throughput ceiling, for example 30 Gbps for Standard or 100 Gbps for Premium, represents the total throughput available when fully scaled out. Each unit you prescale represents guaranteed headroom that exists before a single packet arrives.

When you configure prescaling, you are effectively telling Azure Firewall: "I know traffic is going to be significant. Start with at least N capacity units so you are ready."

The service still respects the underlying platform limits and SKU ceilings. Prescaling sets the floor, not the ceiling. Maximum capacity is still governed by autoscale and the SKU maximums.
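The arithmetic behind a sensible floor can be sketched directly from the per-unit figures above. This is a minimal illustration, not an official sizing tool; the helper names and the example numbers are my own:

```python
import math

# Per-capacity-unit figures from the section above.
GBPS_PER_UNIT = 2.22
NEW_CONN_PER_SEC_PER_UNIT = 2500

def units_needed(expected_gbps: float, expected_new_conn_per_sec: float) -> int:
    """Capacity units required to cover the more demanding of the two dimensions."""
    by_throughput = math.ceil(expected_gbps / GBPS_PER_UNIT)
    by_connections = math.ceil(expected_new_conn_per_sec / NEW_CONN_PER_SEC_PER_UNIT)
    return max(by_throughput, by_connections)

def effective_floor(min_capacity: int, sku_max_units: int) -> int:
    """Prescaling sets the floor; the SKU ceiling still wins if the floor exceeds it."""
    return min(min_capacity, sku_max_units)

# A workload expected to push ~9 Gbps and ~6,000 new connections/second:
print(units_needed(9.0, 6000))  # prints 5 — throughput dominates here
```

Note that the two dimensions scale independently: a connection-heavy workload can need more units than its raw throughput suggests, which is why the sketch takes the maximum of the two.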

The Observed Capacity Metric

The Observed Capacity metric (ObservedCapacity) is the primary signal for understanding how your firewall is scaling in practice. It is available in Azure Monitor under the Microsoft.Network/azureFirewalls resource type and supports three aggregations: Average, Minimum, and Maximum.

  • Average: typical capacity utilization over the time window. The best signal for real-time operational monitoring.
  • Minimum: the lowest point during the interval. Useful for validating that prescaling is holding the floor.
  • Maximum: the highest point during the interval. Useful for understanding peak capacity demands.

The metric is sampled every minute (PT1M) and is exportable to Azure Monitor Logs via diagnostic settings, meaning you can write KQL queries against it, build workbooks around it, and set up alerts on it just like any other Azure Monitor metric.

Reading the Metric in Practice

The most useful thing you can do immediately after enabling prescaling is to verify the Minimum aggregation. If you set minCapacity to 5, the Minimum value should never drop below 5 during normal operation. If it does, something is wrong with the configuration or the setting has not propagated yet.

The Average aggregation tells you how close you sit to that floor during quiet periods. If your Average hovers at the floor of 5 while actual demand would only need 2 units, your baseline traffic is well below what you have reserved. That can be deliberate headroom for known spikes, or a sign that you have over-provisioned for the current load and should revisit the cost trade-off.

The Maximum aggregation is where you catch demand spikes. If you see the Maximum spiking significantly above your minimum during specific time windows, those windows are exactly the periods prescaling was designed to cover.

Using the Metric for Capacity Planning

The documentation recommends combining historical Observed Capacity data with traffic trend analysis for forecasting. In practice, the process looks like this:

  1. Export Observed Capacity at one-minute granularity to a Log Analytics workspace over a 30-60 day period.
  2. Write a KQL query to identify the distribution of capacity usage across different time-of-day and day-of-week buckets.
  3. Look for consistent patterns: batch job windows, business-hours peaks, end-of-month processing spikes.
  4. Set your minCapacity to cover the bottom of those known-high periods without waiting for autoscaling to react.

AzureMetrics
| where ResourceProvider == "MICROSOFT.NETWORK"
| where MetricName == "ObservedCapacity"
| where TimeGenerated > ago(30d)
| summarize
    AvgCapacity = avg(Average),
    MaxCapacity = max(Maximum),
    MinCapacity = min(Minimum)
    by bin(TimeGenerated, 1h), Resource
| order by TimeGenerated asc

This query gives you an hourly picture of capacity behavior across a 30-day lookback window. Plotting this data makes it much easier to identify the time windows where your floor needs to be higher than the autoscale default.
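Once the hourly data is exported, the bucketing in steps 2-4 is straightforward. A minimal sketch, assuming rows of (timestamp, peak capacity) taken from the query's output; the toy data and variable names are mine:

```python
from collections import defaultdict
from datetime import datetime

# Hourly rows as exported from Log Analytics: (timestamp, MaxCapacity).
rows = [
    ("2025-01-06T01:00:00", 3), ("2025-01-06T02:00:00", 8),
    ("2025-01-07T02:00:00", 7), ("2025-01-07T09:00:00", 4),
]

# Bucket peak capacity by hour of day across the lookback window.
by_hour = defaultdict(list)
for ts, cap in rows:
    by_hour[datetime.fromisoformat(ts).hour].append(cap)

# For each recurring hour, the floor that would have covered the observed peak.
suggested_floor = {hour: max(caps) for hour, caps in sorted(by_hour.items())}
print(suggested_floor)  # the 02:00 bucket stands out in this toy data
```

In a real dataset you would also bucket by day of week, and take a high percentile rather than the absolute maximum if a single outlier hour would otherwise drive the floor.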

Real-World Scenarios Where Prescaling Brings Value

Prescaling is not a one-size-fits-all setting. There are specific patterns where the benefit is clear and others where it adds cost without meaningful impact. Here are the scenarios I encounter most often.

Scheduled Batch Jobs and ETL Pipelines

The most common and straightforward case. You have a data pipeline that runs every night at 02:00 and processes large volumes of traffic through the firewall over 90 minutes. Outside of that window, traffic is minimal. Autoscaling will not have time to spin up before the job starts hammering the firewall, but with prescaling you can set a minimum capacity that covers the expected load during that window.

This pairs well with Azure Automation or an Azure Function that adjusts the minCapacity setting before the job starts and lowers it again after completion. You only pay for the elevated minimum during the window you actually need it.
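The scheduling side of that pattern is just a reconciliation loop. In this sketch, set_min_capacity is a hypothetical callback standing in for whatever your Automation runbook or Function actually calls; it is not a real SDK method, and the window and floor values are examples:

```python
from datetime import time

# Elevated floor for the nightly batch window; baseline otherwise.
BATCH_START, BATCH_END = time(1, 45), time(3, 45)  # start early so the floor is up before 02:00
BATCH_FLOOR, BASELINE_FLOOR = 7, 2

def desired_floor(now: time) -> int:
    """Return the minCapacity the firewall should have at this time of day."""
    return BATCH_FLOOR if BATCH_START <= now <= BATCH_END else BASELINE_FLOOR

def reconcile(now: time, current_floor: int, set_min_capacity) -> int:
    """Call the (hypothetical) updater only when the floor actually needs to change."""
    target = desired_floor(now)
    if target != current_floor:
        set_min_capacity(target)
    return target

# e.g. invoked from a timer trigger every 15 minutes:
reconcile(time(1, 45), current_floor=2,
          set_min_capacity=lambda n: print(f"raise floor to {n}"))
```

Starting the window slightly before the job itself matters: the point of prescaling is that the capacity exists before the first packet, not that it is requested at the same moment.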

Business-Hours Traffic Patterns

Many enterprise environments see a predictable jump in traffic when employees start their workday. VPN connections come up, storage sync jobs kick off, application health checks intensify. For environments where the morning ramp-up is steep, prescaling ensures the firewall is already at a working capacity before the first spike arrives.

The opposite transition at the end of the business day matters less: the firewall scales in gradually, and the cost of carrying extra capacity for a few extra minutes is negligible.

Planned Migrations and Deployments

When you are moving workloads, cutting over services, or running a planned failover drill, traffic patterns through the firewall can change dramatically and suddenly. Pre-allocating capacity before a migration window means you are not debugging firewall latency at the same time you are managing a complex infrastructure change.

This is also relevant during Azure DevOps release pipelines that trigger infrastructure deployments at scale. If a deployment rolls out hundreds of agent connections simultaneously, the firewall sees an abrupt jump in new connections per second that autoscaling might not absorb fast enough.

Seasonal and End-of-Period Processing

Financial services, retail, and other industries with strong seasonality often see traffic spikes that are entirely predictable but infrequent enough that autoscaling baselines do not naturally account for them. End-of-quarter processing, tax filing periods, Black Friday traffic, or large batch reconciliation runs are all examples where you know weeks in advance that demand is coming. Prescaling gives you a clean way to prepare without permanently over-provisioning.

Proactive Alerting with Observed Capacity

One of the most underused aspects of the Observed Capacity metric is alerting. The documentation recommends alerting when scaling exceeds 80% of maxCapacity, but for prescaling scenarios the more useful alert is the other direction: alert when Observed Capacity drops below your expected minimum.

You can configure this in Azure Monitor as a metric alert:

  • Resource: Your Azure Firewall instance
  • Metric: ObservedCapacity
  • Condition: Minimum < your configured minCapacity
  • Evaluation frequency: Every 5 minutes
  • Aggregation period: 15 minutes
  • Severity: 3 (Informational)

This alert fires if the firewall is not maintaining the floor you expect. Combined with a high-side alert at 80% of your maximum, you get clear signals for both under-delivery and over-demand scenarios.
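The low-side evaluation above amounts to: over the aggregation period, fire if the Minimum falls below the configured floor. A minimal emulation of that rule, assuming one-minute ObservedCapacity samples:

```python
def low_floor_alert(samples_1m: list[int], floor: int, period: int = 15) -> bool:
    """Emulate the metric alert: over the last `period` one-minute samples,
    fire if the Minimum aggregation drops below the prescaled floor."""
    window = samples_1m[-period:]
    return min(window) < floor

# Floor of 5: a single sample at 4 inside the 15-minute window trips the alert.
print(low_floor_alert([5, 5, 6, 5, 4, 5] * 3, floor=5))  # True
```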

Correlating Capacity with Other Signals

The Observed Capacity metric is most useful when you pair it with the other Azure Firewall performance metrics:

  • Throughput (Throughput): Correlate capacity growth with actual traffic volume. If throughput is high but capacity is not scaling, that may indicate a configuration issue.
  • Latency Probe (FirewallLatencyPng): Watch for latency increases during scale-out transitions. If prescaling is correctly configured, you should not see significant latency spikes at traffic ramp-up times.
  • SNAT Port Utilization (SNATPortUtilization): As the firewall scales out (including through prescaling), more SNAT ports become available. If you are seeing high SNAT utilization despite having capacity headroom, you may need additional public IP addresses.

The combination of these four metrics gives you a complete picture of firewall health and capacity behavior.
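The "high throughput but flat capacity" case from the Throughput bullet is easy to screen for once both series are exported. A sketch under my own assumptions — paired per-window samples, a 90% utilization threshold I chose for illustration, and the per-unit throughput figure from earlier:

```python
# Paired per-window samples: (throughput in Gbps, observed capacity units).
samples = [(2.0, 5), (9.5, 5), (10.1, 5), (3.0, 5)]

GBPS_PER_UNIT = 2.22  # per-unit throughput from the capacity unit section

def suspicious_windows(samples, utilization_threshold=0.9):
    """Windows where throughput is near what the current units can deliver,
    yet capacity has not grown - the correlation worth investigating."""
    flagged = []
    for gbps, units in samples:
        if gbps >= utilization_threshold * units * GBPS_PER_UNIT:
            flagged.append((gbps, units))
    return flagged

print(suspicious_windows(samples))  # only the 10.1 Gbps window crosses the threshold
```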

Closing

In this article we walked through what capacity units represent, how to read and use the Observed Capacity metric for planning and monitoring, the workload scenarios where prescaling delivers the most value, and the cost management and alerting patterns that keep everything in check.

From here, the practical next step is to look at your own Observed Capacity data and identify the patterns. If you have regular batch jobs, steep business-hour ramps, or planned migration windows coming up, prescaling is a straightforward way to take a reactive scaling mechanism and make it predictable.
