Azure Budgets and Azure OpenAI Cost Management
Azure OpenAI is a cloud-based service that gives you access to large-scale generative AI models, such as GPT-4 and DALL-E, for applications involving language, code, reasoning, and image generation. Azure OpenAI Service provides enterprise-grade security and reliability for your AI workloads.
Azure OpenAI Pricing Overview
For context, Azure OpenAI Service pricing follows a pay-as-you-go consumption model: you only pay for what you use. The price per unit depends on the type and size of the model you choose, and on the number of tokens consumed by the prompt and the response. Tokens are the units of measurement OpenAI uses to charge for its API services; each request consumes a certain number of tokens depending on the model, the input length, and the output length. For example, if you use the DaVinci model to generate a 100-token completion from a 50-token prompt, you are charged for 150 tokens in total. Tokens are not the same as words or characters: they are pieces of text that the models use to process language. The word “hamburger” is split into three tokens (“ham”, “bur”, “ger”), while a short, common word like “pear” is a single token. The more tokens you use, the more you pay, which leads to the problem statement below.
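To make the billing arithmetic concrete, here is a small sketch of the token-based cost calculation. The per-1K-token price used below is a made-up placeholder, not an actual Azure OpenAI rate; check the official pricing page for current numbers.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  price_per_1k_tokens: float) -> float:
    """Charges apply to prompt and completion tokens combined."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens / 1000 * price_per_1k_tokens

# The DaVinci example from the text: a 50-token prompt plus a
# 100-token completion means 150 billable tokens. At a hypothetical
# $0.02 per 1K tokens, the request costs $0.003.
cost = estimate_cost(prompt_tokens=50, completion_tokens=100,
                     price_per_1k_tokens=0.02)
print(f"${cost:.4f}")
```

Multiply this by thousands of requests per day and the need for a budget guardrail becomes obvious.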
Problem Statement
Many customers who have discussed Azure OpenAI have expressed a desire to control their costs by setting budgets and taking actions based on them. One specific and repeated request has been to put a hard stop to calls made to the AOAI service automatically once a certain budget threshold is crossed.
Possible Solution
AOAI is an API-based service, so there is no option to start/stop it as with some other Azure services. However, there is a workaround: place an Azure API Management (APIM) layer in front of AOAI.
You can optionally route requests through your usual enterprise path (firewall, Application Gateway, etc.) via your hub network.
Besides the cost aspect, which we will discuss shortly, this approach has several advantages:
- Provide a gate between the application and the AOAI instance to inspect and action on the request and response.
- Enforce and use Azure AD authentication.
- Provide enhanced logging for security and for tracking the use of tokens through the service.
- Lastly, use policies to restrict the usage if a certain cost threshold has been crossed.
The approach taken here is to create an inbound processing policy using the ip-filter rule, with the allowed-IP list left blank. This denies all calls to the service, since no requestor IP appears on the list. Here is what a basic inbound processing policy with ip-filter looks like:
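A sketch of such a policy is shown below (the original post's screenshot is not reproduced here, so treat this as an approximation of the shape):

```xml
<policies>
    <inbound>
        <base />
        <!-- With action="allow" and no <address> entries, no caller IP
             can match the allow list, so every request is rejected. -->
        <ip-filter action="allow">
        </ip-filter>
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```

If your APIM instance's policy validation insists on at least one child element, an equivalent trick is to allow only a single non-routable address.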
Let us start by creating the automation runbook that will add the above policy to the relevant scope under inbound processing. Policies can be applied globally (all APIs), at the product level, at the API level, or at the operation level under each API. In this example, we will apply it at the API level.
Here is the code for the runbook:
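The original runbook code is not reproduced here; the following is a rough Python equivalent (Azure Automation also supports Python runbooks) using the `azure-identity` and `azure-mgmt-apimanagement` packages. The subscription, resource group, service, and API names are placeholders you would replace with your own.

```python
# Placeholders -- substitute your own values.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-apim"            # hypothetical resource group name
APIM_SERVICE = "apim-aoai-gateway"    # hypothetical APIM instance name
API_ID = "azure-openai-api"           # hypothetical API id in APIM


def build_deny_policy() -> str:
    """Inbound policy with an empty ip-filter allow list: every caller
    is rejected because no requestor IP is on the list."""
    return (
        "<policies><inbound><base />"
        '<ip-filter action="allow"></ip-filter>'
        "</inbound><backend><base /></backend>"
        "<outbound><base /></outbound>"
        "<on-error><base /></on-error></policies>"
    )


def apply_deny_policy() -> None:
    # SDK imports are kept local so build_deny_policy() can be reused
    # (e.g. by a reset runbook) without the Azure SDK installed.
    from azure.identity import ManagedIdentityCredential
    from azure.mgmt.apimanagement import ApiManagementClient
    from azure.mgmt.apimanagement.models import PolicyContract

    # Authenticate as the automation account's system-assigned
    # managed identity (no stored credentials).
    credential = ManagedIdentityCredential()
    client = ApiManagementClient(credential, SUBSCRIPTION_ID)

    # Overwrite the API-scope policy; "policy" is the fixed policy id.
    client.api_policy.create_or_update(
        RESOURCE_GROUP,
        APIM_SERVICE,
        API_ID,
        "policy",
        PolicyContract(value=build_deny_policy(), format="rawxml"),
    )


if __name__ == "__main__":
    apply_deny_policy()
```

A PowerShell runbook would follow the same pattern with `Connect-AzAccount -Identity` and `Set-AzApiManagementPolicy`.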
Notice that we use the system-assigned managed identity of the automation account to authenticate and authorize the runbook to act against the APIM API policy. Make sure you grant the automation account the relevant RBAC permission, otherwise the operation will fail. Since the policy is applied at the API scope, the minimum permission needed is 'Microsoft.ApiManagement/service/apis/policies/write'.
Once you have created and published the runbook, head over to Budgets in Cost Management and create a budget at the scope you require. The scope can be a management group, a subscription, a resource group, or even an individual resource.
On the next page, select when the alert should be triggered. Alerts can be based on actual consumption as well as forecasted consumption. In this example, the alert fires when actual consumption reaches 120% of the budget value.
We need an Action Group that calls our runbook when this alert is triggered. You can either create it separately or click the Manage action group link.
Create a new Action Group and head to the Actions section.
Select the runbook created in the previous steps:
Give the action a name, add tags as necessary and save the Action Group. Click on the Create Budget link at the top of the page (if you had navigated from the budgets page).
Select the Action Group that was created and add an email recipient so you also receive an email when the budget alert is triggered.
And that is it! When the budget alert is triggered, an email is sent and the API policy is added, blocking access to the service. Keep in mind that billing data is refreshed every 24 hours, so there can be a delay between crossing the threshold and the alert firing.
Optionally, you can set up another runbook on a schedule to reset the inbound policy on the 1st of every month (or the first day of your billing period).
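The reset runbook would follow the same pattern as the first one, but write back a pass-through inbound policy such as the following sketch, which simply removes the ip-filter rule:

```xml
<policies>
    <inbound>
        <base />
        <!-- ip-filter removed: requests flow through to AOAI again -->
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```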
One important footnote: lock down access to the Azure OpenAI service itself using private endpoints and the service firewall, so that only the APIM instance can reach it. Otherwise, callers can bypass APIM and hit the service directly, rendering this solution ineffective.
In conclusion, Azure OpenAI Service is a powerful and flexible way to leverage state-of-the-art AI models for a wide range of applications. However, it also comes with a significant cost that depends on the type, size, and usage of the models. It is therefore important to plan and monitor your AI workloads to control costs and avoid unnecessary expenses.