Azure Web Application Firewall: Bot Manager Scenarios
This article is part of our ongoing efforts to continually develop strategies against malicious bots.
The growing use of bots to simulate human engagement, especially for unethical activities in web applications, leads to both security incidents and the diversion of engagement away from web resources. The advent of new AI projects and LLMs (Large Language Models) has also opened more avenues for attack, including prompt injection, data leakage, training data poisoning, and unauthorized code execution.
It is essential to grant some bots access to your websites. The Microsoft Bot Manager rule set, coupled with Web Application Firewall, is positioned to reduce the effect of illegitimate non-human access using methods such as verified labels, static analysis (rate limiting), and other behavioral analysis.
Some bots are good, such as search engine indexing bots, content fetchers, and automated response services, and some are bad, such as spam bots, scrapers, and brute-force bots. There is also a grey area of bots that overwhelm your resources through the sheer volume of their requests.
Search engine optimization (SEO) tools, for example, use bots to perform SEO analytics for websites. These bots typically comply with the instructions and policies set in the robots.txt file. Careful pruning of bot access includes using the robots.txt file to specify access, rate limiting, and bot challenges. Some bots are also used to collect prices for web apps that compare product prices. However, if you are focused on search engine optimization for your business and experience an excessive load from too many requests, you may struggle to slow down crawlers without blocking them. To resolve this, you can combine bot control detection with rate-based rules that return a response status code.
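To make the robots.txt behavior concrete, here is a minimal sketch of how a compliant bot might read those instructions, using Python's standard urllib.robotparser module. The robots.txt content, paths, and the 10-second delay are hypothetical values chosen for illustration, and a bot that ignores robots.txt is not affected by any of this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paths, bot names, and delay are illustrative only.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10
Disallow: /checkout/

User-agent: *
Disallow: /
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a well-behaved crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# The named crawler may fetch product pages but not the checkout path.
print(rules.can_fetch("bingbot", "https://example.com/products"))
print(rules.can_fetch("bingbot", "https://example.com/checkout/cart"))

# Any other bot falls under the wildcard entry and is disallowed everywhere.
print(rules.can_fetch("somescraper", "https://example.com/products"))

# A compliant crawler would wait this many seconds between requests.
print(rules.crawl_delay("bingbot"))
```

Because compliance is voluntary, these directives only slow down cooperative crawlers; the WAF rate-based rules remain the enforcement layer.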
When you see content from your website mirrored on another website, you have seen the work of bots that aggregate content for the bot owner’s web page to increase traffic and divert engagement to that page. The mirrored page might even appear above the original source in search results.
Bot Protection Rule Action
Rule structure
A bot manager rule has several components: a rule ID, a state, an action, and exclusions. For example, rule Bot200100 covers search engine crawlers. We can change its enabledState to disable it, or change its action if we do not want search engine crawlers.
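As a rough illustration of those components, the sketch below models a single rule entry as a Python dictionary and shows the two changes described above. The rule ID and the enabledState field come from the text; the remaining key names, the default "Allow" action, and the override_rule helper are assumptions for illustration and may not match the exact schema of your WAF policy.

```python
import copy
import json

# Illustrative rule entry. "ruleId" and "enabledState" follow the text;
# the other keys and the default action are assumptions.
search_engine_crawlers = {
    "ruleId": "Bot200100",
    "enabledState": "Enabled",
    "action": "Allow",
    "exclusions": [],
}


def override_rule(rule, enabled_state=None, action=None):
    """Return a copy of the rule with its state and/or action changed (hypothetical helper)."""
    updated = copy.deepcopy(rule)
    if enabled_state is not None:
        updated["enabledState"] = enabled_state
    if action is not None:
        updated["action"] = action
    return updated


# Option 1: switch the rule off entirely.
disabled = override_rule(search_engine_crawlers, enabled_state="Disabled")

# Option 2: keep the rule on but block search engine crawlers.
blocked = override_rule(search_engine_crawlers, action="Block")

print(json.dumps(blocked, indent=2))
```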
Some bots have a verified classification, while others have an unknown classification. The example above belongs to the good bot category.
If you want to block a known bot, for example an evilbot, you can use one of the several methods explained in the documentation.
We will explore how to use custom rules in Azure WAF to block bots in later sections.
Good Bot
As mentioned earlier in this post, some bots are necessary for ad targeting, indexing, social media, personal assistants, and so on.
The example below highlights a simple way a known good bot may be used for SEO purposes.
A user might want to use an SEO bot to query the Bingbot API to confirm that a particular IP address is in its database. By appending the IP address to a GET request (in this case the IP address of our Juice Shop website), we obtain a 200 OK with a negative answer (false in the result), meaning the address is not in the database. This is a simple request that is otherwise harmless.
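The API call itself is not reproduced in this text, so the sketch below swaps in a different, commonly documented technique for the same question: checking whether an IP address belongs to Bingbot with a reverse DNS lookup followed by a forward lookup. The search.msn.com suffix reflects my understanding of Bing's published guidance and should be confirmed against the current documentation; the function names are hypothetical.

```python
import socket

# Suffix that genuine Bingbot hostnames are expected to end with (verify against Bing's docs).
BINGBOT_SUFFIX = ".search.msn.com"


def hostname_is_bingbot(hostname: str) -> bool:
    """Check only the hostname suffix; this alone does not prove ownership."""
    return hostname.lower().rstrip(".").endswith(BINGBOT_SUFFIX)


def verify_bingbot_ip(ip: str) -> bool:
    """Reverse-resolve the IP, check the suffix, then forward-resolve to confirm (IPv4 only)."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
        if not hostname_is_bingbot(hostname):
            return False
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except (socket.herror, socket.gaierror):
        # No PTR record or the name does not resolve: treat as unverified.
        return False
    return ip in addresses
```

Calling verify_bingbot_ip with the address of your own web server would be expected to return False, which matches the negative result described above.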
When a good bot is allowed, for example for Google Ads, you can verify the match in the log.
Bad Bot
Bad bots are developed for malicious purposes such as botnets, click fraud, spam, and scraping. They may also be used in IP spoofing. IP spoofing occurs when the IP header of a source packet is manipulated to show a different IP address in order to mask the origin of malicious activity. When spoofing, an attacker can deceive a target system into sending sensitive data to a false destination, thereby enabling unauthorized access to the data.
In this example, we will use the curl command to show how WAF with Bot Manager responds when a request attempts to falsify its packet origin.
Use the curl command with the Bingbot user agent to query the Juice Shop application as a valid request. You can find more information about the latest user agent here.
(Note: juiceshop.server has been mapped to the App Gateway public IP in my hosts file for this blog.)
Now, we craft a curl command that injects a different public IP as the request origin by setting an X-Forwarded-For (XFF) header, this time with a Yandex user agent.
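The curl commands are not reproduced in this text. As a rough Python equivalent, the sketch below builds the same two requests with the standard urllib library: one with a Bingbot user agent and one with a Yandex user agent plus an X-Forwarded-For header. The http scheme, the placeholder IP 203.0.113.7 (from a range reserved for documentation), and the helper names are assumptions; nothing is sent until send() is called.

```python
import urllib.error
import urllib.request

TARGET_URL = "http://juiceshop.server/"  # host name from this post; the scheme is an assumption

BINGBOT_UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
YANDEX_UA = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
SPOOFED_IP = "203.0.113.7"  # placeholder address, not the one used in the original test


def build_request(user_agent, forwarded_for=None):
    """Build a GET request with the given user agent and optional X-Forwarded-For header."""
    headers = {"User-Agent": user_agent}
    if forwarded_for:
        headers["X-Forwarded-For"] = forwarded_for
    return urllib.request.Request(TARGET_URL, headers=headers)


def send(request):
    """Send the request and return the HTTP status code, including error codes such as 403."""
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


# Request 1: claims to be Bingbot, no forwarded address.
plain = build_request(BINGBOT_UA)

# Request 2: claims to be Yandex and injects a different origin IP.
spoofed = build_request(YANDEX_UA, forwarded_for=SPOOFED_IP)

# Call send(plain) and send(spoofed) once juiceshop.server resolves in your environment.
```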
We can see the WAF/Bot Manager response in the log, which refers to a falsified identity.
Scenario 2: Deny Bot Access - Web Scraper
Most websites use the robots.txt file to control bot access to their application. This page is an example of a robots.txt file that shows the permission status for various web request types. While you can use "User-agent: *" with "Disallow: /" to disable all bot access, SEO and ranking bots may be affected as well. We can use the user-agent type to control access to the resources provided in the response. As an example, the following is a Python script with a Chrome user-agent.
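The original script is not reproduced in this text, so here is a minimal sketch of what such a script might look like. It uses only the standard library rather than whichever HTTP client the original used, so line numbers and the response object will differ from the original. The URL, the Chrome version in the user-agent string, and the helper names are assumptions.

```python
import urllib.error
import urllib.request

URL = "http://juiceshop.server/"  # replace with your URL; the scheme is an assumption

# Header values are illustrative; the Chrome version shown is an assumption.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "DNT": "1",                    # opt out of tracking
    "Sec-Fetch-Dest": "document",  # response type the client expects
}


def build_request(url, headers):
    """Build the GET request without sending it."""
    return urllib.request.Request(url, headers=headers)


def fetch(url, headers):
    """Send the request and return (status code, body text)."""
    try:
        with urllib.request.urlopen(build_request(url, headers), timeout=10) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The WAF answers a blocked request with an error status such as 403.
        return err.code, ""


def main():
    status, body = fetch(URL, HEADERS)
    print(f"Status Response: [{status}]")
    # print(body)  # uncomment to see the returned document


# Call main() to run the request against your own endpoint.
```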
Replace the URL variable with your URL, specify the user-agent of your choice, and change the remaining fields as needed. (Browser detection based on the user-agent string is unreliable, so the version number should be specified as well.) The dnt (do not track) field is set to 1 to opt out of tracking across online activities. The sec-fetch-dest field is an example of a field that lends itself to logic for restricting bot requests, because it specifies the response type the client expects. When keeping bots out of your web apps, if the destination type specified is audio and you do not serve audio content, you can deny such requests. In this example, we expect a type of document in the response.
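The sec-fetch-dest reasoning above can be sketched as a small decision function. This is only an illustration of the logic, not how the WAF evaluates requests; the set of served destinations and the function name are hypothetical.

```python
# Destinations this hypothetical application actually serves.
SERVED_DESTINATIONS = frozenset({"document", "image", "script", "style", "empty"})


def should_deny(headers, served=SERVED_DESTINATIONS):
    """Deny a request whose Sec-Fetch-Dest names a content type the app never serves."""
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    destination = normalized.get("sec-fetch-dest")
    if destination is None:
        # Header absent: leave the decision to other rules.
        return False
    return destination.strip().lower() not in served


print(should_deny({"Sec-Fetch-Dest": "audio"}))     # audio is not served here
print(should_deny({"Sec-Fetch-Dest": "document"}))  # document is served
```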
Run the .py file; it should return Status Response: [200] OK. Change line 20 to print(response.text) to see the document.
We can now set up a custom rule to deny this type of access using this user-agent.
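One way to picture such a custom rule is as the policy fragment below, written as a Python dictionary, together with a rough local emulation of the match so the effect can be checked offline. The property names follow my understanding of the Application Gateway WAF custom rule schema and should be confirmed against the current documentation; the rule name, priority, match value, and the rule_matches helper are hypothetical. Note that matching on a browser user-agent like this would also block genuine users of that browser version.

```python
import json

block_scraper_rule = {
    "name": "BlockScraperUserAgent",  # hypothetical rule name
    "priority": 10,                   # assumed priority
    "ruleType": "MatchRule",
    "action": "Block",
    "matchConditions": [
        {
            "matchVariables": [
                {"variableName": "RequestHeaders", "selector": "User-Agent"}
            ],
            "operator": "Contains",
            "matchValues": ["chrome/120.0.0.0"],  # assumed version string
            "transforms": ["Lowercase"],
        }
    ],
}


def rule_matches(rule, headers):
    """Rough local emulation of a header 'Contains' match; this is not Azure's engine."""
    normalized = {name.lower(): value for name, value in headers.items()}
    for condition in rule["matchConditions"]:
        selector = condition["matchVariables"][0]["selector"].lower()
        value = normalized.get(selector, "")
        if "Lowercase" in condition.get("transforms", []):
            value = value.lower()
        # Every condition must match for the rule to fire.
        if not any(candidate in value for candidate in condition["matchValues"]):
            return False
    return True


print(json.dumps(block_scraper_rule, indent=2))
```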
If we run the Python file this time, we should get Error: 403 because the App Gateway WAF prevents access, as the log shows.
Scenario 3: Allow blocked bot
There are scenarios where a legitimate or unverified bot is denied access by the web application firewall, and a user may want to permit this bot to access the application. For instance, an organization may want to take advantage of a customer survey bot API or leverage new AI bot development.
In this example, a bot used to obtain and compare bank data such as interest and API rates needs to fetch information about card services. The target bank, Altoro Mutual (a demo website for OWASP attack validation), has the URL altoromutual.com. As seen in the log, the request was blocked by the Bot Manager rule set under rule 300700.
While an exclusion may be used on rule 300700 to permit the bot to access the web application, this is too permissive: the selector field would no longer be inspected for any of the user-agents in the rule 300700 category, so other bots would get through as well.
To allow access for this bot specifically, we can use a custom rule that matches either the origin IP (clientIp_s) or the request header found in details_data_s, which carries the User-Agent, and specify the User-Agent python-requests/2.31.0. We then set the action to "Allow" to grant access.
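A narrowly scoped allow rule of this kind could be sketched as follows. The builder accepts a User-Agent, a client IP, or both; to my understanding, multiple match conditions inside one custom rule are combined with AND, so supplying both narrows the rule further. The property names follow my understanding of the Application Gateway custom rule schema, and the function name, rule name, priority, and example IP are hypothetical.

```python
import ipaddress


def build_allow_rule(name, priority, user_agent=None, client_ip=None):
    """Build an 'Allow' custom rule scoped to a User-Agent and/or a client IP (hypothetical helper)."""
    if user_agent is None and client_ip is None:
        raise ValueError("provide a user_agent, a client_ip, or both")

    conditions = []
    if user_agent is not None:
        conditions.append({
            "matchVariables": [
                {"variableName": "RequestHeaders", "selector": "User-Agent"}
            ],
            "operator": "Equal",
            "matchValues": [user_agent],
        })
    if client_ip is not None:
        # ip_address raises ValueError for malformed input.
        conditions.append({
            "matchVariables": [{"variableName": "RemoteAddr"}],
            "operator": "IPMatch",
            "matchValues": [str(ipaddress.ip_address(client_ip))],
        })

    return {
        "name": name,
        "priority": priority,
        "ruleType": "MatchRule",
        "action": "Allow",
        "matchConditions": conditions,
    }


# User-Agent value taken from the text; rule name and priority are assumptions.
allow_rule = build_allow_rule("AllowRateComparisonBot", 5, user_agent="python-requests/2.31.0")
```

Because python-requests/2.31.0 is a generic library user-agent that any script can send, pairing it with the bot's known origin IP keeps the rule from admitting unrelated clients.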
The bot is now permitted to fetch the HTML elements.
By default, the Azure Bot Manager rule set is deployed so that known good bots are allowed, bad bots are mostly blocked, and unknown bots are reviewed by Threat Intelligence and logged or blocked depending on the actions assigned to them.
In conclusion, preventing unwanted bot access to your web resources involves a multi-layered approach that combines security practices such as using an up-to-date WAF with Bot Manager, rate limiting, bot challenges, and IP blocking. Regularly monitoring and analyzing website traffic, user behavior, and server logs can provide insights into bot activity and help refine security measures.
Resources:
- Tuning Web Application Firewall (WAF) for Azure Front Door | Microsoft Learn
- Configure bot protection for Azure Web Application Firewall (WAF) | Microsoft Learn
- Troubleshoot - Azure Web Application Firewall | Microsoft Learn
- Best practices for Web Application Firewall on Azure Application Gateway | Microsoft Learn