In most cases, a robots.txt file won't block our site auditor. However, if your robots.txt contains rules that block bots, you can use the information below to ensure our auditor can still crawl your website.
Note: you can also stop our crawler from scanning portions of your website via the site auditor settings in our interface.
Use robots.txt to allow the site audit crawler
It's possible to specify URLs to disallow or allow just for our crawler. This is done by adding rules for the user agent "aabot" to your website's robots.txt file.
The following example, added to your robots.txt file, will allow our site auditor to crawl your website:
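A minimal rule set for this uses standard robots.txt directives together with the "aabot" user agent named above:

```
# Allow the site audit crawler full access.
# An empty Disallow rule means nothing is blocked for this user agent.
User-agent: aabot
Disallow:
```

An empty Disallow value is the standard way to permit everything for a given user agent; some configurations use "Allow: /" instead, which most modern parsers treat equivalently.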
Note that if you've tried the above and our auditor still won't crawl your website, your server may be blocking the audit. If you see a 4xx or 5xx status code for every page in our site auditor, this is most likely the case. In that situation, please whitelist the user agent "Mozilla/5.0 (compatible; aa/1.0)" (without quotation marks) on your server. Note that this whitelisting is done on your server, not in your robots.txt file.
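How this whitelisting looks depends entirely on your server software. As one hypothetical sketch, if an Apache server blocks bots by user agent with mod_setenvif, an exception for our auditor could be added like this (the "BadBot" pattern and the surrounding setup are illustrative assumptions, not your actual configuration):

```
# Hypothetical Apache configuration: flag an unwanted bot by its
# user agent, but clear the flag for our auditor's user agent so
# the audit requests are allowed through.
SetEnvIfNoCase User-Agent "BadBot"  bad_bot
SetEnvIfNoCase User-Agent "aa/1\.0" !bad_bot

<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

If you use nginx, a CDN firewall, or a security plugin instead, look for its allowlist or user-agent exception setting and add the full user agent string there.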
Use robots.txt to allow a specific path to be crawled
There might be situations where you don't need the site auditor to crawl your whole website, only a specific path. In that case, you can specify which path our site auditor should crawl using your robots.txt file.
The following example will allow our site auditor to crawl only the pages under example.com/categories:
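Assuming a parser that gives the longer, more specific rule precedence (as most modern crawlers do), the rules would look like this:

```
# Block everything for our crawler except the /categories path.
# The more specific Allow rule takes precedence over Disallow: /
# for the aabot user agent.
User-agent: aabot
Allow: /categories
Disallow: /
```

Rules under "User-agent: aabot" apply only to our crawler, so other bots continue to follow whatever rules you already have for them.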
If your robots.txt configuration is not working, check your cache: a different version of robots.txt may be served from your website or server cache, or from a CDN cache (e.g. Cloudflare). Clearing this cache usually solves the problem.