Allow our site auditor via robots.txt and firewall

Allow our site auditor to crawl your website, or set specific paths to be crawled.

Written by Matthew Davis

Site Auditor is available as a paid add-on in accounts created before November 2023. For more information, please reach out to our friendly support team.

In most cases, robots.txt files won't be blocking our site auditor, nor will our IP addresses be restricted. However, if you have rules set up in your robots.txt to block bots, or if there are any IP restrictions, you can use the information below to ensure our auditor can still run on your website.

We strongly recommend using an experienced web developer to make these changes, as the process will vary depending on the website and other factors.

Note: you can also stop our crawler from scanning portions of your website via the site auditor settings in our interface.

Use robots.txt to allow the site audit crawler

It's possible to specify URLs to ignore or allow just for our crawler. Our crawler identifies itself with the user agent "Mozilla/5.0 (compatible; RSiteAuditor)", and you can target it in your website's robots.txt file using the token RSiteAuditor.

When editing your robots.txt file, the following example will allow our site auditor to crawl your website:

User-agent: RSiteAuditor
Allow: /
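
If you'd like to sanity-check the rule above before the next audit runs, Python's standard-library urllib.robotparser can evaluate the same directives locally. This is a minimal sketch; the example.com URL is a placeholder:

```python
import urllib.robotparser

# The same rules as the robots.txt example above
robots_txt = """\
User-agent: RSiteAuditor
Allow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# robotparser matches on the product token (e.g. "RSiteAuditor"),
# not on the full browser-style user-agent string.
allowed = parser.can_fetch("RSiteAuditor", "https://example.com/any-page")
print(allowed)  # True
```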

If you've tried the above and our auditor still can't crawl your website, your server may be blocking the audit; if our site auditor reports a 4xx or 5xx status code for every page, this is most likely the cause. In that case, please whitelist the user agent "Mozilla/5.0 (compatible; RSiteAuditor)" (without quotation marks) on your server. Note that this whitelisting is done on your server, not in your robots.txt file.

Use robots.txt to allow a specific path to be crawled

There might be situations where you won't need the site auditor to crawl your whole website, but rather a specific path. In this case, it's possible to specify which path you would like our site auditor to crawl using your robots.txt file.

The following example will allow our site auditor to crawl only the pages under the /categories path:

User-agent: RSiteAuditor
Disallow: /
Allow: /categories
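
You can verify this path-restricted configuration locally with the same standard-library checker. One caveat, noted in the comments below: Python's robotparser applies rules in file order (first match wins), unlike the longest-path-match rule used by major crawlers, so for this local check the Allow line is listed before the Disallow line. The URLs are placeholders:

```python
import urllib.robotparser

# Same intent as the example above, but with Allow listed first:
# urllib.robotparser returns the first matching rule in file order,
# whereas major crawlers use the most specific (longest) path match.
robots_txt = """\
User-agent: RSiteAuditor
Allow: /categories
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# Pages under /categories are crawlable; everything else is not.
print(parser.can_fetch("RSiteAuditor", "https://example.com/categories/shoes"))  # True
print(parser.can_fetch("RSiteAuditor", "https://example.com/checkout"))          # False
```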

If your robots.txt configuration is not working, check your cache: a stale version of robots.txt may be served from your website or server cache, or from a CDN cache (e.g. Cloudflare). Clearing that cache will usually solve the problem.

Whitelist our IP addresses for the site auditor

The following IP addresses can be whitelisted on your website or server firewall if you are having issues getting our site auditor to connect to your site:
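
The exact mechanism depends on your firewall, but the underlying check is simple network membership. As an illustration only, the sketch below uses placeholder addresses from the TEST-NET-3 documentation range, not our real crawler IPs; substitute the addresses from the list below:

```python
import ipaddress

# Placeholder allowlist (TEST-NET-3 documentation range) -- replace
# these networks with the auditor's actual IP addresses.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside an allowlisted network."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in net for net in ALLOWED_NETWORKS)

print(is_allowed("203.0.113.7"))   # True
print(is_allowed("198.51.100.9"))  # False
```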
