Site Auditor is available as a paid add-on in accounts created before November 2023. For more information, please reach out to our friendly support team.

In most cases, your robots.txt file will not block our site auditor, and our IP addresses will not be restricted. However, if your robots.txt contains rules that block bots, or if your server restricts access by IP address, you can use the information below to ensure our auditor can still run on your website.

We strongly recommend using an experienced web developer to make these changes, as the process will vary depending on the website and other factors.

Note: you can also stop our crawler from scanning portions of your website via the site auditor settings in our interface.

Use robots.txt to allow the site audit crawler

It's possible to specify URLs to ignore or allow just for our crawler. Our crawler identifies itself with the user agent "Mozilla/5.0 (compatible; RSiteAuditor)", and you can target it in your website's robots.txt file with the name RSiteAuditor.

When editing your robots.txt file, the following example will allow our site auditor to crawl your website:

User-agent: RSiteAuditor
Allow: /
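One way to sanity-check rules like these before deploying them is to parse them locally. The sketch below uses Python's standard urllib.robotparser module. The sample robots.txt content (including the blanket block for other bots), the helper name can_crawl, and the example.com URL are assumptions for illustration, and this parser is not guaranteed to interpret rules exactly as our crawler does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks every bot except RSiteAuditor.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: RSiteAuditor
Allow: /
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Pass the short name used in robots.txt, not the full user agent string:
    # this parser only looks at the part of the agent before the first "/".
    print(can_crawl(ROBOTS_TXT, "RSiteAuditor", "https://example.com/blog/"))
    print(can_crawl(ROBOTS_TXT, "SomeOtherBot", "https://example.com/blog/"))
```

With the sample file above, the first check should come back allowed and the second blocked.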

If you've tried the above and our auditor still won't crawl your website, your server may be blocking the audit. This is most likely the case if you see a 4xx or 5xx status code for all pages in our site auditor. In this situation, please whitelist the user agent "Mozilla/5.0 (compatible; RSiteAuditor)" (without quotation marks) on your server. Note that this whitelisting is done on your server, not in your robots.txt file.
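To see how your server responds to this user agent, you or your developer could send a test request that carries it. The following is a minimal sketch using Python's standard urllib; the function names and the example.com URL are hypothetical. It only imitates the user agent from your own machine, so it will not reveal blocks that are based on IP address.

```python
import urllib.error
import urllib.request

# User agent string quoted in this article.
AUDITOR_UA = "Mozilla/5.0 (compatible; RSiteAuditor)"


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries the auditor's user agent string."""
    return urllib.request.Request(url, headers={"User-Agent": AUDITOR_UA})


def status_for(url: str, timeout: float = 10.0) -> int:
    """Fetch `url` with the auditor's user agent and return the HTTP status."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; return their code.
        return err.code


if __name__ == "__main__":
    # Hypothetical URL; replace it with a page on your own site.
    print(status_for("https://example.com/"))
```

If this request returns a 4xx or 5xx code while the same page loads normally in a browser, a rule on the server that filters by user agent is a likely cause.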

Use robots.txt to allow a specific path to be crawled

There might be situations where you don't need the site auditor to crawl your whole website, only a specific path. In this case, you can use your robots.txt file to specify which path our site auditor should crawl.

The following example will allow our site auditor to crawl only the pages under example.com/categories:

User-agent: RSiteAuditor
Disallow: /
Allow: /categories
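To illustrate how these two rules combine, here is a small sketch that assumes conflicts are resolved by longest match (the most specific matching rule wins), as described in RFC 9309. This article does not state how our crawler resolves conflicts, so treat that as an assumption. The function name is_allowed is hypothetical, and wildcards are not handled.

```python
# Rules from the example above, as (directive, path prefix) pairs.
RULES = [("disallow", "/"), ("allow", "/categories")]


def is_allowed(rules, path: str) -> bool:
    """Apply longest-match precedence: the most specific matching rule wins."""
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive == "allow"
        # A longer prefix wins; on a tie, Allow is preferred.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


if __name__ == "__main__":
    for path in ("/categories/shoes", "/about"):
        print(path, is_allowed(RULES, path))
```

Under this assumption, /categories/shoes is crawlable and /about is not. Because matching is by prefix, a path such as /categories-archive would also match the Allow rule.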

If your robots.txt configuration is not working, please check your cache. It's possible that a different version of robots.txt is being delivered from your website or server cache, or from a CDN cache (e.g. Cloudflare). Clearing this cache will usually solve the problem.

Whitelist our IP addresses for the site auditor

If you are having trouble getting our site auditor to connect to your site, the following IP addresses can be whitelisted on your website or server firewall.

IPv4

94.130.93.30

168.119.141.170

168.119.99.190

168.119.99.191

168.119.99.192

168.119.99.193

168.119.99.194

68.183.60.34

134.209.42.109

68.183.60.80

68.183.54.131

68.183.49.222

68.183.149.30

68.183.157.22

68.183.149.129

IPv6

2a01:4f8:c17:f386::1/128

2a01:4f8:c17:f387::1/128

2a01:4f8:c17:f38a::1/128

2a01:4f8:c17:f394::1/128

2a01:4f8:c17:f395::1/128

2a01:4f8:251:5d3::2/128

2604:a880:800:10::eb:9001/128

2604:a880:800:10::596:4001/128

2604:a880:800:10::e9:1001/128

2604:a880:800:10::65b:f001/128

2604:a880:800:10::695:7001/128

2604:a880:800:10::6da:6001/128

2604:a880:800:10::6ee:8001/128

2604:a880:800:10::6f7:3001/128
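When reviewing firewall or access logs, it can help to check whether a blocked address is one of ours. The sketch below uses Python's standard ipaddress module with the addresses listed above; the function name is_auditor_ip is hypothetical, and the list should be kept in sync with this article.

```python
import ipaddress

# Addresses copied from the IPv4 and IPv6 lists above.
AUDITOR_NETWORKS = [ipaddress.ip_network(a) for a in (
    "94.130.93.30", "168.119.141.170", "168.119.99.190",
    "168.119.99.191", "168.119.99.192", "168.119.99.193",
    "168.119.99.194", "68.183.60.34", "134.209.42.109",
    "68.183.60.80", "68.183.54.131", "68.183.49.222",
    "68.183.149.30", "68.183.157.22", "68.183.149.129",
    "2a01:4f8:c17:f386::1/128", "2a01:4f8:c17:f387::1/128",
    "2a01:4f8:c17:f38a::1/128", "2a01:4f8:c17:f394::1/128",
    "2a01:4f8:c17:f395::1/128", "2a01:4f8:251:5d3::2/128",
    "2604:a880:800:10::eb:9001/128", "2604:a880:800:10::596:4001/128",
    "2604:a880:800:10::e9:1001/128", "2604:a880:800:10::65b:f001/128",
    "2604:a880:800:10::695:7001/128", "2604:a880:800:10::6da:6001/128",
    "2604:a880:800:10::6ee:8001/128", "2604:a880:800:10::6f7:3001/128",
)]


def is_auditor_ip(address: str) -> bool:
    """Return True if `address` is one of the site auditor addresses above."""
    ip = ipaddress.ip_address(address)
    # Membership tests across IPv4 and IPv6 simply evaluate to False.
    return any(ip in net for net in AUDITOR_NETWORKS)


if __name__ == "__main__":
    # 203.0.113.5 is a documentation-only address used as a negative example.
    for addr in ("94.130.93.30", "2a01:4f8:c17:f386::1", "203.0.113.5"):
        print(addr, is_auditor_ip(addr))
```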
