Site Auditor is available as a paid add-on for accounts created before November 2023. For more information, please reach out to our friendly support team.
This article will show you how to block our site auditor from scanning parts of your website. Note that you can also use robots.txt to create rules that allow or block our site auditor.
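If you prefer the robots.txt route, the rules follow the standard Robots Exclusion Protocol. The sketch below is illustrative only: "SiteAuditor" is a placeholder user-agent token, not the crawler's actual name, so check with our support team for the correct token before relying on it.

```
# Hypothetical robots.txt rules -- "SiteAuditor" is a placeholder
# user-agent token for the site auditor's crawler.
User-agent: SiteAuditor
Disallow: /blog/      # block the crawler from the blog
Disallow: /listing/   # block the crawler from listing pages
```

Rules under a specific User-agent line apply only to that crawler, so other bots (such as search engines) are unaffected.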
How to exclude specific pages from being crawled by the site auditor
Click the settings button at the top right of your site auditor dashboard and navigate to "Settings". Then click the pencil ("Edit") icon to open the crawl settings.
To exclude specific paths or URLs from being crawled by the site auditor, enter each path in the "Excluded URLs / Paths" box on the site crawl settings screen and click the "+" button to the right. When you've finished adding paths, click the "Save" button at the bottom.
Paths are excluded using a "contains" match: any crawled URL that contains the text you enter will be excluded.
Example 1:
Let's say that the campaign URL is www.agencyanalytics.com, and the site hosts a blog at www.agencyanalytics.com/blog. Blog posts are in the format of:
...and so on.
If you wanted to exclude the entire blog and all blog posts from being crawled, you would enter "agencyanalytics.com/blog" in the exclusions box.
This will exclude every URL containing that path, including all individual blog posts.
Example 2:
Let's say that the campaign URL is www.agencyanalytics.com/, and the site hosts listings for products across multiple categories.
Listings are in the format of:
...and so on.
If you wanted to exclude all listing pages from being crawled, you would enter "/listing/" in the exclusions box.
If you wanted to exclude all listings in Canada only, you would enter "/canada/listing" in the exclusions box.
Note: The exclusions box doesn't currently accept wildcards or regular expressions, but this functionality is on the roadmap and will likely be released at a future date.
Other Examples:
You can exclude subdomains by entering the full URL (for example toronto.agencyanalytics.com) in the exclusion box. This will exclude all pages on the "toronto" subdomain of agencyanalytics.com.
You can exclude query parameters (for example "agencyanalytics.com/something?page=1"). Since paths are excluded using a "contains" match, you can simply enter "agencyanalytics.com/something?page=" and we'll automatically exclude all pages that follow (e.g. "?page=1", "?page=2", etc.).
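The "contains" matching used in the examples above can be sketched in a few lines of Python. This is an illustration of the matching rule only, not the auditor's actual implementation, and `is_excluded` is a hypothetical helper name:

```python
# Sketch of "contains" exclusion matching: a URL is skipped
# if it contains any of the entered exclusion strings.
def is_excluded(url: str, exclusions: list[str]) -> bool:
    return any(pattern in url for pattern in exclusions)

exclusions = ["/listing/", "agencyanalytics.com/something?page="]

is_excluded("https://www.agencyanalytics.com/canada/listing/123", exclusions)  # True
is_excluded("https://www.agencyanalytics.com/about", exclusions)               # False
is_excluded("https://www.agencyanalytics.com/something?page=2", exclusions)    # True
```

Because the match is a plain substring check, a rule like "/listing/" also catches "/canada/listing/123", which is why the category-wide exclusions above work.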
Removing path and URL exclusions
You can remove exclusion rules by heading to the Crawl Settings page and clicking the remove icon (a garbage can) next to the exclusion rule you want to remove.
If you're ever unsure of how to exclude pages or paths on your website, reach out to our support team.