How to exclude specific pages from being crawled by the site auditor

Click the settings button at the top right of your site auditor dashboard and navigate to "Crawl Settings". Click "edit" to open the crawl settings.

To exclude specific paths or URLs from being crawled by the site auditor, enter each path in the "Excluded URLs / Paths" box on the crawl settings screen, then click the "+" button to the right to add it. Once you've added all the paths you want to exclude, click the "Save" button.

Paths are excluded using a "contains" criterion: any URL that contains the text you enter is excluded from the crawl.

Example 1:

Let's say that the campaign URL is www.agencyanalytics.com, and the site hosts a blog at www.agencyanalytics.com/blog. Blog posts are in the format of:

www.agencyanalytics.com/blog/myblogpost1
www.agencyanalytics.com/blog/myblogpost2

...and so on.

If you wanted to exclude the entire blog and all blog posts from being crawled, you would enter "agencyanalytics.com/blog" in the exclusions box.

This will exclude every URL that contains that path, including the blog index and each individual blog post.
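As a rough illustration of how a "contains" rule behaves, here is a minimal Python sketch. It is not the site auditor's actual code: the function name is_excluded is hypothetical, the /pricing URL is made up for contrast, and the sketch assumes a plain, case-sensitive substring match.

```python
def is_excluded(url: str, exclusions: list[str]) -> bool:
    """Return True if the URL contains any exclusion string.

    Assumption: a plain, case-sensitive substring test stands in for
    the auditor's "contains" criterion.
    """
    return any(rule in url for rule in exclusions)


rules = ["agencyanalytics.com/blog"]

# Blog posts contain the rule text, so they would be skipped.
post_excluded = is_excluded("www.agencyanalytics.com/blog/myblogpost1", rules)

# A hypothetical non-blog page does not contain it, so it would still be crawled.
pricing_excluded = is_excluded("www.agencyanalytics.com/pricing", rules)

print(post_excluded, pricing_excluded)
```

Under this assumption, a rule matches anywhere in the URL, so a short rule such as "/blog" would also match a path like "/blogger". Entering a longer, more specific path reduces the chance of excluding pages you meant to keep.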

Example 2: 

Let's say that the campaign URL is www.agencyanalytics.com/, and the site hosts listings for products across multiple categories.

Listings are in the format of:

www.agencyanalytics.com/usa/listing/entry1
www.agencyanalytics.com/canada/listing/entry2

...and so on.

If you wanted to exclude all listing pages from being crawled, you would enter "/listing/" in the exclusions box.

If you wanted to exclude all listings in Canada only, you would enter "/canada/listing" in the exclusions box.
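To see the difference between the two rules, the following Python sketch filters the example listing URLs with each one. The helper name crawlable is hypothetical, and the substring test is only an assumed stand-in for the auditor's "contains" matching.

```python
urls = [
    "www.agencyanalytics.com/usa/listing/entry1",
    "www.agencyanalytics.com/canada/listing/entry2",
]


def crawlable(urls: list[str], rule: str) -> list[str]:
    """Return the URLs that do NOT contain the rule text (assumed substring match)."""
    return [url for url in urls if rule not in url]


# "/listing/" appears in both URLs, so nothing is left to crawl.
after_all_listings = crawlable(urls, "/listing/")

# "/canada/listing" appears only in the Canadian URL, so the USA listing remains.
after_canada_only = crawlable(urls, "/canada/listing")

print(after_all_listings)
print(after_canada_only)
```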

Note: The exclusions box doesn't currently accept wildcards or regular expressions. This functionality is on the roadmap and will likely be released at a future date.

Other Examples:

  • You can exclude subdomains by entering the full subdomain URL (for example "toronto.agencyanalytics.com") in the exclusions box. This will exclude all pages on the "toronto" subdomain of agencyanalytics.com.
  • You can exclude query parameters (for example "agencyanalytics.com/something?page=1"). Since we exclude paths using a "contains" criterion, you can simply enter "agencyanalytics.com/something?page=" and we'll automatically exclude every page whose URL contains that text (e.g. "?page=1", "?page=2", etc.).
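The two cases above can be sketched the same way. This Python example is illustrative only: it assumes a plain substring match, and the "/contact" page on the toronto subdomain is a made-up URL.

```python
rules = [
    "toronto.agencyanalytics.com",           # subdomain rule
    "agencyanalytics.com/something?page=",   # query parameter rule
]

urls = [
    "toronto.agencyanalytics.com/contact",       # hypothetical subdomain page
    "www.agencyanalytics.com/something?page=1",
    "www.agencyanalytics.com/something?page=2",
    "www.agencyanalytics.com/something",         # no "?page=" in this URL
]

# Assumed "contains" check: a URL is excluded if any rule is a substring of it.
excluded = [url for url in urls if any(rule in url for rule in rules)]
kept = [url for url in urls if url not in excluded]

print(excluded)
print(kept)
```

In this sketch the page without a "?page=" parameter is kept, because the rule text has to appear in the URL in full for the rule to apply.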

Removing path and URL exclusions

You can remove exclusion rules by heading to the Crawl Settings page and clicking the "Remove Icon" (pictured as a garbage can) next to the exclusion rule you want to remove.

If you're ever unsure of how to exclude pages or paths on your website, reach out to our support team.
