How to Interpret "Reason Codes" for Lost or Deleted Links
You've worked hard to build links for your clients. Now, make sure that those links stay in place!
Our Backlinks module includes an entire section devoted to letting you know whenever a link is lost. A link can disappear for a variety of reasons. Sometimes a link is removed manually by a site owner. Other times, a link is removed due to a mistake or a technical glitch.
Knowing the reason behind a link's disappearance is the first step to getting it back! Our Backlinks module's "Lost Links" section provides a "Reason Lost" code for each link that's gone missing.
This article explores each of the possible "Reason Lost" codes. Note that our Backlinks data is pulled from Majestic.com and, as a result, this article has been adapted from an original blog post from Majestic.
Reason Codes for Deleted or Lost Links
Link Deleted
This is the most common reason for a link to be ‘lost’. This code typically results from one of three scenarios.
1) The link has simply been manually removed by the site owner or webmaster.
2) The link was located in an RSS feed, and has been pushed off the page.
RSS feeds are constantly pulling in new content to ‘repost’. Your link is therefore not static: it moves further down the feed and eventually falls off the page altogether. When this happens, our bot revisits the original URL and your link is nowhere to be found. The link now lives at a new URL, so of course our bots will report a missing link.
Unfortunately, it is hard for any bot to know the difference between a ‘pushed’ link and a ‘removed’ link. We would advise looking for clues. Can you see the word ‘feed’ in the URL? If so, that's a strong sign that the link has simply been pushed to the next page. In these cases, you can also check the "New links" tab to see if you recently received a fresh link from the same domain.
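As a rough illustration of that ‘feed’ clue, here is a minimal Python sketch of the heuristic (the URLs shown are hypothetical examples, not real data):

```python
def probably_pushed_feed_link(url: str) -> bool:
    """Heuristic from this article: 'feed' appearing anywhere in the
    URL suggests the link sat in an RSS feed and was pushed off the
    page, rather than deliberately removed."""
    return "feed" in url.lower()

print(probably_pushed_feed_link("https://example.com/feed/page/3"))   # True
print(probably_pushed_feed_link("https://example.com/blog/my-post"))  # False
```

It is only a clue, not proof – pair it with a check of the "New links" tab as described above.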
3) Links resulting from advertisements.
If you're running ads in an ad network, the network typically rotates between your ads and the ads of competing sites. Our bots may crawl the site one time and see your ad and your link. The next time the site is crawled, a different ad is being displayed and, as a result, it will look like your link has been lost.
Canonical Tag
Canonical tags allow you to point search engines to the page that you want them to crawl instead of its duplicate – most often this is to distinguish between www vs non-www or http vs https.
When a bot crawls a site without a canonical tag in place, it'll see these separate versions of the same page (www vs non-www for example), as two different pages. If your link is on both pages, it'll be crawled and counted twice.
If a rel=canonical tag is then added, the bot will see the two pages as one. As a result, one of those duplicate links will be reported as "lost".
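To make the mechanics concrete, here is a small Python sketch of how a crawler might extract a canonical URL from a page's HTML, using only the standard library (the page content below is a made-up example):

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collect the href of any <link rel="canonical"> tag."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "canonical":
            self.canonical = a.get("href")

# Hypothetical duplicate pages: www and non-www variants serve this same HTML.
page = '<html><head><link rel="canonical" href="https://example.com/page"></head></html>'
finder = CanonicalFinder()
finder.feed(page)
print(finder.canonical)  # the single URL a crawler treats as authoritative
```

Whichever variant the crawler fetched, both resolve to the one canonical URL – which is why the duplicate copy of your link stops being counted.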
HTTP 301 – Permanent Redirect
This redirect is used when a website owner wants to move an entire page to a new location – forever. In the view of search engines, this type of redirect passes link juice, more so than other redirect options. Seeing this in your Lost tab most likely means that you will have a new link on the new URL. It may take a few days for us to crawl and discover it.
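For the curious, here is a self-contained Python sketch of how a 301 behaves: a throwaway local server redirects /old to /new, and the client transparently ends up on the new URL – much as a crawler would find your link at its new home. (The server and paths are invented for the demo.)

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/old":
            # Permanent redirect: the page has moved for good.
            self.send_response(301)
            self.send_header("Location", "/new")
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<a href='https://example.com'>link</a>")

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# urllib follows the 301 automatically; geturl() reveals the new location.
with urllib.request.urlopen(f"{base}/old") as resp:
    final_url = resp.geturl()
    status = resp.status

server.shutdown()
print(final_url.endswith("/new"), status)  # True 200
```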
HTTP 302/307 – Temporary Redirect
Webmasters often put this type of redirect in place while the site is undergoing maintenance or redesign, or when there is a minor technical issue that should be quickly resolved. This type of redirect passes very little link juice, as it is meant to be only a temporary change. If you spot this within your lost links, it is likely that the link will be back soon.
HTTP 403 – Forbidden
This usually indicates that our bot managed to reach that particular URL, and the server responded – but refused to let us in! Most often, this occurs when the site's web host is using security software or a default configuration that does not allow our bot through. The most common resolution is to send a quick email to the site owner and have them ask their web host to whitelist the MJ12 bot. If they're unable or unwilling to whitelist the bot, then unfortunately we will not be able to crawl the link.
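A 403 comes from the server or its security layer rather than from robots.txt, but while you're contacting the site owner it's worth confirming that their robots.txt doesn't also block the crawler. A minimal robots.txt entry permitting Majestic's crawler (MJ12bot) looks like this:

```
User-agent: MJ12bot
Allow: /
```

The security whitelist itself still has to be handled by the web host, as described above.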
HTTP 404 – Page Not Found
Quite simply, the page has been removed. It’s nothing personal to your business or your link; the webmaster has just decided to remove the content. If you spot this among your Lost links, you may want to try to get a link from a different page on the same domain. Just because the page is gone doesn’t mean they're not still willing to link to you!
HTTP 406 – Not Acceptable
The HTTP 406 response is sent when a server cannot fulfill the request made. Our bot is not able to crawl images, movies, and so on, so to reduce bandwidth and load time we include a marker with our request indicating that a reply should only be sent if the content is text/html/xml. If the server does not believe this is the case, it should send a 406 response – this is the most common scenario. The code can also be generated when the MIME types on a server are misconfigured.
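The "marker" described above is the standard HTTP Accept header. Here is an illustrative Python sketch of building such a request (the URL and exact header value are examples, not Majestic's actual configuration):

```python
import urllib.request

# Content negotiation: the request announces that only text-like
# responses are wanted. A server with nothing matching may reply 406.
req = urllib.request.Request(
    "https://example.com/page",
    headers={"Accept": "text/html,application/xhtml+xml,application/xml"},
)
print(req.get_header("Accept"))
```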
HTTP 500 – Internal Server Error
Seeing this error means that our bot encountered an unexpected issue when trying to reach that page or domain. This response from the web server does not really specify what the problem actually is, which certainly makes things harder to resolve. The operators of the site hosting the link will need to analyze their logs in order to determine the exact cause of the issue.
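To recap the HTTP codes covered so far, here is a small Python lookup table paraphrasing this article's interpretations – a sketch you could adapt if you want to script your own triage of a Lost Links export:

```python
# Condensed interpretations of the HTTP reason codes above,
# paraphrased from this article.
REASON_CODES = {
    301: "Permanently redirected - look for a replacement link on the new URL",
    302: "Temporarily redirected - the link will likely return",
    307: "Temporarily redirected - the link will likely return",
    403: "Forbidden - ask the site owner to whitelist the crawler",
    404: "Page removed - consider requesting a link from another page",
    406: "Not acceptable - server refused the text/html request",
    500: "Internal server error - the site's operators must check their logs",
}

def interpret(status: int) -> str:
    return REASON_CODES.get(status, "Unknown reason code")

print(interpret(404))
```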
Timeout
This happens when the requested URL takes too long to respond to the request our bot sends. If a page or domain is taking too long to load, there is most likely some sort of technical fault, and we'll be unable to crawl the link until the site is able to load in a timely manner.
Connection Failure
Similar to Timeout, this response means that there is a technical fault. This particular failure indicates some sort of infrastructure issue between the server and the website.
Domain Name Resolution Failure
This shows that there is an issue with the DNS records of the source domain, often due to a change in IP address. After a change like this, it usually takes a little time for things to propagate across the Internet. The code can also be generated if the DNS server is simply down or offline; if it is down, we will not be able to crawl the site.
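If you want to verify a DNS issue yourself, Python's standard library can attempt the same resolution a crawler would. A minimal sketch (the hostnames are examples; .invalid is a reserved top-level domain that never resolves):

```python
import socket

def resolve_or_none(hostname: str):
    """Return an IP address for the hostname, or None when DNS
    resolution fails - the 'Domain Name Resolution Failure' case."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None

print(resolve_or_none("localhost"))             # an IP such as 127.0.0.1
print(resolve_or_none("no-such-host.invalid"))  # None
```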