What is rel=canonical and Why Should I Use It?

Hinting to Search Engines the Preferred Version of a Document

When you run a data-driven site, or have other reasons why a document might be duplicated, it’s important to tell search engines which copy is the master copy or, in the jargon, the “canonical” copy. When a search engine indexes your pages it can tell when content has been duplicated. Without additional information, the search engine will decide for itself which page best meets the needs of its customers. This might be fine, but there are many instances of search engines delivering old and outdated pages because they chose the wrong document as canonical.

How to Specify the Canonical Page

It is very easy to tell search engines the canonical URL with metadata in the HEAD of your documents. Put the following HTML near the top of the HEAD element on every page that is not canonical:

<link rel="canonical" href="URL of the canonical page">
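If your pages are generated by a script, the tag can be built from the canonical URL at render time. Here is a minimal sketch in Python; the function name canonical_link_tag and the example URL are my own illustrations, not part of any standard library or framework.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Return a <link rel="canonical"> element for the given URL.

    canonical_link_tag is a hypothetical helper name used for illustration.
    """
    # Escape &, <, > and quotes so the URL is safe inside an HTML attribute.
    safe_url = escape(canonical_url, quote=True)
    return f'<link rel="canonical" href="{safe_url}">'


if __name__ == "__main__":
    # Hypothetical product URL; substitute your own master copy.
    print(canonical_link_tag("https://www.example.com/shoes?color=red&size=9"))
```

The output of this function is what you would drop into the HEAD of each duplicate page.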

If you have access to the HTTP headers (such as with .htaccess or PHP) you can also set the canonical URL on files that don’t have an HTML HEAD, like a PDF. To do this, set the headers for non-canonical pages like this:

Link: <URL of the canonical page>; rel="canonical"
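I mentioned .htaccess and PHP, but the same header can be sent from any server-side code. As a rough sketch, here is a small Python WSGI application that serves a duplicate PDF along with the Link header. The URL, the function names, and the placeholder PDF bytes are all assumptions for the sake of the example.

```python
from typing import Tuple

# Hypothetical URL of the master copy; replace with your own.
CANONICAL_PDF_URL = "https://www.example.com/downloads/white-paper.pdf"


def canonical_header(canonical_url: str) -> Tuple[str, str]:
    """Build the (name, value) pair for a rel="canonical" Link header."""
    return ("Link", f'<{canonical_url}>; rel="canonical"')


def duplicate_pdf_app(environ, start_response):
    """WSGI app that serves a non-canonical copy of a PDF."""
    body = b"%PDF-1.4 placeholder"  # stand-in for the real PDF bytes
    headers = [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(body))),
        canonical_header(CANONICAL_PDF_URL),  # points at the master copy
    ]
    start_response("200 OK", headers)
    return [body]
```

The duplicate is still delivered normally; the header only tells search engines where the master copy lives.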

How the Canonical Tag Works and When it Doesn’t

The canonical meta data is used as a hint to search engines as to what page is the master. Search engines use this to update their index to reference the master copy as the primary copy, and when they deliver search results they deliver the page they believe is canonical.

But the canonical page that you specify may not be the page that search engines deliver.

There are many reasons why this might happen:

  • If the URL you specify is 404 not found, search engines will try to find the second most relevant URL to deliver
  • If the search engine believes your site has been hacked to add a fake canonical URL they won’t use it (of course, you’ll have bigger problems in that case)
  • If you place the link in the BODY tag, or there is some reason to believe that the HEAD tag wasn’t closed, search engines will ignore it. This is because many websites allow users to edit the content on the page (inside the BODY element), and as such a canonical reference found there would be untrustworthy.
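To catch the last problem before a search engine does, you can scan your own pages and see where each canonical link sits. The sketch below uses Python's standard html.parser module. The class and function names are hypothetical, and the parser is a simple tokenizer, so it does not imitate the way a browser silently closes an unclosed HEAD.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record every rel="canonical" link and whether it sits inside HEAD."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # list of (href, was_in_head) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            # Anything from here on is page content, not metadata.
            self.in_head = False
        elif tag == "link":
            attr = dict(attrs)
            rels = (attr.get("rel") or "").lower().split()
            if "canonical" in rels:
                self.found.append((attr.get("href"), self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonicals(html_text):
    """Return (href, was_in_head) for each canonical link in html_text."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found
```

Any result whose second value is False is a canonical link that search engines are likely to distrust.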

What the Rel=Canonical Tag Isn’t

Many people believe that if you add the rel=canonical link to a page then that page will be redirected to the canonical version, such as with an HTTP 301 redirect. That is not true. The rel=canonical link provides information to search engines, but it does not affect how the page is displayed, nor does it do any redirection at the server level.

The canonical link is, ultimately, just a hint. Search engines don’t have to honor it. Most search engines try hard to respect the wishes of page owners, but at the end of the day, the search results are what they do, and if they don’t want to serve your canonical page, they won’t.

When to Use the Canonical Link

As I said above, you should use the link on every duplicate page that is not canonical. If you have pages that are similar but not identical, it sometimes makes more sense to change one of them so that the two are clearly different than to designate one as canonical.

It is okay to point one page at another as canonical even when the two are not absolutely identical. They should be similar, though, and you should never simply point all pages to your home page. Canonical means that the page is the master copy of that document, not any sort of master link on your site.

I think it’s important to repeat that last bit — you should never point all your pages to your home page as the canonical page no matter how tempted you are to do so. Doing this, even by accident, can cause every page that isn’t canonical (i.e. every page that isn’t your home page and has the rel=canonical link on it) to be removed from search engine indexes. This isn’t Google (or Bing or Yahoo! or any other search engine) being malicious. They are doing what you asked them to do — considering every page a duplicate of your home page and returning all results to that page.

Then as customers get frustrated ending up on your home page instead of a more relevant document, that page will be less popular and will drop in search results. Even if you fix the problem, you can kill your search results for months afterwards and there is no guarantee that your site rankings will recover.
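Because this mistake is so costly, it is worth checking for it automatically. The following is only a sketch: it assumes you already have a mapping of page URLs to the canonical URLs they declare (for example, from a crawl of your own site), and the function names are made up for this example.

```python
from urllib.parse import urlsplit


def points_to_home(url):
    """True if the URL is a site root such as https://example.com/."""
    parts = urlsplit(url)
    return parts.path in ("", "/") and not parts.query


def suspicious_canonicals(page_to_canonical):
    """Return pages that are not the home page but declare it as canonical."""
    return sorted(
        page
        for page, canonical in page_to_canonical.items()
        if points_to_home(canonical) and not points_to_home(page)
    )


if __name__ == "__main__":
    # Hypothetical crawl results: page URL -> canonical URL found on that page.
    crawl = {
        "https://www.example.com/": "https://www.example.com/",
        "https://www.example.com/about": "https://www.example.com/",
        "https://www.example.com/shoes?sort=price": "https://www.example.com/shoes",
    }
    for page in suspicious_canonicals(crawl):
        print("Check the canonical link on", page)
```

Every URL it reports is a page asking to be treated as a duplicate of your home page.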

You should not make a page canonical that has been excluded from search for some reason (such as with the noindex meta tag or excluded by the robots.txt file). In order for a search engine to reference a page as canonical, it must be able to reference it in the first place.
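One way to test a candidate canonical page is to check it against both exclusion mechanisms before you point anything at it. This sketch uses Python's standard urllib.robotparser and html.parser modules. It assumes you have already fetched the robots.txt text and the page's HTML yourself, and is_indexable is a hypothetical name.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Detect <meta name="robots" content="... noindex ...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = (attr.get("content") or "").lower().replace(" ", "")
            if "noindex" in content.split(","):
                self.noindex = True


def is_indexable(url, page_html, robots_txt, user_agent="*"):
    """Return False if robots.txt or a noindex meta tag excludes the page."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        return False  # blocked by robots.txt
    meta = RobotsMetaParser()
    meta.feed(page_html)
    meta.close()
    return not meta.noindex
```

If this returns False for a page, pick a different page as canonical or lift the exclusion first.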

Good places to use the rel=canonical link include:

  • Sites with dynamic URLs: You can use it to define which URL format you prefer
  • Ecommerce sites, especially on product lists: When your customers change the sorting criteria, that new URL doesn’t need to be indexed
  • Syndicated content: Publishers using the content you wrote should include the rel=canonical link on their pages pointing to your original document
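For the first two cases, the canonical URL can usually be computed from the requested URL by dropping the parameters that only change presentation. Here is one possible sketch using Python's urllib.parse. Which parameters to ignore is an assumption that depends entirely on your site; the names in IGNORED_PARAMS are examples only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that change presentation but not content.
IGNORED_PARAMS = {"sort", "order", "sessionid"}


def canonical_url(url):
    """Strip presentation-only parameters and normalize what remains."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # stable parameter order, so equivalent URLs compare equal
    # Lowercase the host and drop the fragment; neither identifies a document.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


if __name__ == "__main__":
    print(canonical_url("https://Shop.example.com/shoes?sort=price&color=red#top"))
```

The result is the URL you would place in the rel=canonical link on every sorted or re-ordered variant of the list.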

When Not to Use the Canonical Link

Your first choice should be a 301 redirect. This not only tells the search engine that the page URL has changed, but it also takes people to the most up-to-date (and dare I say, canonical?) version of the page.

Don’t be lazy. If you’re changing your URL structure, then use some form of HTTP header manipulation (such as .htaccess or PHP or another script) to add the 301 redirects automatically. While you can use the rel=canonical link, that doesn’t take the older pages down, so anyone can get to them at any time. In fact, if a customer has a page bookmarked and you change the URL but only update the search engines using a rel=canonical link, that customer will never see the new page.
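As an illustration of the "another script" option, here is a minimal Python WSGI sketch that sends 301 redirects from a lookup table of old paths. The paths in REDIRECTS are invented for the example, and a real site would more likely generate the table from its old and new URL structures.

```python
# Hypothetical mapping of retired paths to their new locations.
REDIRECTS = {
    "/old-products.html": "/products/",
    "/about-us.php": "/about/",
}


def redirect_app(environ, start_response):
    """WSGI app that answers retired URLs with a permanent redirect."""
    path = environ.get("PATH_INFO", "")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells browsers and search engines the move is permanent.
        start_response(
            "301 Moved Permanently",
            [("Location", target), ("Content-Length", "0")],
        )
        return [b""]
    body = b"Not Found"
    start_response(
        "404 Not Found",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

Unlike the rel=canonical link, this moves bookmarked visitors to the new page as well as search engines.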

The rel=canonical link is a useful tool for sites with a lot of duplicate content. By understanding how it works, you can use it effectively. But ultimately, it is a tool that was released by search engines to help them keep their search indexes up-to-date. If you don’t keep your servers clean and up-to-date as well, your customers will be impacted and your site could be hurt.

Use it responsibly.