Wednesday, 9 October 2013

How to stop Google from indexing your pages?

This may be a confusing title, but there are cases where you may not want Google to index a particular page or file (like PDFs, Word docs etc.). Check out the options available.

  1. Robots.txt:  This is the most common way to keep Google's crawler away from your pages. SEO experts use it too. The usual code seen in a robots.txt file is:
    User-agent: *
    Disallow:
    The above syntax, with an empty Disallow line, allows every search engine's robots to crawl the whole website. Now if you put this code as:
    User-agent: *
    Disallow: /

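    Now the above code will stop every robot from crawling the website. Keep in mind that robots.txt only blocks crawling; a blocked URL can still appear in Google's results if other pages link to it. Read more about robots.txt on robotstxt.org.
    Here we want to disallow only a particular folder or particular file types like pdf files.
    Use the syntax below. The first form blocks everything under the /pdf/ folder, the second blocks every URL that ends in .pdf: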
    User-agent: *
    Disallow: /pdf/  

    or
    User-agent: *
    Disallow: /*.pdf$
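  2. The second way is to use the "nofollow" link attribute and the robots meta tag:
    <a href="something.pdf" rel="nofollow">Download PDF</a>
    This way Google understands that this pdf is not to be crawled through this link. The "nofollow" attribute tells Google "not to follow" the link. Note that "noindex" is not a link attribute; it goes in a robots meta tag in the page's head, such as <meta name="robots" content="noindex">, and then Google will not index that web page.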
    If you want to be more precise, use the following code when your page is going to expire on a stated date:
    <META NAME="GOOGLEBOT" CONTENT="unavailable_after: 28-November-2013 12:00:00 EST">
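  3. The third way is to use the X-Robots-Tag HTTP header, which is useful for files like PDFs that cannot carry a meta tag. Its syntax is:
    X-Robots-Tag: noindex
    Put this code in the .htaccess file (it needs Apache's mod_headers module to be enabled):
    <FilesMatch "\.pdf$">
      Header set X-Robots-Tag "noindex"
    </FilesMatch>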
    The X-Robots-Tag header can also be used to notify Google that a particular page is about to expire; use the syntax:
    X-Robots-Tag: unavailable_after: 22 Dec 2014 17:00:00 PST
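
If you want to try the robots.txt rules from option 1 before uploading them, here is a minimal sketch in Python using the standard library's urllib.robotparser. The helper name is_allowed, the RULES text and the example.com URLs are made up for illustration. One caveat: as far as I know this parser only does simple prefix matching and does not understand Google's * and $ wildcards, so it is used here with the /pdf/ folder rule only.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text.

    Hypothetical helper for illustration; the rules are parsed from a string,
    so nothing is downloaded.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Same rules as in option 1: block everything under the /pdf/ folder.
RULES = """\
User-agent: *
Disallow: /pdf/
"""

if __name__ == "__main__":
    # A file inside /pdf/ is blocked, the rest of the site is not.
    print(is_allowed(RULES, "http://www.example.com/pdf/report.pdf"))  # False
    print(is_allowed(RULES, "http://www.example.com/index.html"))      # True
```

Swapping RULES for the "Disallow: /" version should make every URL come back as blocked, which is a quick way to see how much one line can change.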

What did we learn?
  1. While the above codes help keep a particular page out of the index, they must be used with caution. Put the codes only where they are required. If your website is not showing enough pages in Google search, cross-check the robots.txt file and the meta tags. It may happen that these codes were put on the wrong web page by accident.
  2. If you want your website to get crawled quickly, don't forget to add your sitemap to the robots.txt file. This can help the website get indexed faster.
  3. Use Google Webmaster Tools to check that the robots.txt file works properly.
  4. The codes that are to be put in are highlighted in yellow.
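
The cross-checking from points 1 and 2 can be sketched in Python as well. This is only a rough illustration under my own assumptions: the names RobotsMetaParser, blocking_directives and sitemaps_in are invented here, the page HTML and response headers are passed in by hand rather than downloaded, and site_maps() needs Python 3.8 or newer.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the values of <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def blocking_directives(html, headers):
    """Return the directives that keep a page out of the index.

    Looks at both the robots meta tag in `html` and an X-Robots-Tag entry in
    the `headers` dict. Hypothetical helper, not a complete checker.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    found = list(parser.directives)
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            found += [d.strip().lower() for d in value.split(",")]
    return [d for d in found if d in ("noindex", "none")]


def sitemaps_in(robots_txt):
    """Return the Sitemap URLs listed in a robots.txt text (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
    print(blocking_directives(page, {}))  # ['noindex']
    print(sitemaps_in("User-agent: *\nDisallow:\nSitemap: http://www.example.com/sitemap.xml\n"))
```

An empty list from blocking_directives means nothing on that page is asking Google to skip it, and an empty list from sitemaps_in means the robots.txt file has no Sitemap line yet.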
