How To Stop Search Engines From Indexing Specific Pages And Posts In WordPress

Most website owners are happy to see their website indexed and want it to start showing up in Google’s results as soon as possible.

Search engine spiders crawl your whole website, analyse your content, and store it in their database so they can return better results when users search for a particular query.

But there are situations where you do not want to showcase your website to others, such as when you have just started out and are still revising your content day by day to get it right.

Until you feel it is ready, you may not want anyone to see it. In those cases you will want to prevent search engines from indexing specific pages, folders, or the entire site.

Built-in WordPress Feature: “Discourage search engines from indexing this site”

     1. Log in to the WordPress Dashboard

     2. Navigate to Settings

     3. Click on Reading

     4. Enable the option “Discourage search engines from indexing this site”

     5. Click “Save Changes”

Now search engines are discouraged from indexing your entire WordPress website.
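Behind the scenes, this option works by adding a robots meta tag to the <head> of every page, the same mechanism covered later in this article. The exact output varies by WordPress version, but it is along these lines:

<meta name='robots' content='noindex, nofollow' />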

 

Stop Search Engines From Crawling Using the Robots.txt File

The robots.txt file tells search engine spiders what they should index and what they should not.

1. Log in to your cPanel, or connect via FTP, to access the File Manager

2. Open the public_html folder

3. Locate the robots.txt file. If it does not exist, create a new text file and save it as robots.txt

4. Enter the following code in the robots.txt file and save it.

User-agent: *
Disallow: /

The above code will block search engines from crawling your whole WordPress website, so none of your pages will be indexed.

“User-agent: *” means this section applies to all robots.
“Disallow: /” tells the robot that it should not visit any pages on the site.

These rules follow the Robots Exclusion Standard, which determines what search engine spiders should index and what they should not.

The concept behind the robots.txt protocol is the same as the robots meta tag, which I discuss at length later in this article. There are only a few basic rules.

User-agent – The search engine spider that the rule should be applied to
Disallow – The URL or directory that you want to block

If you want to block a specific search engine from indexing the site, you can target its spider by name:

User-agent: googlebot   (targets Google’s spider)

User-agent: bingbot      (targets Bing’s spider)

Most people simply use User-agent: * to target all search engines at once.
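Keep in mind that a User-agent line does nothing on its own; it must be paired with at least one Disallow rule. For example, this sketch blocks only Google’s spider while leaving every other crawler unaffected:

User-agent: googlebot
Disallow: /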

The Disallow rule defines the URL or directory you want to block. Disallow: / blocks search engines from the whole website, while Disallow: /gallery/ blocks only your WordPress gallery section.
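Written out in full, the gallery example looks like this:

User-agent: *
Disallow: /gallery/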

Here are a few more examples to help you understand how to use the robots.txt file to keep search engines out of specific areas.

To stop search engines from indexing your WordPress comments section, you could use something like this:

User-agent: *
Disallow: /*?comments=all

To block a newsletter confirmation page from search engines, you could use something like this:

User-agent: *
Disallow: /email-subscription-confirmed/

To hide a New Year discount page, you could use something like this:

User-agent: *
Disallow: /new-year-discount/

Paths in the robots.txt file are case sensitive. If a file is named WordPress-Book.pdf but your robots.txt rule says /wordpress-book.pdf, the rule will not match and the file will not be blocked.
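So, assuming that PDF sits at your site root, the rule has to match its case exactly:

User-agent: *
Disallow: /WordPress-Book.pdf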

Another rule available to you is Allow, which lets you grant access to specific user agents. The code below blocks all search engines, but allows Googlebot-Image to index the content inside your WordPress uploads folder, so your images can still appear in Google Image search:

User-agent: *
Disallow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

robots.txt also supports pattern matching, which is useful for blocking groups of files with similar names, such as file names starting or ending with a particular string. If you only need to block a few pages, you do not need to learn pattern matching.
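As a quick illustration, Google and Bing support the * wildcard (any sequence of characters) and $ (end of the URL), so a hypothetical rule that blocks every PDF on the site would look like this:

User-agent: *
Disallow: /*.pdf$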

If you want to learn more about robots.txt, you can get help from robotstxt.org, Google, and Bing. You can also view any website’s robots.txt file at https://websitename.com/robots.txt to see how it is used in practice; some websites do not use robots.txt, so you may get a 404 error.

Studying the robots.txt files of some well-known websites will help you understand how to take control over search engines.

The robots.txt file is one way of preventing search engines from indexing your whole website or individual posts and pages. By visiting https://yourdomainname/robots.txt you can verify that your rules have been applied.

About the Robots Meta Tag

Robots meta tags (meta directives) are snippets of code that give crawlers instructions on how to index a web page’s content. Whereas the robots.txt file only suggests how a website’s pages should be crawled, the robots meta tag gives more accurate and exact instructions on how to crawl and index an individual page.

Google recommends blocking URLs using the robots meta tag.

 

There are two kinds of robots directives: one is part of the HTML page (the meta robots tag) and the other is sent by the web server as an HTTP header (the x-robots-tag). Both do the same job of carrying crawling and indexing instructions, and values such as “noindex” and “nofollow” can be used with both. The major difference is how they are communicated to crawlers.
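As a rough sketch of the HTTP header variant: on an Apache server with mod_headers enabled, you could send the x-robots-tag for all PDF files with a configuration like the one below (illustrative only; adapt it to your server setup):

<FilesMatch "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

The rest of this section focuses on the meta tag variant, whose general syntax is: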

 

<meta name="value" content="value">

The robots meta tag should be placed within the <head> section, between <head> and </head>, of your WordPress theme’s header. Different values are available for both name and content. The values Google recommends for blocking access to a page are robots and noindex:

 

<meta name="robots" content="noindex">

The value robots refers to all search engines, and the value noindex prevents search engines from showing the web page in their results.

If you want to block a web page from a specific search engine, you need to specify that search engine’s spider as the name value. Some common search engine spiders are:

    1. googlebot – Google

    2. googlebot-news – Google News

    3. googlebot-image – Google Images

    4. bingbot – Bing

    5. teoma – Ask

Two older spiders, MSNBot and Slurp, are not listed above. Windows Live Search and MSN Search both used MSNBot to index pages for their search engines; they were re-branded as Bing in 2009, and MSNBot was replaced by Bingbot in October 2010. MSNBot still handles some crawling for Bing, and MSNBot-Media crawls images and video for Bing. Slurp used to crawl pages for Yahoo!; in 2009 Yahoo! stopped using it and began serving results with the help of Bing.

If you want to block your web page from Google Images, you could use something like this:

<meta name="googlebot-image" content="noindex">

 

By specifying two bots separated by a comma, you can block multiple search engines at once:

<meta name="googlebot-news,googlebot-image" content="noindex">

 

Meta tags give bots instructions about how to crawl and index the information they find on a web page. When bots discover these tags, the content value serves as a strong directive for their indexing behaviour. Many values can be used in the content attribute. The content value is not case-sensitive, but note that some search engines may interpret these values slightly differently.

 

Values for Controlling Crawling and Indexing

 

 1. index: Instructs search engines to index the web page. This is the default, so you don’t have to add it to your pages.

 2. noindex: Instructs search engines not to index the web page.

 3. follow: Instructs search engines to follow the links on the web page, whether or not the page itself is indexed.

 4. nofollow: Instructs crawlers not to follow any links on the page at all.

 5. noimageindex: Tells crawlers not to index any images on the page. Of course, if images are linked to directly from elsewhere, search engines can still index them.

 6. none: A shortcut for noindex, nofollow; it tells search engines to do nothing with the page.

 7. noarchive: Tells search engines not to show a cached copy of the page.

 8. nocache: Same as noarchive, but used only by MSN/Live.

 9. nosnippet: Tells a search engine not to show a snippet (i.e. meta description) of the page in the search results, and prevents it from caching the page.

 10. noodp: Prevents search engines from using the page’s DMOZ description as the snippet in the search results. (DMOZ was retired in early 2017.)

 11. noydir: Prevents Yahoo! from using the page’s Yahoo! Directory description as the snippet in the search results. No other search engine uses the Yahoo! Directory, so the others do not support this tag.

 12. unavailable_after: [RFC-850 date/time]: Stops the page from being shown in search results after the date and time specified, in RFC 850 format.
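These values can also be combined in a single tag. For example, to keep a page out of the index while still letting crawlers follow its links, you could use:

<meta name="robots" content="noindex, follow">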

The table below shows which search engines support which values:

Robots value   Google   Yahoo!   Bing   Ask
index          Yes      Yes      Yes    Yes
noindex        Yes      Yes      Yes    Yes
none           Yes      –        –      Yes
follow         Yes      –        –      Yes
nofollow       Yes      Yes      Yes    Yes
noarchive      Yes      Yes      Yes    Yes
nosnippet      Yes      No       No     No
noodp          Yes      Yes      Yes    No
noydir         No       Yes      No     No

How To Install WordPress On Google Cloud Platform: Step by Step Guide

Like other cloud platforms, Google Cloud Platform lets you install WordPress for high performance and easy scaling of your website.

One thing that makes Google Cloud Platform stand out from other platforms is that it runs on the same software and hardware infrastructure Google uses for its own products, such as YouTube and Gmail.

Google Cloud Platform’s Compute Engine offers virtual machines in a range of configurations, referred to as machine types. Recently released machine types offer up to 96 vCPUs and 624 GB of memory.

The type of disk and storage used by a cloud server plays a significant role in performance. Google Cloud’s block storage product, Persistent Disk, provides durable, high-performance block storage for the Google Cloud Platform.

Latency matters for businesses that serve visitors in a specific geographical location. For example, let’s say you have an e-commerce shop in London and 90% of your customers are from England. Your business is going to benefit greatly from placing your site on a server in England rather than hosting it in Asia or the Middle East. Google Cloud Platform has dozens of data centre locations to choose from around the globe, letting you decrease latency across the board and ensure faster load times.

 

About Google Cloud Platform

Some of the world’s most innovative brands, including Spotify, Coca-Cola, Snapchat, HTC, Best Buy, Feedly, ShareThis, Domino’s, Sony Music, Airbus and Philips already rely on Google Cloud Platform.

Even Apple has signed up for Google’s cloud services, reportedly spending between $400 million and $600 million on Cloud Platform.

Most recently, Evernote decided to migrate its 200 million users. Ben McCormack, VP of Operations at Evernote, stated on their blog:

 Evernote will gain significant improvements in performance, security, efficiency, and scalability. Moving to the cloud also allows us to focus time and resources on the things that matter most.

 

Why Choose a Cloud Platform Over the Others?

You have probably heard that the cloud is a powerful alternative to traditional web hosting providers. It offers a variety of benefits.

Server uptime guaranteed: The performance of your website is directly dependent on server uptime. A cloud platform provides maximum network uptime and guarantees no single point of failure. In the cloud, all systems are interconnected by default; if, at any point, one server is unable to take your request, another server takes over the workload of the failed one.

Pay only for the resources you use: Cloud platforms bill you as you use resources; there is no fixed cost.

Security: Cloud platforms come with security at several levels: network, application, data, and physical. Cloud service providers ensure data safety through encryption, firewalls, and backup and recovery options.

Scalability: Resources are allocated dynamically as needs change. If there is a spike in traffic on a festival or sale day, there is no need to worry; the cloud platform scales to handle the traffic efficiently.

Installing WordPress On Google Cloud Platform

Step 1: Log in to the Google Cloud Console, sign up for the FREE TRIAL, and set up your billing account (this is required before you get started)

 

Step 2: Create a new project by entering a project name

 

Step 3: Search for “Google Cloud Deployment Manager V2 API”

 

Step 4: Click “Enable” for the Google Cloud Deployment Manager V2 API

Note: If you have not set up your billing account, you will get an error when you try to enable the Google Cloud Deployment Manager V2 API.

 

Step 5: Search for WordPress and click on the WordPress “Google Click to Deploy” listing

 

Step 6: Click on “Launch on Compute Engine” and proceed.

Step 7: On the WordPress deployment screen, fill in the necessary fields to complete the WordPress installation process:

  • Deployment name: Give your WordPress deployment a name here. I have used wpweb.
  • Zone: The zone determines the computing resources available as well as your server’s location. Choose the location nearest to your visitors to decrease latency. I have chosen us-central1-c.
  • Machine Type: I have chosen 1 vCPU with 3.75 GB memory; you can also customize your machine type. Learn more about machine types.
  • Administrator Email: Enter your admin email.
  • Install phpMyAdmin: phpMyAdmin makes it easy to manage your WordPress database; make sure this option is checked.
  • Disk type: Choose SSD persistent disk over Standard persistent disk for better performance.
  • Disk size in GB: Select a disk size for your WordPress installation; 10 GB is more than enough.
  • Network name: Leave this field as it is.
  • Subnetwork name: Leave this field as it is.
  • Firewall: Check both Allow HTTP traffic and Allow HTTPS traffic.

Click on Deploy once you have entered all the details.

 

Step 8: This screen confirms that you have successfully installed WordPress on Google Cloud Platform.

 

Step 9: Enter the IP address (e.g. https://104.154.220.252/) in your browser and you will be taken to your new WordPress site.

That’s all; your WordPress site is live. Note down your WordPress credentials for logging in and future use.

Set up your domain name and DNS settings so that you can access the website from a domain name rather than an IP address. A step-by-step guide is here.
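In most DNS managers this comes down to pointing an A record at your instance’s external IP. Using the example address above, the record would look roughly like this (field names vary by DNS provider):

Type: A
Host: @ (your root domain)
Value: 104.154.220.252
TTL: 3600 (or your provider’s default)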

 

Conclusion

You no longer need to depend on traditional shared hosting providers. Google Cloud Platform provides excellent performance, scalability, and robust security for your website.

When you switch from shared hosting to a cloud platform, you will see the difference in performance and speed, which also helps your SEO ranking. You can install an SSL certificate for added security.