Discourage Search Engines from Indexing This Site: What It Means

Discouraging search engines from indexing a website asks search engine crawlers to keep the site out of their results. In WordPress, this is done by adding a robots meta tag to the site’s pages (older versions also changed the robots.txt file) that tells crawlers not to index the website. This can be useful for websites that are not meant to be found publicly, such as staging sites or development sites. It can also be used to keep search engines from indexing pages that are not ready to be seen by the public. It is a request rather than a guarantee, so it should not be the only protection for private content.

If you’ve poked around the WordPress backend, you may have noticed a setting that says “Discourage search engines from indexing this site” and wondered what it meant.

Or maybe you’re looking for a way to hide your site from unwanted visitors and wondering if this one little checkbox is enough to keep your content safely private.

What does this option mean? What exactly does it do to your site? And why should you avoid relying on it — even if you’re trying to hide your content?

Here are the answers and a few other methods to deindex your site and block access to certain pages.

What Does “Discourage Search Engines from Indexing This Site” Mean?

Have you ever wondered how search engines index your site and gauge your SEO? They do it with an automated program called a spider, also known as a robot or crawler. Spiders “crawl” the web, visiting websites and logging their content.

Google uses them to decide how to rank and place your website in the search results, grab blurbs from your articles for the search results page, and pull your images into Google Images.

When you tick “Discourage search engines from indexing this site,” WordPress adds a meta tag to your site’s header that tells Google and other search engines not to index any content on your entire site. Older versions of WordPress also modified your robots.txt file (a file that gives instructions to spiders on how to crawl your site).

The key word here is “discourage”: Search engines have no obligation to honor this request, especially search engines that don’t use the standard robots.txt syntax that Google does.

Web crawlers will still be able to find your site, but properly configured crawlers will read your robots.txt or meta tag and leave without indexing the content or showing it in their search results.

In the past, this option in WordPress didn’t stop Google from showing your website in the search results, just from indexing its content. You could still see your pages appear in search results with an error like “No information is available for this page” or “A description for this result is not available because of the site’s robots.txt.”

While Google wasn’t indexing the page, they didn’t hide the page entirely either. This anomaly led to people being able to visit pages they weren’t meant to see. Thanks to WordPress 5.3, it now works properly, blocking both indexing and listing of the site.

You can imagine how this would destroy your SEO if you enabled it by accident. It’s critical to use this option only if you really don’t want anyone to see your content, and even then, it may not be the only measure you want to take.

Why You Might Not Want to Index Your Site

Websites are made to be seen by people. You want users to read your articles, buy your products, consume your content — why would you intentionally try to block search engines?

There are a few reasons why you may want to hide part or all of your site.

  • Your site is in development and not ready to be seen by the public.
  • You’re using WordPress as a content management system but want to keep said content private.
  • You’re trying to hide sensitive information.
  • You want your site accessible only to a small number of people with a link or through invites only, not through public search pages.
  • You want to put some content behind a paywall or other gate, such as newsletter-exclusive articles.
  • You want to cut off traffic to old, outdated articles.
  • You want to prevent getting SEO penalties on test pages or duplicate content.

There are better solutions for some of these — using a proper offline development server, setting your articles to private, or putting them behind a password — but there are legitimate reasons why you may want to deindex part or all of your site.

How to Check if Your Site Is Discouraging Search Engines

While you may have legitimate reasons to deindex your site, it can be a horrible shock to learn that you’ve turned this setting on without meaning to or left it on by accident. If you’re getting zero traffic and suspect your site isn’t being indexed, here’s how to confirm it.

One straightforward way is to check the At a Glance box located on the home screen of your admin dashboard. Just log into your backend and check the box. If you see “Search Engines Discouraged,” then you know you’ve activated that setting.

“At a Glance” in the WordPress dashboard.

An even more reliable way is to check your robots.txt. You can easily verify this in the browser without even logging into your site.

To check robots.txt, all you need to do is add /robots.txt to the end of your site URL. For instance: https://kinsta.com/robots.txt

If you see Disallow: / then your entire site is being blocked from indexing.

“Disallow” in robots.txt.

If you see Disallow: followed by a URL path, like Disallow: /wp-admin/, it means that any URL under the /wp-admin/ path is being blocked. This structure is normal for some pages, but if, for instance, it’s blocking /blog/, which has pages you want indexed, it could cause problems!
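The same check can be scripted. The following is a minimal sketch in Python, assuming you already have the contents of robots.txt as a string. It relies on the standard library’s urllib.robotparser; the function name is_path_blocked and the example.com URLs are illustrative and not part of WordPress.

```python
from urllib.robotparser import RobotFileParser


def is_path_blocked(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical robots.txt contents used only for illustration.
    whole_site = "User-agent: *\nDisallow: /\n"
    admin_only = "User-agent: *\nDisallow: /wp-admin/\n"

    print(is_path_blocked(whole_site, "https://example.com/blog/"))
    print(is_path_blocked(admin_only, "https://example.com/blog/"))
    print(is_path_blocked(admin_only, "https://example.com/wp-admin/options.php"))
```

This only reports what the file asks crawlers to do. As noted above, a crawler is free to ignore it.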

Now that WordPress uses meta tags rather than robots.txt to deindex your site, you should also check your header for modifications.

Log in to your backend and go to Appearance > Theme Editor. Find Theme Header (header.php) and look for the following code:

<meta name="robots" content="noindex,nofollow" />
noindex, nofollow in header.php.

You can also check functions.php for the noindex tag, as it’s possible to insert code into the header through this file.

If you find this code in your theme files, then your site is not being indexed by Google. But rather than removing it manually, let’s try to turn off the original setting first.
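Another way to look for the tag is to scan the HTML your site actually serves, such as what the browser’s View Source shows. Below is a minimal sketch in Python, assuming the page HTML is already available as a string. RobotsMetaFinder and has_noindex are hypothetical names chosen for illustration, and the sketch only looks at the generic robots meta tag, not crawler-specific ones.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes, so normalise them.
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    sample = '<head><meta name="robots" content="noindex,nofollow" /></head>'
    print(has_noindex(sample))
```

Checking the served HTML catches the tag no matter which theme file or plugin added it.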

How to Discourage Search Engine Indexing in WordPress

If you want to skip the extra steps and go straight to the original setting, here’s how to activate or deactivate the “Discourage search engines” option in WordPress.

Log in to your WordPress dashboard and navigate to Settings > Reading. Look for the Search Engine Visibility option with a checkbox labeled “Discourage search engines from indexing this site.”

Search engine visibility checkbox.

If you find that this is already on and want your site to be indexed, then uncheck it. If you want to prevent your site from being indexed, check it (and jot down a note somewhere reminding you to turn it off later!).

Now click Save Changes, and you’re good to go. It may take some time for your site to be reindexed or for it to be pulled from the search results.

If your site is still deindexed, you can also remove the noindex code from your header file, or manually edit robots.txt to remove the “Disallow” flag.

So that’s simple enough, but what are some reasons why you should avoid this option, or at least not rely entirely on it?

Disadvantages of Using the Discourage Search Engines Option

It seems simple — tick a checkbox and no one will be able to see your site. Isn’t that good enough? Why should you avoid using this option on its own?

When you turn on this setting or any option like it, all it does is add a tag to your header or your robots.txt. As shown by older versions of WordPress still allowing your site to be listed in search results, a small glitch or other error can result in people seeing your supposedly hidden pages.

In addition, it’s entirely up to search engines to honor the request not to crawl your site. Major search engines like Google and Bing usually will, but not all search engines use the same robots.txt syntax, and not all spiders crawling the web are sent out by search engines.

For instance, one service that makes use of web crawlers is the Wayback Machine. And if your content is indexed by such a service, it’s on the web forever.

Wayback Machine showing results for Kinsta.com.

You may think just because your brand new site has no links to it that it’s safe from spiders, but that isn’t true. Existing on a shared server, sending an email with a link to your website, or even visiting your site in a browser (especially Chrome) may open your site up to being crawled.

If you want to hide content, it’s just not a good idea to add a parameter and hope it will do the trick.

And let’s be clear, if the content you’re deindexing is of a sensitive or personal nature, you should absolutely not rely on robots.txt or a meta tag to hide it.

Last but not least, this option will entirely hide your site from search engines, while many times you only want to deindex certain pages.

So what should you be doing instead of or alongside this method?

Other Ways to Prevent Search Engine Indexing

While the option provided by WordPress will usually do its job, for certain situations, it’s often better to employ other methods of hiding content. Even Google itself says don’t use robots.txt to hide pages.

As long as your site has a domain name and is on a public-facing server, there’s no way to guarantee your content won’t be seen or indexed by crawlers unless you delete it or hide it behind a password or login requirement.

That said, what are better ways to hide your site or certain pages on it?

Block Search Engines with .htaccess

While its implementation is functionally the same as simply using the “Discourage search engines” option, you may wish to manually use .htaccess to block indexing of your site.

You’ll need to use an FTP/SFTP program to access your site and open the .htaccess file, usually located in the root folder (the first folder you see when you open your site) or in public_html. Add this code to the file and save:

Header set X-Robots-Tag "noindex, nofollow"

Note: This method only works on Apache servers. On NGINX servers, such as those running on Kinsta, add this code to the .conf file instead, which can be found in /etc/nginx/:

add_header X-Robots-Tag "noindex, nofollow";
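To confirm that the server is actually sending the header, you can inspect the response headers. The following is a minimal sketch in Python using only the standard library. header_blocks_indexing and fetch_headers are hypothetical helper names, and the parsing is deliberately simple: it ignores the optional per-crawler prefix that an X-Robots-Tag value may carry.

```python
from urllib.request import Request, urlopen


def header_blocks_indexing(headers) -> bool:
    """Return True if a header mapping carries an X-Robots-Tag that blocks indexing."""
    for name, value in headers.items():
        if name.lower() != "x-robots-tag":
            continue
        directives = {part.strip().lower() for part in value.split(",")}
        # "none" is treated by Google as shorthand for "noindex, nofollow".
        if "noindex" in directives or "none" in directives:
            return True
    return False


def fetch_headers(url: str) -> dict:
    """Fetch response headers with a HEAD request (assumes the server allows HEAD)."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        # Repeated headers collapse to one entry here, which is fine for a quick check.
        return dict(response.headers.items())


if __name__ == "__main__":
    # Offline example using a hand-written header mapping.
    print(header_blocks_indexing({"X-Robots-Tag": "noindex, nofollow"}))
    # For a live check, pass fetch_headers("https://your-site.example/") instead.
```

Keeping the parsing separate from the network call makes it possible to test the logic without touching a live site.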

Password Protect Sensitive Pages

If there are certain articles or pages you don’t want search engines to index, the best way to hide them is to password protect them. That way, only you and the users you want will be able to see that content.

Luckily, this functionality is built into WordPress, so there’s no need to install any plugins. Just go to Posts or Pages and click on the one you want to hide. Edit your page and look for the Status and Visibility > Visibility menu on the right-hand side.

If you’re not using Gutenberg, the process is similar. You can find the same menu in the Publish box.

Change the Visibility to Password Protected and enter a password, then save — and your content is now hidden from the general public.

Setting a post to Password Protected.

What if you want to password protect your entire site? It’s not practical to require a password for every single page.

Kinsta users are in luck: You can enable password protection in Sites > Tools, requiring both a username and password.

Otherwise, you can use a content restriction plugin (e.g. Password Protected). Install and activate it, then head to Settings > Password Protected and enable Password Protected Status. This gives you finer control, even allowing you to whitelist certain IP addresses.

Install a WordPress Plugin

When WordPress’ default functionality isn’t enough, a good plugin can often solve your problems.

For instance, if you want to deindex specific pages rather than your entire site, Yoast has this option available.

In Yoast SEO, you can open up a page you want to hide and look for the option under the Advanced tab: Allow search engines to show this Post in search results? Change it to No and the page will be hidden.

Yoast SEO settings.

Note that options like this rely on the same methods as WordPress’ default option to discourage search engine indexing, and are subject to the same flaws. Some search engines may not honor your request. You’ll need to employ other methods if you really want to hide this content completely.

Another solution is to paywall your content or hide it behind a required login. The Simple Membership or Ultimate Member plugins can help you set up free or paid membership content.

Simple Membership plugin.

Use a Staging Site for Testing

When working on test projects or in-progress websites, your best bet on keeping them hidden is to use a staging or development site. These websites are private, often hosted on a local machine that no one but you and others you’ve allowed can access.

Many web hosts will provide you with easy-to-deploy staging sites and allow you to push them to your public server when you’re ready. Kinsta offers a one-click WordPress staging site for all plans.

You can access your staging sites in MyKinsta by going to Sites > Info and clicking the Change environment dropdown. Click the Staging environment and then the Create a staging environment button. In a few minutes, your development server will be up and ready for testing.

If you don’t have access to an easy way to create a staging site, the WP STAGING plugin can help you duplicate your install and move it into a folder for easy access.

Use Google Search Console to Temporarily Hide Websites

Google Search Console is a service that allows you to claim ownership of your websites. With this comes the ability to block Google from indexing certain pages temporarily.

This method has a couple of problems: It’s Google-exclusive (so other search engines like Bing will not be affected) and it only lasts about 6 months.

But if you want a quick and easy way to get your content out of Google search results temporarily, this is the way to do it.

If you haven’t already, you’ll need to add your site to Google Search Console. With that done, open Removals and select Temporary Removals > New Request. Then click Remove this URL only and enter the URL of the page you want to hide.

This is an even more reliable way to block content, but again, it works exclusively for Google and only lasts 6 months.

Summary

There are many reasons why you may want to hide content on your site, but relying on the “Discourage search engines from indexing this site” option is not the best way to make sure such content isn’t seen.

Unless you want to hide your entire website from the web, you should never click this option, as it can do huge damage to your SEO if it’s accidentally toggled.

And even if you do want to hide your site, this default option is an unreliable method. It should be paired with password protection or other blocking, especially if you’re dealing with sensitive content.

Do you use any other methods to hide your site or parts of it? Let us know in the comments section.

Jaspreet Singh Ghuman
