What’s Robots.txt for WordPress websites and potential issues

The robots.txt file is a simple text file that lives in your website’s root directory and tells search engine crawlers which parts of your site they can or cannot access. It’s like a set of instructions for bots. If your site doesn’t have a physical robots.txt file, WordPress generates a virtual one, and that can be a problem: the virtual file’s default rules are minimal, so crawlers are free to crawl and index all of your posts, images, and videos, and you have no direct control over what they access.

Why is it important?

  • Controls crawling and indexing – Helps search engines like Google skip certain pages or directories.
  • Protects sensitive areas – Keeps bots out of admin areas such as /wp-admin/.
  • Manages crawl budget – Prevents unnecessary crawling of files that don’t need to be indexed, for example images or videos.
  • Handles bot behavior – Allows different rules for specific search engine bots (Googlebot, Bingbot, etc.), as shown in the example after the basic file below.

Example of a Basic robots.txt File:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
  • User-agent: * → Applies rules to all search bots.
  • Disallow: → Blocks bots from accessing /wp-admin/.
  • Allow: → Makes an exception for /wp-admin/admin-ajax.php.
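
As mentioned above, you can also give specific bots their own rules by adding separate User-agent groups. The snippet below is only an illustration; the /drafts/ and /search-results/ paths are placeholders, not directories your site necessarily has:

User-agent: *
Disallow: /wp-admin/

User-agent: Googlebot
Disallow: /wp-admin/
Disallow: /drafts/

User-agent: Bingbot
Disallow: /wp-admin/
Disallow: /search-results/

Each bot follows only the most specific User-agent group that matches it, so any rule you want Googlebot or Bingbot to obey must be repeated inside its own group.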

How to check your robots.txt file:

Open a web browser and go to: yourwebsite.com/robots.txt. If the file exists, you’ll see its contents. Keep in mind that the file you see may be a virtual one that WordPress creates automatically; in that case you won’t find it in your hosting folder, and you should create a physical robots.txt file instead (instructions below).
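
If you’d rather check the rules from a script, here is a minimal sketch using Python’s standard-library urllib.robotparser; the domain yourwebsite.com is a placeholder for your own site:

from urllib.robotparser import RobotFileParser

# Download and parse the live robots.txt file (placeholder domain)
rp = RobotFileParser()
rp.set_url("https://yourwebsite.com/robots.txt")
rp.read()

# Ask whether any bot ("*") may fetch a given URL
print(rp.can_fetch("*", "https://yourwebsite.com/wp-admin/"))  # expected: False with the basic file above
print(rp.can_fetch("*", "https://yourwebsite.com/"))           # expected: True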

Ways to Locate or Create robots.txt:

  1. File Manager or FTP Access
    • Log in to your hosting provider’s File Manager or use an FTP client (e.g., FileZilla).
    • Navigate to the root directory (public_html or www).
    • Look for robots.txt.
  2. WordPress Plugin (If Auto-Generated)
    If your site doesn’t have a physical robots.txt file, WordPress generates a virtual one.
    • Install and check Yoast SEO or Rank Math SEO plugins, which allow you to edit robots.txt from your WordPress dashboard.
    • In Yoast SEO, go to:
      SEO → Tools → File Editor (Create or Edit robots.txt).

Creating robots.txt (If Missing):

  • If your site doesn’t have a robots.txt file, you can create one manually:
    1. Open a text editor like Notepad.
    2. Add rules, for example (a fuller sample file follows these steps):
      User-agent: *
      Disallow: /wp-admin/
    3. Save it as robots.txt.
    4. Upload the file to your site’s root directory via FTP or File Manager.
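
Put together, a minimal file for a typical WordPress site might look like the one below. The Sitemap line is optional, and its URL is only a placeholder; use your own sitemap address (Yoast SEO, for example, usually serves one at /sitemap_index.xml):

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yourwebsite.com/sitemap_index.xml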

If you’re hosting your WordPress site on Hostinger, you can find or edit the robots.txt file using these methods:

1. Check via Browser

Go to:

yourwebsite.com/robots.txt

If the file exists, you’ll see its contents.

2. Use Hostinger’s File Manager

  1. Log in to your Hostinger hPanel.
  2. Navigate to Files → File Manager.
  3. Open the public_html directory.
  4. Look for robots.txt. If it’s missing, you can create one.

3. Edit via WordPress SEO Plugin

If your site doesn’t have a physical robots.txt file, WordPress may generate a virtual one.

  • Install Yoast SEO or Rank Math SEO.
  • In Yoast SEO, go to: SEO → Tools → File Editor (Create or Edit robots.txt).

4. Create a robots.txt File (If Missing)

  1. Open a text editor like Notepad.
  2. Add rules, for example:
    User-agent: *
    Disallow: /wp-admin/
  3. Save it as robots.txt.
  4. Upload it to public_html via Hostinger’s File Manager or FTP.

Error: Multiple ‘User-agent: *’ rules found in robots.txt

This error means the robots.txt file contains more than one User-agent: * group, which can lead to confusion in how search engines and other bots interpret the rules. Because the file is meant to guide web crawlers on which parts of a site they can access, duplicate groups risk giving conflicting instructions.

If you’re managing this file, you might want to:

  • Ensure there’s only one User-agent: * section to avoid ambiguity.
  • Consolidate all rules under a single User-agent: * directive.
  • If different bot-specific rules are needed, use distinct User-agent entries for each.

For example, here is part of a file with a duplicated User-agent: * block (the second block was added automatically by the Yoast SEO plugin, as its comment shows):

User-agent: *
Disallow: /wp-content/uploads/wc-logs/
Disallow: /wp-content/uploads/woocommerce_transient_files/
Disallow: /wp-content/uploads/woocommerce_uploads/
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# START YOAST BLOCK
# ---------------------------
User-agent: *
Disallow:

Instead of having two separate User-agent: * blocks, merge them into one:

User-agent: *
Disallow: /wp-content/uploads/wc-logs/
Disallow: /wp-content/uploads/woocommerce_transient_files/
Disallow: /wp-content/uploads/woocommerce_uploads/
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Why?

  • The Yoast block (User-agent: * followed by an empty Disallow:) imposes no restrictions, so it adds nothing.
  • Keeping multiple User-agent: * blocks can confuse search engine crawlers and cause them to interpret the rules incorrectly.
  • Combining all directives into one block lets crawlers process the file more predictably. A quick way to check for duplicate blocks is sketched below.
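
If you want to confirm the fix, the rough sketch below (plain Python, no external libraries) counts how many User-agent: * groups a local copy of the file declares; the file path is a placeholder:

# Count the "User-agent: *" lines in a local copy of robots.txt
# (placeholder path; point it at your own file)
with open("robots.txt", encoding="utf-8") as f:
    lines = [line.strip() for line in f]

wildcard_groups = sum(
    1 for line in lines
    if line.lower().replace(" ", "") == "user-agent:*"
)

if wildcard_groups > 1:
    print(f"Found {wildcard_groups} 'User-agent: *' groups; consider merging them.")
else:
    print("No duplicate 'User-agent: *' groups found.")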

How to use Disallow in robots.txt

The Disallow directive in a robots.txt file is used to prevent search engine crawlers from accessing specific parts of your website. Here’s how you can use it:

  1. Block all bots from a specific directory:
    User-agent: *
    Disallow: /private-folder/
    This prevents all crawlers from accessing /private-folder/.
  2. Block a specific bot:
    User-agent: Googlebot
    Disallow: /sensitive-data/
    This prevents only Googlebot from crawling /sensitive-data/.
  3. Block all bots from the entire site:
    User-agent: *
    Disallow: /
    This prevents all crawlers from indexing any part of the site.
  4. Allow all bots to crawl everything:
    User-agent: *
    Disallow:
    This means no restrictions—bots can crawl everything.
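
These patterns can be combined in one file. The hypothetical example below blocks every bot from /private-folder/ and additionally keeps Googlebot out of /sensitive-data/; the shared rule is repeated inside Googlebot’s group because, as noted earlier, a bot reads only the group that matches it most specifically:

User-agent: *
Disallow: /private-folder/

User-agent: Googlebot
Disallow: /private-folder/
Disallow: /sensitive-data/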

How to use the symbols *, $, and #

In a robots.txt file, the symbols *, $, and # have specific meanings:

  1. Asterisk (*): Acts as a wildcard, meaning “any sequence of characters.”
    • Example:
      User-agent: *
      Disallow: /private/*
      This blocks all bots (*) from accessing any URL that starts with /private/.
  2. Dollar Sign ($): Indicates the end of a URL path.
    • Example:
      User-agent: *
      Disallow: /example$
      This prevents bots from crawling URLs that end exactly with /example, but allows /example/page.
  3. Hash (#): Used for comments, ignored by crawlers.
    • Example:
      # This blocks the admin section
      User-agent: *
      Disallow: /admin/
      Anything after # is just a note for humans reading the file.
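
Putting the three symbols together, a hypothetical file might look like this:

# Keep all bots out of anything under /private/
User-agent: *
Disallow: /private/*

# Block URLs that end exactly in .pdf
Disallow: /*.pdf$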

