How to Prevent Search Engines from Indexing Your Pages and Files

You can ask search engines not to crawl or index your web pages or files by using a robots.txt file. This tutorial shows how to do that.

Search engines discover web pages by following links from other pages, then crawl and index what they find. In general, it is a good thing for your website to be crawled and indexed by search engine spiders, because search engines are still the number one traffic source for most websites.

By default, search engine spiders will start crawling and indexing your website once it is published and linked to from other websites. In some cases, you may have web pages or files that you prefer not to be displayed in search results. For example, you may want to share the posts on your site only with your friends and personal connections, not with the general public. This is where the robots.txt file helps.

The robots.txt file tells search engine spiders and other types of crawlers which parts of your website they are allowed to visit and crawl. It consists of a set of instructions, and well-behaved search engines follow them. With the help of the robots.txt file, you can keep your whole website, or only certain web pages and files, from being crawled and indexed by the search engines that respect it.

How to Create a robots.txt File

1. Open your text editor, e.g. Notepad, and create a new text file.

2. Put your preferred robots instructions into this file (see the examples below).

3. Save the file as robots.txt.

4. Upload the robots.txt file to the root directory of your website.
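
Steps 1 to 3 can also be scripted. The following is a minimal Python sketch rather than part of the procedure above; the function name write_robots_txt and the local folder it writes to are assumptions made for illustration. It only creates the file on your own machine, so uploading it to the root directory of your website still depends on your hosting setup.

```python
from pathlib import Path


def write_robots_txt(site_root: Path, rules: str) -> Path:
    """Save the given instructions as robots.txt inside site_root."""
    # The file must be named exactly "robots.txt" (all lowercase).
    target = site_root / "robots.txt"
    # robots.txt is a plain text file; UTF-8 is a safe encoding choice.
    target.write_text(rules, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Hypothetical local copy of the website's root folder.
    path = write_robots_txt(Path("."), "User-agent: *\nDisallow: /private/\n")
    print(f"Saved {path}")
```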

robots.txt Examples

To prevent all search engines from indexing your whole website, use the following code:

User-agent: *
Disallow: /
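
If you would like to check these two lines before uploading them, Python's standard urllib.robotparser module can evaluate them locally. This is a minimal sketch: the helper name is_allowed, the crawler name ExampleBot and the example.com URLs are assumptions for illustration, and Python's parser may not match every search engine's interpretation exactly.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; no network access.
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a made-up crawler name; "*" applies to every crawler.
home_allowed = is_allowed(RULES, "ExampleBot", "https://example.com/")
page_allowed = is_allowed(RULES, "ExampleBot", "https://example.com/blog/post.html")
print(home_allowed, page_allowed)
```

Both checks should report that the crawler is not allowed, because Disallow: / covers every path on the site.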

To prevent a specific robot/crawler from accessing your website:

User-agent: RobotName
Disallow: /

Replace RobotName with the user agent name of the robot you want to block.
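
To see that this rule affects only the named robot, the same kind of local check can be run with Python's standard urllib.robotparser module. In this sketch, OtherBot and the example.com URL are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# Feed the two lines from the example above directly to the parser.
parser.parse([
    "User-agent: RobotName",
    "Disallow: /",
])

url = "https://example.com/any/page.html"
# The named robot is blocked from the whole site...
robotname_allowed = parser.can_fetch("RobotName", url)
# ...while a robot with a different name has no matching rule.
otherbot_allowed = parser.can_fetch("OtherBot", url)
print(robotname_allowed, otherbot_allowed)
```

Since the file contains no rule for other user agents, every other crawler remains free to access the site.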

To prevent all search engines from accessing certain directories on your website:

User-agent: *
Disallow: /img/
Disallow: /private/
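
Directory rules match by path prefix, so everything under /img/ and /private/ is covered while similar-looking paths are not. The sketch below illustrates this with Python's standard urllib.robotparser module; the helper blocked_paths, the crawler name ExampleBot and the sample paths are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /img/
Disallow: /private/
"""


def blocked_paths(rules, paths, user_agent="ExampleBot"):
    """Return the subset of paths that the rules disallow."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    base = "https://example.com"  # placeholder domain
    return [p for p in paths if not parser.can_fetch(user_agent, base + p)]


candidates = [
    "/index.html",
    "/img/logo.png",
    "/private/notes.txt",
    "/images/banner.png",  # different directory, not covered by /img/
]
blocked = blocked_paths(RULES, candidates)
print(blocked)
```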

To prevent robots from crawling a certain page or file on your website:

User-agent: *
Disallow: /img/my-portrait.jpg
Disallow: /my-portrait.html
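
When the list of pages and files grows, it may be easier to generate the instructions than to type them by hand. The following sketch assumes a hypothetical helper named build_rules and then checks the result with Python's standard urllib.robotparser module; ExampleBot and example.com are placeholders.

```python
from urllib.robotparser import RobotFileParser


def build_rules(paths, user_agent="*"):
    """Build robots.txt text with one Disallow line per path."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in paths]
    return "\n".join(lines) + "\n"


rules = build_rules(["/img/my-portrait.jpg", "/my-portrait.html"])

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The listed file is blocked; a different file in the same folder is not.
portrait_allowed = parser.can_fetch("ExampleBot", "https://example.com/img/my-portrait.jpg")
other_allowed = parser.can_fetch("ExampleBot", "https://example.com/img/other.jpg")
print(rules)
print(portrait_allowed, other_allowed)
```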

The robots.txt file is publicly accessible. Anyone who enters its URL can read its contents, including every path you list in it. Therefore you shouldn't use the robots.txt file to hide or store critical information. If you are going to upload private files or publish private pages on your website, things that you don't want to share publicly, you should store them inside a password-protected area of your website.
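
To illustrate how easy that URL is to find: because the file sits in the root directory, its address can be derived from any page on the same site. This is a small sketch using Python's standard urllib.parse module; the function name robots_txt_url and the example.com address are assumptions for illustration.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url: str) -> str:
    """Derive the robots.txt address from any URL on the same site."""
    # A leading slash makes urljoin resolve against the site root.
    return urljoin(page_url, "/robots.txt")


url = robots_txt_url("https://example.com/blog/post.html")
print(url)  # https://example.com/robots.txt
```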

Please keep in mind that some web robots, such as scrapers and malware robots, ignore robots.txt instructions. You should also know that if you don't have a properly configured robots.txt file, every publicly reachable part of your website may be crawled, indexed, and shown by search engines.

You can learn more about the robots.txt file at robotstxt.org. You may also want to check what Google has to say about robots.txt files in its own documentation.
