Irka Bot

How to Create a Robots.txt File: Tutorial

Definition of a Robots.txt file:

A robot indexes websites, checks link popularity between websites, and judges whether a website contains content relevant to its targeted keywords. A robot is also called a crawler or a spider; the terms all mean the same thing.

A robot is a computer program operated by a search engine, a research organization, a university, or an individual. Every time a search engine's robot visits your site, it looks in your root domain for a file named "robots.txt", usually at the address: 'http://www.mydomain.com/robots.txt'.

This robots.txt file tells robots which files or folders they may or may not index on your website. This system is called the "Robots Exclusion Standard".
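
To picture how a robot applies these rules, here is a small sketch in Python using the standard urllib.robotparser module. The domain, rules and URLs below are made-up placeholders; a real crawler would download http://www.mydomain.com/robots.txt itself (for example with set_url() and read()) instead of parsing an inline string.

from urllib.robotparser import RobotFileParser

# Placeholder rules, as they might appear at http://www.mydomain.com/robots.txt
rules = """\
User-agent: *
Disallow: /forum/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A polite robot asks before fetching each URL.
print(rp.can_fetch("Googlebot", "http://www.mydomain.com/index.html"))    # True: not disallowed
print(rp.can_fetch("Googlebot", "http://www.mydomain.com/forum/topic-1")) # False: /forum/ is disallowed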

A typical robots.txt file looks like this:

# mydomain.com robots.txt:
User-agent: OmniExplorer_Bot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: Googlebot
Disallow: /

The commands:

User-agent specifies the name of the robot (spider, crawler) to which the rules that follow apply.

User-agent: googlebot    The rules apply only to the Googlebot spider.
User-agent: *    The star (*) means the rules apply to every robot.

The Disallow command keeps robots out of certain files or folders.

Disallow: /content/    Robots cannot index the /content/ folder.
Disallow: /    Robots are not allowed to index any folder or file.
Disallow: /mustard.php    Robots are not allowed to index the mustard.php file.

There is no Allow command in the original standard, as you will have understood! If the logic were reversed, you would have to write down every single folder and file you want the robots to index. Major crawlers such as Googlebot do, however, accept Allow as an extension, as the short sketch below shows.
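
Assuming a made-up /private/ folder with one page you still want indexed, this sketch shows how a parser that honors the Allow extension (here Python's urllib.robotparser again) treats such a file:

from urllib.robotparser import RobotFileParser

# Hypothetical rules: block the whole /private/ folder, but let one page through.
# Allow is an extension to the original standard; Googlebot and most modern crawlers honor it.
rules = """\
User-agent: *
Allow: /private/help.html
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/private/help.html"))   # True: explicitly allowed
print(rp.can_fetch("Googlebot", "/private/secret.html")) # False: the folder is disallowed

The Allow line is listed before the Disallow line because this simple parser applies the first rule that matches; Googlebot instead applies the most specific rule, so for it the order does not matter.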

The comment character (#):

You can also add comments to a robots.txt file, as in the following example:

User-agent: *
Disallow: /forum/userlist/ # This page lists information about our users

You can leave a comment about a folder, a file, or anything else you want.

Here is another robots.txt example:

# go away
User-agent: *
Disallow: /

This is the simplest and most restrictive example: all robots are told to stay away from the entire website.

Let's go through a few more examples and explain them.

User-agent: inktomislurp
Disallow: /cgi-bin/
Disallow: /forum/
Disallow: /data/taxes.asp

+ Here the targeted robot is inktomislurp: it is not allowed to index the /cgi-bin/ and /forum/ folders, nor the taxes.asp file inside the /data/ folder.

These rules apply only to the robot inktomislurp.

User-agent: *
Disallow: /4gettobuybread/
Disallow: /404.jsp
Disallow: /tips.jsp
Disallow: /index.jsp
Disallow: /site_index.jsp
Disallow: /common

+ All robots are kept out of the /4gettobuybread/ and /common folders, and are not allowed to access the files 404.jsp, tips.jsp, index.jsp and site_index.jsp.

User-agent: googlebot
Disallow: /*.pdf$
Disallow: /*.jpeg$
Disallow: /*.exe$

+ This restricts Googlebot from indexing PDF, JPEG and EXE files. The * wildcard and the $ end-of-URL marker are pattern-matching extensions supported by Googlebot (and most modern crawlers), not part of the original standard, and the matching is case-sensitive: /*.pdf$ blocks page.pdf but not page.PDF.

Next page: Robots.txt common mistakes


© 2005 - 2011 Irkawebpromotions - All Rights Reserved.