Irka Bot

Robots.txt common mistakes

Robots.txt file Introduction

There are robots on the internet that don't care about your content or your website, only about the email addresses in your pages. If you have a page called "contact.html" with all your email addresses gathered in one place, those robots may harvest a nice batch of them, and tomorrow you'll receive some news about how you can brush your teeth with a beaver! These robots are the root of spam: they collect your addresses and compile a fresh list every day, and spammers then take that list and flood your mailbox with stupid advertisements.

Common Robots.txt Mistakes

1. The most common mistake is reversed syntax:

User-agent: *
Disallow: apple

The intent here is to block a robot called "apple", but this actually tells every robot that it may not crawl a path called "apple". It should be:

User-agent: apple
Disallow: /

(Note that "Disallow: /" is the standard way to block everything; a bare "*" is not a valid path in the original robots.txt standard.)
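You can verify how the corrected rules behave with Python's standard urllib.robotparser module. This is a minimal sketch; the "apple" robot name is just the example from the text above.

```python
import urllib.robotparser

# The corrected rules: block the robot called "apple" from everything.
rules = """\
User-agent: apple
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# "apple" is blocked from the whole site; other robots may still crawl.
print(parser.can_fetch("apple", "/index.html"))      # False
print(parser.can_fetch("Googlebot", "/index.html"))  # True
```

Running the same check against the reversed version would show the opposite: every robot blocked from a meaningless "apple" path, and the "apple" robot free to crawl everything.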

2. One directory at a time:

User-agent: Googlebot
Disallow: /css/ /image/ /cgi-bin/

This will not work at all; correct the syntax to:

User-agent: Googlebot
Disallow: /css/
Disallow: /image/
Disallow: /cgi-bin/

Be careful with the User-agent line as well: do not cram several robot names into a single User-agent line.
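Again, urllib.robotparser can confirm that the one-directory-per-line form does what you expect. A minimal sketch:

```python
import urllib.robotparser

# One Disallow directive per directory, as the corrected example shows.
rules = """\
User-agent: Googlebot
Disallow: /css/
Disallow: /image/
Disallow: /cgi-bin/
"""

p = urllib.robotparser.RobotFileParser()
p.parse(rules.splitlines())

# Each listed directory is blocked for Googlebot; the rest of the site is not.
print(p.can_fetch("Googlebot", "/css/style.css"))  # False
print(p.can_fetch("Googlebot", "/index.html"))     # True
```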

3. Avoid listing your private/secret directories:

Everybody can access your robots.txt file, not only the spiders — you can check this yourself on many websites such as google.com, symantec.com, etc. Listing a directory in robots.txt draws attention to that directory, and some spiders, as I said previously, will find those directories and hand the list to spammers.
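There is no secret to where the file lives: for any page on a site, anyone can derive the robots.txt URL and open it in a browser. A small sketch (example.com is just a placeholder site):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(site_url):
    # robots.txt always sits at the root of the host; anyone who knows
    # any URL on the site can derive and fetch it.
    parts = urlsplit(site_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://www.example.com/some/page.html"))
# https://www.example.com/robots.txt
```

This is why listing a private directory in robots.txt amounts to publishing its name.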

4. Do not create multiple robots.txt files in different folders:

A friend of mine has a website and asked me to check it for mistakes in SEO and design. He told me he had put a robots.txt file in every folder to guide the spiders. I told him: no way — you have to use just one robots.txt, at the root of the site. Crawlers only ever request /robots.txt from the root, so copies in subfolders are simply ignored.

5. A plain text file named exactly robots.txt is all you need — no extensions such as .doc, .xls, etc.


Irkawebpromotions 2005 - 2011 © - All Rights Reserved.