  # 1 (permalink)
Old
The Computer Addict !
Posts: 1,996
Join Date: Feb 2007
iTrader: (0)
Location: Bhopal (MP, India)
Robots.txt file : A handy tool for webmasters. - 10-23-2007

What is a Bot ?

A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.

Note that "recursive" here doesn't limit the definition to any specific traversal algorithm; even if a robot applies some heuristic to the selection and order of documents to visit and spaces out requests over a long space of time, it is still a robot.

Normal Web browsers are not robots, because they are operated by a human, and don't automatically retrieve referenced documents (other than inline images).
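To make the "retrieve a document, then retrieve everything it references" idea concrete, here is a minimal sketch of such a traversal in Python. It is only an illustration: the `PAGES` dictionary stands in for the Web, and the names `LinkExtractor` and `crawl` are made up for this example. The breadth-first order is just one of the possible traversal strategies mentioned above.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": path -> HTML body (an assumption for the demo,
# so the sketch runs without any network access).
PAGES = {
    "/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "/a.html": '<a href="/b.html">B</a>',
    "/b.html": '<a href="/">home</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start):
    """Visit every page reachable from `start`, each one exactly once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        extractor = LinkExtractor()
        extractor.feed(PAGES.get(url, ""))  # "retrieve" the document
        for link in extractor.links:        # follow its references
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


visited = crawl("/")
print(visited)
```

The `seen` set is what keeps the robot from looping forever when pages link back to each other.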


Robot Exclusion :

The robots exclusion standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention that asks cooperating web spiders and other web robots not to access all or part of a website that is otherwise publicly viewable. It is purely advisory: a robot that does not cooperate can simply ignore it.

Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code. The standard complements Sitemaps, a robot inclusion standard for websites.
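As a rough sketch of what "cooperating" means in practice, a robot can parse the site's robots.txt and ask whether a URL is allowed before requesting it. The example below uses Python's standard `urllib.robotparser` module. The rules, the robot name `ExampleBot` and the URLs are assumptions made up for the demo, and a real robot would download the file from the site instead of using an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (inline so the sketch needs no network).
ROBOTS_TXT = """\
User-agent: *
Disallow: /secret/
"""


def build_parser(robots_txt):
    """Return a parser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A cooperating robot checks every URL before fetching it.
allowed = parser.can_fetch("ExampleBot", "http://www.yoursite.com/index.html")
blocked = parser.can_fetch("ExampleBot", "http://www.yoursite.com/secret/page.html")
print(allowed, blocked)
```

Nothing forces a robot to make this check, which is why robots.txt should not be relied on to hide sensitive content.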


Some Examples :

The following snippets can be used in the robots.txt file for the purposes listed with each one.
  1. Allowing all the robots to index the site.
    Code:
    User-agent: *
    Disallow:
  2. Disallowing all robots from indexing the site.
    Code:
    User-agent: *
    Disallow: /
  3. Disallowing only the specified directories for all robots.
    Code:
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /secret/
    Disallow: /another-secret/
  4. Preventing one robot from indexing a specific directory
    Code:
    User-agent: A-Bad-Robot
    Disallow: /some-secret/
  5. Disallowing only a single file for all robots.
    Code:
    User-agent: *
    Disallow: /private/secret-file.htm
  6. Specifying a Sitemap :
    Code:
    Sitemap: http://www.yoursite.com/path/to/sitemap.xml.gz
  7. Delaying the Crawling :
    Code:
    User-agent: *
    Crawl-delay: 5
    This asks robots to wait 5 seconds between successive requests. Crawl-delay is a non-standard extension, so only robots that support it will comply.

    This can be used to lower the number of requests you receive per second
    if you are being flooded by a large number of bots and requests,
    and hence lower the server load too.
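One way to sanity-check rules like the ones above before uploading them is to run them through a parser. The sketch below combines examples 3, 4, 5 and 7 into one hypothetical robots.txt and queries it with Python's standard `urllib.robotparser` (`crawl_delay` needs Python 3.6 or later). The robot name `ExampleBot`, the `allowed` helper and the test paths are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt built from the examples in this post.
ROBOTS_TXT = """\
User-agent: A-Bad-Robot
Disallow: /some-secret/

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/secret-file.htm
Crawl-delay: 5
"""

SITE = "http://www.yoursite.com"  # placeholder host used in the thread

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """True if `agent` may fetch `path` according to ROBOTS_TXT."""
    return parser.can_fetch(agent, SITE + path)


results = {
    # Rules from the "*" group apply to any robot without its own group.
    "dir_blocked": allowed("ExampleBot", "/cgi-bin/script.cgi"),
    "file_blocked": allowed("ExampleBot", "/private/secret-file.htm"),
    "other_file_ok": allowed("ExampleBot", "/private/other.htm"),
    # A-Bad-Robot has its own group, so only that group's rules apply to it.
    "bad_robot_blocked": allowed("A-Bad-Robot", "/some-secret/page.html"),
}

# Seconds a robot in the "*" group is asked to wait between requests.
delay = parser.crawl_delay("ExampleBot")

print(results, delay)
```

Note that a robot matched by its own User-agent group follows only that group with this parser, so directories disallowed under `*` would have to be repeated there to block that robot too.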


thanx,
Shadab.
  # 2 (permalink)
Old
Junior Geek
Posts: 142
Join Date: Feb 2007
iTrader: (0)
Re: Robots.txt file : A handy tool for webmasters. - 10-23-2007

Note : Googlebot doesn't honor the Crawl-delay directive.


I can add a tip for spiders that accept a Sitemap line, such as Googlebot, Msnbot and Yahoo :
Add :
Code:
Sitemap: http://www.yoursite.com/sitemap.php
Just be careful with the syntax :
- no space between 'Sitemap' and ':'
- a space between ':' and your address
- and last, you must give the full URL, and it must be within your own website. You can't add a URL from another website
- and of course you must have a valid sitemap page at sitemap.php, lol (you could use another name or extension)
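The syntax rules above can be checked with a short script. This is only a sketch: it reads the Sitemap line with Python's standard `urllib.robotparser` (`site_maps()` needs Python 3.8 or later), and `sitemap_is_on_site` is a hypothetical helper that tests the "full URL on your own website" rule. The host name is the placeholder used in this thread.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt containing the Sitemap line from this post.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://www.yoursite.com/sitemap.php
"""


def sitemap_is_on_site(sitemap_url, site_host):
    """True if the URL is absolute (http/https) and points at `site_host`."""
    parts = urlsplit(sitemap_url)
    return parts.scheme in ("http", "https") and parts.hostname == site_host


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns None when no Sitemap line is present.
sitemaps = parser.site_maps() or []
checks = [sitemap_is_on_site(url, "www.yoursite.com") for url in sitemaps]

print(sitemaps, checks)
```

A relative path such as `/sitemap.php` or a URL on a different host fails the helper's check.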


Thanks,
Aqua


-------------------
http://www.aquafolie.org

Last edited by aquafolie : 10-23-2007 at 06:07 PM.
  # 3 (permalink)
Old
Regular Geek
Posts: 308
Join Date: Nov 2007
iTrader: (0)
Re: Robots.txt file : A handy tool for webmasters. - 11-03-2007

This seems cool, thanks mate.
  # 4 (permalink)
Old
Junior Geek
Posts: 86
Join Date: Sep 2007
iTrader: (0)
Re: Robots.txt file : A handy tool for webmasters. - 11-24-2007

this is an awesome post...do you mind if i use it on my forums and provide you with a vBSEO linkback??

Let me know


  # 5 (permalink)
Old
The Computer Addict !
Posts: 1,996
Join Date: Feb 2007
iTrader: (0)
Location: Bhopal (MP, India)
Re: Robots.txt file : A handy tool for webmasters. - 11-25-2007

Quote:
Originally Posted by MuscleManiac
this is an awesome post...do you mind if i use it on my forums and provide you with a vBSEO linkback??

Let me know
Yes, and
thank you for your appreciation.




Thanx,
Shadab.
  # 6 (permalink)
Old
Senior Geek
Posts: 559
Join Date: Nov 2007
iTrader: (0)
Location: Romania
Re: Robots.txt file : A handy tool for webmasters. - 11-27-2007

Thanks, Shadab!
I will add a sitemap to my future site, because my visitors want to see everything on the site.
  # 7 (permalink)
Old
The Computer Addict !
Posts: 1,996
Join Date: Feb 2007
iTrader: (0)
Location: Bhopal (MP, India)
Re: Robots.txt file : A handy tool for webmasters. - 12-28-2007

Welcome mate !

I highly recommend using a sitemap if your site contains a LOT of deep pages.
It makes it easy for the search engines to crawl and index all your pages.

  # 8 (permalink)
Old
Designing Team Leader
Posts: 86
Join Date: Jun 2008
iTrader: (0)
Location: guidelineproducts.com
06-28-2008

wow, this is neat, i'll probably be using this on my site, thanks!


  # 9 (permalink)
Old
Newbie
Posts: 13
Join Date: Aug 2008
iTrader: (0)
3 Weeks Ago

fantastic post! Great for beginner webmasters..

