What is a Bot?
A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.
Note that "recursive" here doesn't limit the definition to any specific traversal algorithm; even if a robot applies some heuristic to the selection and order of documents to visit and spaces out requests over a long period of time, it is still a robot.
Normal Web browsers are not robots, because they are operated by a human, and don't automatically retrieve referenced documents (other than inline images).
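To make the "retrieve a document, then retrieve everything it references" idea concrete, here is a minimal sketch in Python. It is only an illustration under some assumptions: the PAGES dictionary is a made-up stand-in for a real website, and the names LinkExtractor and crawl are hypothetical. A real robot would fetch pages over HTTP and space out its requests.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Made-up site held in memory: path -> HTML body (an assumption for this demo).
PAGES = {
    "/index.html": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "/a.html": '<a href="/b.html">B</a> <a href="/index.html">Home</a>',
    "/b.html": '<a href="/c.html">C</a>',
    "/c.html": "No links here.",
}


def crawl(start, fetch):
    """Visit start, then every document reachable from it, each one once."""
    seen = {start}
    queue = deque([start])  # breadth-first here; any visiting order would do
    order = []
    while queue:
        url = queue.popleft()
        body = fetch(url)
        if body is None:  # document could not be retrieved
            continue
        order.append(url)
        extractor = LinkExtractor()
        extractor.feed(body)
        for link in extractor.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


visited = crawl("/index.html", PAGES.get)
print(visited)
```

The "seen" set is what keeps the robot from looping forever when pages link back to each other, as /a.html and /index.html do here.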
Robot Exclusion:
The robots exclusion standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention for asking cooperating web spiders and other web robots not to access all or part of a website that is otherwise publicly viewable.
Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code. The standard complements Sitemaps, a robot inclusion standard for websites.
Some Examples:
The following snippets can be used in the robots.txt file for the purposes described alongside each one.
- Allowing all robots to index the site.
Code:
User-agent: *
Disallow:
- Preventing all robots from indexing the site.
Code:
User-agent: *
Disallow: /
- Blocking only the specified directories for all robots.
Code:
User-agent: *
Disallow: /cgi-bin/
Disallow: /secret/
Disallow: /another-secret/
- Preventing one robot from indexing a specific directory.
Code:
User-agent: A-Bad-Robot
Disallow: /some-secret/
- Blocking only a single file for all robots.
Code:
User-agent: *
Disallow: /private/secret-file.htm
- Specifying a Sitemap:
Code:
Sitemap: http://www.yoursite.com/path/to/sitemap.xml.gz
- Delaying the Crawling:
Code:
User-agent: *
Crawl-delay: 5
This asks all robots to wait 5 seconds between successive requests.
It can be used to lower the number of requests you receive per second, and hence the server load, if you are being flooded by a large number of bots and requests.
Note that Crawl-delay is a non-standard extension, so not every robot honours it.
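If you want to check how a cooperating robot would read rules like the ones above, Python's standard urllib.robotparser module can parse them. The sketch below is only an illustration: the combined ROBOTS_TXT text, the robot name SomeBot and the helper load_rules are assumptions made up for the demo, and site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt combining several of the examples above.
ROBOTS_TXT = """\
User-agent: A-Bad-Robot
Disallow: /some-secret/

User-agent: *
Crawl-delay: 5
Disallow: /cgi-bin/
Disallow: /private/secret-file.htm

Sitemap: http://www.yoursite.com/path/to/sitemap.xml.gz
"""

SITE = "http://www.yoursite.com"


def load_rules(text):
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# A robot with no group of its own falls back to the "*" group.
print(rules.can_fetch("SomeBot", SITE + "/index.html"))                # True
print(rules.can_fetch("SomeBot", SITE + "/cgi-bin/search"))            # False
print(rules.can_fetch("SomeBot", SITE + "/private/secret-file.htm"))   # False

# A robot named in its own group follows that group only.
print(rules.can_fetch("A-Bad-Robot", SITE + "/some-secret/page.htm"))  # False

# Crawl-delay and Sitemap values, as this parser reports them.
print(rules.crawl_delay("SomeBot"))  # 5
print(rules.site_maps())

# The "allow all" and "disallow all" examples.
allow_all = load_rules("User-agent: *\nDisallow:")
deny_all = load_rules("User-agent: *\nDisallow: /")
print(allow_all.can_fetch("SomeBot", SITE + "/anything.htm"))  # True
print(deny_all.can_fetch("SomeBot", SITE + "/anything.htm"))   # False
```

One thing worth noticing: because A-Bad-Robot has its own group, it is bound by that group alone and not by the "*" group, so in this file it would still be allowed into /cgi-bin/. If a named robot should also obey the general rules, repeat them inside its group.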
thanx,
Shadab.