Skip to main content

robot.txt what is this and how to use this


What do they do exactly?

Robot.txt files tell your instructions to a search engine robot..

The first thing a search engine spider looks at when it is visiting a page is the robots.txt file. It looks for it because it wants to know what it should do. If you have instructions for a search engine robot, you must tell it those instructions.
The most common problem people have with robot.txt files is that they don't know how to make them.

If you can make web pages, you can also make a robot.txt file. The file is a text file, which means that you can use notepad, wordpad, or any other plain text editor. You can also make them in Frontpage or Dreamweaver by using the "code" view. You can even "copy and paste" them.

So instead of thinking "I am making a robot.txt file", just think, "I am writing a note" they are the exact same process. However you would write a note or a letter on your computer will work for the robot.txt file.
robot.txt files and search robots

What should the robot.txt say?

That depends on what you want it to do.

Most people want robots to visit everything in their website. If this is the case with you, and you want the robot to index all parts of your site, there are three options to let the robots know that they are welcome.
1) Do not have a robot.txt file
If your website does not have a robot.txt file then this is what happens -
A robot comes to visit. It looks for the robot.txt file. It does not find it because it isn't there. The robot then feels free to visit all your web pages and content because this is what it is programmed to do in this situation.
2) Make an empty file and call it robots.txt
If your website has a robot.txt file that has nothing in it then this is what happens -
A robot comes to visit. It looks for the robot.txt file. It finds the file and reads it. There is nothing to read, so the robot then feels free to visit all your web pages and content because this is what it is programmed to do in this situation.
3) Make a file called robots.txt and write the following two lines in it... (these are "instructions" for the robot to follow)

User-agent: *

Disallow:
If your website has a robot.txt with these instructions in it then this is what happens -

A robot comes to visit. It looks for the robot.txt file. It finds the file and reads it. It reads the first line. Then it reads the second line. The robot then feels free to visit all your web pages and content because this is what it is what you told it to do.

What do the robot instructions mean?

Here is an explanation of what the different words mean in a robot.txt file
User-agent:
The "User-agent" part is there to specify directions to a specific robot if needed. There are two ways to use this in your file.

If you want to tell all robots the same thing you put a " * " after the "User-agent" It would look like this...
User-agent: *
(This line is saying "these directions apply to all robots")

If you want to tell a specific robot something (in this example Googlebot) it would look like this...
User-agent: Googlebot
(this line is saying "these directions apply to just Googlebot")
Disallow:
The "Disallow" part is there to tell the robots what folders they should not look at.

This means that if, for example you do not want search engines to index the photos on your site then you can place those photos into one folder and exclude it.

Lets say that you have put all these photos into a folder called "photos". Now you want to tell search engines not to index that folder.

Here is what your robot.txt file should look like:

User-agent: *
Disallow: /photos

The above two lines of text in your robots.txt file would keep robots from visiting your photos folder. The "User-agent *" part is saying "this applies to all robots". The "Disallow: /photos" part is saying "don't visit or index my photos folder".

Googlebot specific instructions

The robot that Google uses to index their search engine is called Googlebot. It understands a few more instructions than other robots. The instructions it follows are well defined in the Google help pages (see resources below).

In addition to the "User-name" and "Disallow" Googlebot also uses the...
Allow:
The "Allow:" instructions lets you tell a robot that it is okay to see a file in a folder that has been "Disallowed" by other instructions.

To illustrate this, let's take the above example of telling the robot not to visit or index your photos. We put all the photos into one folder called "photos" and we made a robot.txt file that looked like this...
User-agent: *
Disallow: /photos

Now let's say there was a photo called mycar.jpg in that folder that you want Googlebot to index. With the Allow: instruction, we can tell Googlebot to do so, it would look like this...

User-agent: *
Disallow: /photos
Allow: /photos/mycar.jpg
This would tell Googlebot that it can visit "mycar.jpg" in the photo folder, even though the "photo" folder is otherwise excluded.
Testing your robot.txt file
If you are using a Google sitemap as part of their webmaster tools, then you can log in and see if Google is having any issues crawling your site. There is also a robot.txt tool that allows you to experiment a little, letting you know if their are any problems with your file prior to putting it online.

Key Concept:


- If you use a robots.txt file, make sure it is correctly written because an incorrect robots.txt file can block the bots that index your website.

Comments

Popular posts from this blog

USE any TRIAL SOFTWARE FOREVER WITHOUT SERIAL NUMBER

USE any TRIAL SOFTWARE FOREVER WITHOUT SERIAL NUMBER(most wanted trick) Run a trial software forever now with time stopper you can run a trial software forever no need to fetch for serial numbers,activation codes,patch just DOWNLOAD TIME STOPPER now open it install it click browse select the .exe of the software or file which you want to run forever now simply click create desktop icon and now delete all its existing shortcuts now have fun enjoying software for life time

C++ Program to Find HCF and LCM among 4 numbers (Easiest Logic)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 #include <iostream> #include <math.h> using namespace std; int main () { int a,b,c,d,i,j,minimum; cout << "Enter the all four number " ; cin >> a >> b >> c >> d; if (a < b && a < c && a < d) minimum = a; else if (b < c && b < d) minimum = b; else if (c < d) minimum = c; else minimum = d; for (j = minimum;; -- j) { if (a % j == 0 && b % j == 0 && c % j == 0 && d % j == 0 ) { break ; } } for (i = 1 ;;i ++ ) { if (i % a == 0 && i % b == 0 && i % c == 0 && i % d == 0 ) break ; } cout << "Lowest Common factor=>" << i << endl; ...

12 Tips to Maintain a Virus Free Computer

1. Email is one of the common ways by which your computer can catch a virus . So it is always recommended to stay away from SPAM. Open only those emails that has it’s origin from a trusted source such as those which comes from your contact list. If you are using your own private email host (other than gmail, yahoo, hotmail etc.) " then it is highly recommended that you use a good anti-spam software. And finally NEVER click on any links in the emails that comes from untrusted sources. 2. USB thumb/pen drives is another common way by which viruses spread rapidly." So it is always a good habit to perform a virus scan before copying any data onto your computer. NEVER double-click the pen drive to open it. Instead right-click on it and select the option “open”. This is a safe way to open a pen drive. 3. Be careful about using MS Outlook. Outlook is more susceptible to worms than other e-mail programs, unless you have efficient Anti-Virus programs running. Use Pegasus ...

How to Put Google Adsense Below Post Title in Blogger?

Adsense is used by majority of expert bloggers for their website monetization because it is a cookie based contextual advertising system that shows targeted ads relevant to the content and reader. As bloggers are paid on per click basis, they try various ad placements on the blog to  increase the revenue  and get maximum clicks on the ad units. Well, on some blogs, you might have seen Adsense ad units placed below the post title. Do you know why? It is because the area just below the post title gets the most exposure and is the best place to put AdSense ad units to increase  Click Through Rate (CTR). Even though ads below post title work like a charm but this doesn’t mean that it will work for you as well. If you want to find out the best AdSense ads placement for your blog, try experimenting by placing ads at various locations such as header, sidebar, footer, etc. You can try other  blog monetization methods  as well to effectively monetize y...

FIXED: Feedjit gadgets no longer work on my blog!!! help

well if your feedjit tracker is not working than you can easily fix this... goto blogger.com Go to  Dashboard -> Settings And set  HTTPS Redirect  to  No The gadgets will work now for  http ://yourblog.blogspot.com They won't for  https ://yourblog.blogspot.com because their connection isn't secure. this is not a permanent fix ... the gadget developers will have to make their gadgets https ready before Blogger will (if at all) become https only.

How to Hack an Ethernet ADSL Router

Every router comes with a  username  and  password  using which it is possible to gain access to the router settings and configure the device. The vulnerability actually lies in the  Default username  and  password  that comes with the factory settings. Usually the routers come preconfigured from the Internet Service provider and hence the users do not bother to change the password later. This makes it possible for the attackers to gain unauthorized access to the router and modify its settings using a common set of default usernames and passwords. Here is how you can do it. Before you proceed, you need the following tool in the process: Angry IP Scanner Hacking the ADSL Router: Here is a detailed information on how to exploit the vulnerability of an ADSL router: Go to  whatismyipaddress.com . Once the page is loaded, you will find your IP address. Note it down. Open Angry IP Sca...

How to install gta sanandreas mods

INdex 1.GTA San Andreas System Requirements 2.Backing Up 3.Installing The Mod --Method 1: Using IMG Tool --Method 2 Self Mod Installers ----GMM ----San Andreas Mod Installer --Installing cloths --Installing Weapons 4.Tips 5.My Mods arn't working! Starting off 1.GTA San Andreas System Requirements You need to make sure you computer can run San Andrease. Theese are the Requirements. Recommended System Requirements: Intel Pentuim 4 or AMD XP Processor (or better) 384MB RAM 16x Speed DVD Drive 4.7GB Hard Drive Space 128MB Video Card DirectX 9 compatible sound & video drivers Keyboard, mouse or game pad Minimum System Requirements: 1GHz Pentuim III or AMD Athlon 256MB RAM 8x Speed DVD Drive 3.6 GB Hard Disk Space 64MB Video Card DirectX 9 compatable sound & video drivers Keyboard, mouse 2.Backing Up Before you do anything to the game, back up the files you are going to mod. I usally copy the GTA folder and rename it "Normal San And...

Supported OS id and their code in an installer

Hello frnd this is the code and thier Operating system id in windows ,to identify which program run in which windows The ID below indicates application support for Windows Vista -->           --The ID below indicates application support for Windows 7 -->           -The ID below indicates application support for Windows 8 -       supportedOS Id="{4a2f28e3-53b9-4441-ba9c-d69d4a4a6e38}" code provided to you by Rajlive360.tk