Skip to main content

robot.txt what is this and how to use this


What do they do exactly?

Robot.txt files tell your instructions to a search engine robot..

The first thing a search engine spider looks at when it is visiting a page is the robots.txt file. It looks for it because it wants to know what it should do. If you have instructions for a search engine robot, you must tell it those instructions.
The most common problem people have with robot.txt files is that they don't know how to make them.

If you can make web pages, you can also make a robot.txt file. The file is a text file, which means that you can use notepad, wordpad, or any other plain text editor. You can also make them in Frontpage or Dreamweaver by using the "code" view. You can even "copy and paste" them.

So instead of thinking "I am making a robot.txt file", just think, "I am writing a note" they are the exact same process. However you would write a note or a letter on your computer will work for the robot.txt file.
robot.txt files and search robots

What should the robot.txt say?

That depends on what you want it to do.

Most people want robots to visit everything in their website. If this is the case with you, and you want the robot to index all parts of your site, there are three options to let the robots know that they are welcome.
1) Do not have a robot.txt file
If your website does not have a robot.txt file then this is what happens -
A robot comes to visit. It looks for the robot.txt file. It does not find it because it isn't there. The robot then feels free to visit all your web pages and content because this is what it is programmed to do in this situation.
2) Make an empty file and call it robots.txt
If your website has a robot.txt file that has nothing in it then this is what happens -
A robot comes to visit. It looks for the robot.txt file. It finds the file and reads it. There is nothing to read, so the robot then feels free to visit all your web pages and content because this is what it is programmed to do in this situation.
3) Make a file called robots.txt and write the following two lines in it... (these are "instructions" for the robot to follow)

User-agent: *

Disallow:
If your website has a robot.txt with these instructions in it then this is what happens -

A robot comes to visit. It looks for the robot.txt file. It finds the file and reads it. It reads the first line. Then it reads the second line. The robot then feels free to visit all your web pages and content because this is what it is what you told it to do.

What do the robot instructions mean?

Here is an explanation of what the different words mean in a robot.txt file
User-agent:
The "User-agent" part is there to specify directions to a specific robot if needed. There are two ways to use this in your file.

If you want to tell all robots the same thing you put a " * " after the "User-agent" It would look like this...
User-agent: *
(This line is saying "these directions apply to all robots")

If you want to tell a specific robot something (in this example Googlebot) it would look like this...
User-agent: Googlebot
(this line is saying "these directions apply to just Googlebot")
Disallow:
The "Disallow" part is there to tell the robots what folders they should not look at.

This means that if, for example you do not want search engines to index the photos on your site then you can place those photos into one folder and exclude it.

Lets say that you have put all these photos into a folder called "photos". Now you want to tell search engines not to index that folder.

Here is what your robot.txt file should look like:

User-agent: *
Disallow: /photos

The above two lines of text in your robots.txt file would keep robots from visiting your photos folder. The "User-agent *" part is saying "this applies to all robots". The "Disallow: /photos" part is saying "don't visit or index my photos folder".

Googlebot specific instructions

The robot that Google uses to index their search engine is called Googlebot. It understands a few more instructions than other robots. The instructions it follows are well defined in the Google help pages (see resources below).

In addition to the "User-name" and "Disallow" Googlebot also uses the...
Allow:
The "Allow:" instructions lets you tell a robot that it is okay to see a file in a folder that has been "Disallowed" by other instructions.

To illustrate this, let's take the above example of telling the robot not to visit or index your photos. We put all the photos into one folder called "photos" and we made a robot.txt file that looked like this...
User-agent: *
Disallow: /photos

Now let's say there was a photo called mycar.jpg in that folder that you want Googlebot to index. With the Allow: instruction, we can tell Googlebot to do so, it would look like this...

User-agent: *
Disallow: /photos
Allow: /photos/mycar.jpg
This would tell Googlebot that it can visit "mycar.jpg" in the photo folder, even though the "photo" folder is otherwise excluded.
Testing your robot.txt file
If you are using a Google sitemap as part of their webmaster tools, then you can log in and see if Google is having any issues crawling your site. There is also a robot.txt tool that allows you to experiment a little, letting you know if their are any problems with your file prior to putting it online.

Key Concept:


- If you use a robots.txt file, make sure it is correctly written because an incorrect robots.txt file can block the bots that index your website.

Comments

Popular posts from this blog

13 websites to register your free domain

Register your Free Domain Now!! 1)  .tk Dot TK is a FREE domain registry for websites on the Internet. It has exactly the same power as other domain extensions, but it’s free! Because it’s free, millions of others have been using .TK domains since 2001 – which makes .TK powerful and very recognizable.  Your website will be like www.yourdomainname.tk . It is free for 1 year. It’s a ccTLD domain whixh having the abbreviation  Tokelau. To create a .tk domain, Visit   www.dot.tk 2) co.cc Co.cc is completely free domain which is mostly used by blogspot bloggers because of it’s easy to use DNS system. Creating a co.cc for blogger is simple ( for instructions- “click here”). Your website will be like www.yourdomainname.co.cc . To create a .co.cc domain, visit www.co.cc 3)   co.nr co.nr is too like co.cc. Your website will be like  www.yourdomainname.co.nr . You can add it for blogger also.. To create a .co.cc domain, vi...

Binary Search Tree in C++( dynamic memory based )

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 #include<bits/stdc++.h> using namespace std; struct bst { int val; bst * left, * right; }; bst * root = nullptr; void srch ( int num,bst * head) { if (head == nullptr){ cout << " \n Number is not present \a " << endl; return ; } if (head -> val == num) { cout << " \n Number is present \n\a " ; return ; } else { if (num < head -> val) srch(num,head -> left); else srch(num,head -> right); ...

C++ Program to implement Operator Overloading with binary arithmetic op (*,+,-)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 Define a class complex with two data members real & imag. Perform the following operations: – Find the sum of two complex numbers – Find the difference of two complex numbers – Find the Product of two complex numbers – Find the Product of a complex number with a constant

Track Lost Android Phone and Tablet

1. Use the IMEI Number Every Android phone carries a unique IMEI number. It will be printed at the back of your device. If you are unable to find the number, you have to launch your phone app and dial the number *#06#. This will give you the IMEI number of your phone. Store this number in a safe place so that it helps you in locating your phone when it is lost 2 Android Device Manager Google has recently released a new locator feature for Android gadgets called Android Device Manager, which helps its users locate their lost or stolen phones and tablets. It functions in the same way as Lookout and Samsung’s “Find My Mobile”. Here’s how to use Android Device Manager. Go to the Google Settings app, then select Android device manager. By default, the locator feature is activated but to activate remote data wipe, select the box next to “Allow remote factory reset”, then select “activate”. To use this feature, open the site https://www.google.com/android/devicemanager and sig...

Binary Search Tree in Java implementation (reference based, dynamic memory)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 import java.util.Scanner ; class BST { static BST . Node root = null ; public void insert ( int num ) { if ( root == null ) { root = new BST . Node ( num ); } else { // root node is not empty BST . Node temp = root ; while ( temp != null ) { if ( num <= temp . getVal ()) { if ( temp . getLeft () != null ) temp = temp . getLeft (); ...

Exception Handling in Java

Exception Handling() it is an event that stops the normal flow of execution of the program and terminates the program abnormally. Exception always occurs at runtime. There Is two type of Exception: Checked Exception.:-The exception which is checked by the compiler at the compilation time. Ex: IOException,ClassNotFoundException. Unchecked Exception: The Exception which Cant be checked by the compiler at the compilation time and directly occurs at runtime is called an unchecked Exception. Ex:NullPointerException,StringIndexOutOfBoundsException. Error: Error is the type of unchecked Exception which occurs due to programmers mistake, end-user input mistake, resource unavailability or network problem etc. OutOfMemoryError StackOverFlowError etc. 5 keywords to handle Exception: try catch finally throw throws possible combination: try-catch try-catch-catch try-catch-finally try finally try-...

Live bootable linux usb

Unetbootin ..  UNetbootin has built-in support for automatically downloading and loading the following distributions: Ubuntu, Debian, Fedora, PCLinuxOS, Linux Mint, Sabayon Linux, Gentoo, MEPIS, openSUSE, Zenwalk, Slax, Dreamlinux, Arch Linux, Elive, CentOS, Damn Small Linux, Mandriva, SliTaz, FaunOS, Puppy Linux, FreeBSD, gNewSense, Frugalware Linux, NetBSD but can work with others too. Link... Download from here UNetbootin can also be used to load various system utilities, including: Parted Magic , a partition manager that can  resize , repair, backup, and restore partitions. Super Grub Disk , a boot utility that can  restore and repair  overwritten and misconfigured GRUB installs or directly boot various operating systems Backtrack , a utility used for network analysis and penetration testing. Ophcrack , a utility which can recover Windows passwords. NTPasswd , a utility which can reset Windows passwords and edit the registry. ...

How To Add A Background Image Behind Your Blogger Posts

n this latest blog i will show you how to use an image in the background of your blog posts.This background image will only be behind the post area of your blog. Once you choose your image here's how to add it to your blog: 1.Click 'Layout'-->'Edit html' For your blog 2.Find the following piece of code in your blogs html: (Click'CTRL F' on your keyboard for a search box to help find code) #main-wrapper { You will see ' #main-wrapper { 'Followed by code similar to this : #main-wrapper { float: left; width: 486px; margin: 0px 0px 0px 0px; padding: 0px 0px 0px 0px; word-wrap: break-word; /* fix for long text breaking sidebar float in IE */ overflow: hidden; /* fix for long non-text content breaking IE sidebar float */ } 3.Now add the following code Directly Below ' #main-wrapper { ' background:url(PUT IMAGE URL HERE) repeat top right; /* background-attachment: fixed; */ 4.Replace the text 'PUT IMAGE URL HERE...

how to Send a Confirmation Email Upon Form Submission-Woofoo

When someone successfully submits an entry, you can automatically send them a confirmation email to let them know. You can customize the email to include any follow-up info you'd like, and you can choose to include a copy of their entry in the email as well. To set up confirmation emails in Form Settings: Log in and go to  Forms . Hover over  Edit  next to the form you want to edit. Choose  Edit form . Click the  Form Settings  tab. Under Confirmation Options, select  Send Confirmation Email to User . From the  Send To  dropdown, select an Email field from your form. We'll send the confirmation email to the email address the person filling out your form entered into this field. If the dropdown says "No Email Fields Found", add an  Email  field to your form. In the  Reply To  textbox, enter the reply-to email—if someone replies to their confirmation email, this is the email address that their reply will be s...

Internet Download Manager (IDM}

Internet Download Manager (IDM) is a tool to increase download speeds by up to 5 times, resume and schedule downloads. Comprehensive error recovery and resume capability will restart broken or interrupted downloads due to lost connections, network problems, computer shutdowns, or unexpected power outages. Simple graphic user interface makes IDM user friendly and easy to use.Internet Download Manager has a smart download logic accelerator that features intelligent dynamic file segmentation and safe multipart downloading technology to accelerate your downloads. Unlike other download managers and accelerators Internet Download Manager segments downloaded files dynamically during download process and reuses available connections without additional connect and login stages to achieve best acceleration performance. Internet Download Manager supports proxy servers, ftp and http protocols, firewalls, redirects, cookies, authorization, MP3 audio and MPEG video content processing. IDM integra...