
robots.txt: what it is and how to use it


What do they do exactly?

A robots.txt file gives your instructions to search engine robots.

The first thing a search engine spider looks for when it visits a site is the robots.txt file. The file must be named robots.txt (with an "s") and sit in the root folder of the site, for example www.example.com/robots.txt. The spider looks for it because it wants to know what it should do. If you have instructions for a search engine robot, this file is where you give them.
The most common problem people have with robots.txt files is that they don't know how to make them.

If you can make web pages, you can also make a robots.txt file. The file is a text file, which means that you can use Notepad, WordPad (saving as plain text), or any other plain text editor. You can also make one in FrontPage or Dreamweaver by using the "code" view. You can even copy and paste one.

So instead of thinking "I am making a robots.txt file", just think "I am writing a note". They are exactly the same process. However you would write a note or a letter on your computer will work for the robots.txt file, as long as you save it as plain text.
robots.txt files and search robots

What should the robots.txt file say?

That depends on what you want it to do.

Most people want robots to visit everything in their website. If this is the case with you, and you want the robot to index all parts of your site, there are three ways to let the robots know that they are welcome.
1) Do not have a robots.txt file
If your website does not have a robots.txt file, then this is what happens:
A robot comes to visit. It looks for the robots.txt file. It does not find it because it isn't there. The robot then feels free to visit all your web pages and content because this is what it is programmed to do in this situation.
2) Make an empty file and call it robots.txt
If your website has a robots.txt file that has nothing in it, then this is what happens:
A robot comes to visit. It looks for the robots.txt file. It finds the file and reads it. There is nothing to read, so the robot then feels free to visit all your web pages and content because this is what it is programmed to do in this situation.
3) Make a file called robots.txt and write the following two lines in it (these are "instructions" for the robot to follow):

User-agent: *
Disallow:

If your website has a robots.txt file with these instructions in it, then this is what happens:

A robot comes to visit. It looks for the robots.txt file. It finds the file and reads it. It reads the first line. Then it reads the second line. The robot then feels free to visit all your web pages and content because this is what you told it to do.
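
As a rough check of options 2 and 3, the sketch below feeds an empty file and the two-line file to the robots.txt parser in Python's standard library. The helper name allows_everything, the robot name "ExampleBot" and the sample paths are made up for illustration, and a real search engine's robot may not behave exactly like this parser.

```python
from urllib.robotparser import RobotFileParser


def allows_everything(robots_text, sample_paths):
    """Return True if a generic robot may fetch every sample path."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # "ExampleBot" is a made-up robot name used only for this check.
    return all(parser.can_fetch("ExampleBot", path) for path in sample_paths)


paths = ["/", "/index.html", "/photos/mycar.jpg"]  # hypothetical pages

empty_file = ""                              # option 2: an empty robots.txt
allow_all_file = "User-agent: *\nDisallow:"  # option 3: the two-line file

print(allows_everything(empty_file, paths))
print(allows_everything(allow_all_file, paths))
```

Both calls should report that every sample path is open to the robot.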

What do the robot instructions mean?

Here is an explanation of what the different words in a robots.txt file mean.
User-agent:
The "User-agent" part is there to specify directions to a specific robot if needed. There are two ways to use this in your file.

If you want to tell all robots the same thing, you put a "*" after "User-agent:". It would look like this...
User-agent: *
(This line is saying "these directions apply to all robots")

If you want to tell a specific robot something (in this example Googlebot) it would look like this...
User-agent: Googlebot
(this line is saying "these directions apply to just Googlebot")
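
To see the difference between the two forms, here is a small sketch using Python's standard-library parser. The folder name "/private", the page "notes.html" and the robot name "OtherBot" are invented for the example; the file gives Googlebot its own instructions and leaves every other robot unrestricted.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for Googlebot, one for everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The Googlebot group applies only to Googlebot.
googlebot_ok = parser.can_fetch("Googlebot", "/private/notes.html")
# Any other robot falls back to the "*" group.
otherbot_ok = parser.can_fetch("OtherBot", "/private/notes.html")

print("Googlebot may fetch:", googlebot_ok)
print("OtherBot may fetch:", otherbot_ok)
```

Here Googlebot is kept out of the folder while the other robot is not, because each robot follows the group that names it and falls back to "*" only when no group does.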
Disallow:
The "Disallow" part is there to tell the robots what folders they should not look at.

This means that if, for example, you do not want search engines to index the photos on your site, you can place those photos into one folder and exclude it.

Let's say that you have put all these photos into a folder called "photos". Now you want to tell search engines not to index that folder.

Here is what your robots.txt file should look like:

User-agent: *
Disallow: /photos

The above two lines of text in your robots.txt file would keep robots from visiting your photos folder. The "User-agent: *" part is saying "this applies to all robots". The "Disallow: /photos" part is saying "don't visit or index my photos folder".
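
The effect of these two lines can be tried out with Python's standard-library parser. This is only a sketch: the file names and the robot name "ExampleBot" are made up. It also shows that the rule is matched as a path prefix, so "/photos" catches "/photos.html" as well; writing "Disallow: /photos/" would limit the rule to the folder itself.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /photos"])


def may_visit(path):
    """Ask whether a generic robot (made-up name) may fetch the path."""
    return parser.can_fetch("ExampleBot", path)


photo = may_visit("/photos/beach.jpg")  # inside the photos folder
home = may_visit("/index.html")         # outside the folder
lookalike = may_visit("/photos.html")   # not in the folder, same prefix

print(photo, home, lookalike)
```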

Googlebot specific instructions

The robot that Google uses to build its search index is called Googlebot. It understands a few more instructions than some other robots. The instructions it follows are well defined in the Google help pages.

In addition to "User-agent" and "Disallow", Googlebot also uses...
Allow:
The "Allow:" instruction lets you tell a robot that it is okay to see a file in a folder that has been "Disallowed" by other instructions.

To illustrate this, let's take the above example of telling the robot not to visit or index your photos. We put all the photos into one folder called "photos" and we made a robots.txt file that looked like this...
User-agent: *
Disallow: /photos

Now let's say there is a photo called mycar.jpg in that folder that you want Googlebot to index. With the Allow: instruction, we can tell Googlebot to do so. It would look like this...

User-agent: *
Disallow: /photos
Allow: /photos/mycar.jpg
This would tell Googlebot that it can visit "mycar.jpg" in the "photos" folder, even though the "photos" folder is otherwise excluded.
Testing your robots.txt file
If you use Google's webmaster tools (for example, with a Google sitemap), you can log in and see whether Google is having any issues crawling your site. There is also a robots.txt tool that lets you experiment a little and tells you if there are any problems with your file before you put it online.
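
Google documents that when an Allow rule and a Disallow rule both match, the most specific rule (the one with the longest path) wins. The sketch below imitates that idea with a hypothetical helper, is_allowed. It is not Google's code, and it ignores wildcards and user-agent groups. Python's own urllib.robotparser is not used here because it applies rules in file order, so it would stop at "Disallow: /photos".

```python
def is_allowed(rules, path):
    """Decide access by the longest matching rule; Allow wins a tie.

    rules is a list of (directive, path_prefix) pairs, for example
    ("Disallow", "/photos"). Wildcards are not handled in this sketch.
    """
    best = None  # (length of matching prefix, True if the rule is Allow)
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            candidate = (len(prefix), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    # With no matching rule, the path is allowed.
    return True if best is None else best[1]


rules = [
    ("Disallow", "/photos"),
    ("Allow", "/photos/mycar.jpg"),
]

print(is_allowed(rules, "/photos/mycar.jpg"))
print(is_allowed(rules, "/photos/beach.jpg"))
print(is_allowed(rules, "/index.html"))
```

Only mycar.jpg and the page outside the folder come back as allowed; "beach.jpg" is a made-up file standing in for the rest of the photos.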
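
If you would like a quick local check before uploading, something like the sketch below can help. It is not Google's tool: it uses the parser from Python's standard library, which may differ from Googlebot in details, and the helper name blocked_paths and the page list are made up. It lists which of your important pages a draft file would block.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_text, agent, paths):
    """Return the paths that the draft robots.txt blocks for this robot."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


# Hypothetical pages that you definitely want indexed.
important = ["/", "/index.html", "/articles/robots.html"]

# A common mistake: "Disallow: /" shuts robots out of the whole site.
draft = "User-agent: *\nDisallow: /"

for path in blocked_paths(draft, "Googlebot", important):
    print("blocked:", path)
```

An empty result means the draft does not block any of the listed pages for that robot.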

Key Concept:


- If you use a robots.txt file, make sure it is correctly written because an incorrect robots.txt file can block the bots that index your website.
