
robots.txt: what it is and how to use it


What do they do exactly?

A robots.txt file gives your instructions to search engine robots.

The first thing a search engine spider looks for when it visits a site is the robots.txt file. It looks for it because it wants to know what it should do. If you have instructions for a search engine robot, this file is where you give them.
The most common problem people have with robots.txt files is that they don't know how to make them.

If you can make web pages, you can also make a robots.txt file. The file is a text file, which means that you can use Notepad, WordPad, or any other plain text editor. You can also make it in FrontPage or Dreamweaver by using the "code" view. You can even "copy and paste" one. Name the file exactly robots.txt (with the "s") and put it in the top-level folder of your site, because that is where robots look for it.

So instead of thinking "I am making a robots.txt file", just think "I am writing a note"; they are exactly the same process. However you would write a note or a letter on your computer will work for the robots.txt file, as long as you save it as plain text.
robots.txt files and search robots

What should the robots.txt file say?

That depends on what you want it to do.

Most people want robots to visit everything on their website. If this is the case with you, and you want the robot to index all parts of your site, there are three ways to let the robots know that they are welcome.
1) Do not have a robots.txt file
If your website does not have a robots.txt file then this is what happens -
A robot comes to visit. It looks for the robots.txt file. It does not find it because it isn't there. The robot then feels free to visit all your web pages and content because this is what it is programmed to do in this situation.
2) Make an empty file and call it robots.txt
If your website has a robots.txt file that has nothing in it then this is what happens -
A robot comes to visit. It looks for the robots.txt file. It finds the file and reads it. There is nothing to read, so the robot then feels free to visit all your web pages and content because this is what it is programmed to do in this situation.
3) Make a file called robots.txt and write the following two lines in it... (these are "instructions" for the robot to follow)

User-agent: *
Disallow:

If your website has a robots.txt file with these instructions in it then this is what happens -

A robot comes to visit. It looks for the robots.txt file. It finds the file and reads it. It reads the first line. Then it reads the second line. The robot then feels free to visit all your web pages and content because this is what you told it to do.
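
As a rough illustration of options 2 and 3, the sketch below feeds both versions of the file to Python's standard-library parser, urllib.robotparser, and asks whether a few sample pages may be visited. The robot name "ExampleBot" and the sample paths are made up for the example, and real search engine robots use their own parsers, so treat this as a sketch and not as a guarantee of how any particular robot behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample paths, used only for this illustration.
SAMPLE_PATHS = ("/", "/index.html", "/photos/mycar.jpg")


def allows_everything(robots_txt: str, paths=SAMPLE_PATHS) -> bool:
    """Return True if every sample path may be fetched under this robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # "ExampleBot" is an invented robot name; any name falls under "User-agent: *".
    return all(parser.can_fetch("ExampleBot", path) for path in paths)


EMPTY_FILE = ""                            # option 2: an empty robots.txt
ALLOW_ALL = "User-agent: *\nDisallow:\n"   # option 3: the two-line file

print("empty file allows everything:", allows_everything(EMPTY_FILE))
print("two-line file allows everything:", allows_everything(ALLOW_ALL))
```

Option 1 (no file at all) cannot be shown without a real web server, so it is left out here.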

What do the robot instructions mean?

Here is an explanation of what the different words mean in a robots.txt file
User-agent:
The "User-agent" part is there to direct instructions to a specific robot if needed. There are two ways to use this in your file.

If you want to tell all robots the same thing, you put a "*" after the "User-agent:". It would look like this...
User-agent: *
(This line is saying "these directions apply to all robots")

If you want to tell a specific robot something (in this example Googlebot), it would look like this...
User-agent: Googlebot
(This line is saying "these directions apply to just Googlebot")
Disallow:
The "Disallow" part is there to tell the robots what folders they should not look at.

This means that if, for example, you do not want search engines to index the photos on your site, you can place those photos into one folder and exclude it.

Let's say that you have put all these photos into a folder called "photos". Now you want to tell search engines not to index that folder.

Here is what your robots.txt file should look like:

User-agent: *
Disallow: /photos

The above two lines of text in your robots.txt file would keep robots from visiting your photos folder. The "User-agent: *" part is saying "this applies to all robots". The "Disallow: /photos" part is saying "don't visit or index my photos folder".
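
To see the effect of these two lines, here is a minimal sketch that uses Python's standard-library urllib.robotparser. The robot name "ExampleBot" and the page names are invented for the illustration, and a real search engine robot may read the file slightly differently.

```python
from urllib.robotparser import RobotFileParser

# The two-line file from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /photos
"""


def is_allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the (hypothetical) robot may visit the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical pages on the site.
for path in ("/index.html", "/photos/beach.jpg", "/photos.html"):
    print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Note that the rule is matched against the start of the path, so a page such as /photos.html is blocked as well. If you only want to exclude the folder, writing "Disallow: /photos/" (with the trailing slash) is the safer form.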

Googlebot specific instructions

The robot that Google uses to crawl sites for its search engine is called Googlebot. It understands a few more instructions than other robots. The instructions it follows are well defined in the Google help pages (see resources below).

In addition to "User-agent" and "Disallow", Googlebot also uses the...
Allow:
The "Allow:" instruction lets you tell a robot that it is okay to see a file in a folder that has been "Disallowed" by other instructions.

To illustrate this, let's take the above example of telling the robot not to visit or index your photos. We put all the photos into one folder called "photos" and we made a robots.txt file that looked like this...
User-agent: *
Disallow: /photos

Now let's say there is a photo called mycar.jpg in that folder that you want Googlebot to index. With the Allow: instruction, we can tell Googlebot to do so. It would look like this...

User-agent: *
Disallow: /photos
Allow: /photos/mycar.jpg
This would tell Googlebot that it can visit "mycar.jpg" in the "photos" folder, even though the "photos" folder is otherwise excluded.
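
Google's help pages describe how Googlebot chooses between an Allow and a Disallow that both match: the most specific rule (the one with the longest path) wins, and on a tie the less restrictive rule wins. The sketch below is a simplified, hand-written illustration of that idea only. It is not Google's code, the function name is made up, and it ignores wildcards such as * and $.

```python
def longest_match_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide access by the longest matching rule (simplified sketch).

    rules is a list of (directive, path_prefix) pairs, where directive
    is "allow" or "disallow". Wildcards are not handled here.
    """
    best_len = -1
    allowed = True  # if no rule matches, the robot may visit the path
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules have no effect
        is_allow = directive.lower() == "allow"
        # A longer prefix is more specific; on a tie, "allow" wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The rules from the example above.
RULES = [("disallow", "/photos"), ("allow", "/photos/mycar.jpg")]

for path in ("/photos/mycar.jpg", "/photos/other.jpg", "/index.html"):
    print(path, "->", "allowed" if longest_match_allowed(path, RULES) else "blocked")
```

Some other parsers simply apply the first rule that matches, reading from top to bottom, so if you also care about robots other than Googlebot it may be safer to put the Allow line before the Disallow line.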
Testing your robots.txt file
If you have submitted a sitemap through Google's webmaster tools, you can log in and see whether Google is having any issues crawling your site. There is also a robots.txt tool that lets you experiment a little and tells you whether there are any problems with your file before you put it online.
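
If you would like a quick check on your own computer as well, something like the following sketch can catch obvious mistakes before you upload the file. It uses Python's standard-library urllib.robotparser and is not Google's tool; the function name, the robot name "ExampleBot" and the sample paths are all invented for this example, and Python's parser does not always agree with Googlebot (for instance on Allow rules).

```python
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt, must_allow, must_block, agent="ExampleBot"):
    """Return a list of problems found in a draft robots.txt (empty if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for path in must_allow:
        if not parser.can_fetch(agent, path):
            problems.append(f"blocked but should be allowed: {path}")
    for path in must_block:
        if parser.can_fetch(agent, path):
            problems.append(f"allowed but should be blocked: {path}")
    return problems


# A draft with a mistake: "Disallow: /" shuts robots out of the whole site.
DRAFT = "User-agent: *\nDisallow: /\n"

for problem in check_robots(DRAFT, must_allow=["/index.html"], must_block=["/photos/a.jpg"]):
    print(problem)
```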

Key Concept:


- If you use a robots.txt file, make sure it is correctly written because an incorrect robots.txt file can block the bots that index your website.
