The robots.txt file

**Vasili** · 09-08-2010, 01:52 PM

Re: The robots.txt file

Excellent ... should be updated somewhat though for maximum benefit, with the addition of auto-discovery coding below for the sitemap.xml:

Complete robots.txt example for XML sitemaps autodiscovery (with no 'disallow' parameters) by adding the "sitemap" line as shown below:

User-agent: *
Allow:
Allow: (etc. for as many as allowing)
Sitemap: http://www.yoursitename.com/sitemap.xml
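
If you want to confirm that the Sitemap line is actually picked up, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt text is a sample modeled on the example above; the empty "Disallow:" line standing in for "nothing is blocked" is my assumption, and site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt text (assumption: an empty Disallow means nothing is blocked).
ROBOTS_TXT = """User-agent: *
Disallow:
Sitemap: http://www.yoursitename.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is made here.
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed Sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```

On a live site you would normally point the parser at the real file with set_url() and read() rather than parsing a string.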

If you have created a sitemap index file (where you specifically echo your do-not-follow rules by manually deleting the page/item entries that the sitemap generator created automatically), you can reference that instead by using these lines in place of the above:

User-agent: *
Disallow: (enter specific files/pages not to be read)
Sitemap: http://www.yoursitename.com/sitemap-index.xml

Basically, before you upload your sitemap.xml file, delete the entries that map the pages you do not want spidered. The sitemap then "mirrors" the 'disallow' instructions in your robots.txt file by simple omission. Be sure to alter the robots.txt file as shown above so that it includes the sitemap "auto-discovery" line and it becomes a 'Rule'!
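
The manual "mirroring" step can also be scripted. This is only a rough Python sketch, assuming a standard sitemaps.org-format sitemap; the function name prune_sitemap and the sample URLs are made up for illustration, and it is not what any particular sitemap generator does.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by standard sitemaps.org sitemap files.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def prune_sitemap(sitemap_xml: str, robots_txt: str, agent: str = "*") -> str:
    """Return the sitemap XML with every <url> that robots.txt disallows removed."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    ET.register_namespace("", NS)  # keep the default namespace on output
    root = ET.fromstring(sitemap_xml)
    for url in list(root.findall(f"{{{NS}}}url")):
        loc = url.findtext(f"{{{NS}}}loc", default="").strip()
        if not rules.can_fetch(agent, loc):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample data for the sketch.
ROBOTS_TXT = """User-agent: *
Disallow: /login.php
Disallow: /admin
"""

SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.yoursitename.com/index.html</loc></url>
  <url><loc>http://www.yoursitename.com/login.php</loc></url>
</urlset>"""

pruned = prune_sitemap(SITEMAP_XML, ROBOTS_TXT)
print(pruned)  # the login.php entry is gone, index.html remains
```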

**jenvin** · 09-09-2010, 02:15 AM

Re: The robots.txt file

Hi,
This is my first post here and I'm not that html brained, but I did understand the above post on the sitemap.
I have a google sitemap installed on my website, but it won't allow Googlebot-Images access to the images.

This is what I have in the sitemap for crawler access:
User-agent: *

Disallow: /cgi-bin
Disallow: /admin
Disallow: /account.php
Disallow: /advanced_search.php
Disallow: /checkout_shipping.php
Disallow: /create_account.php
Disallow: /login.php
Disallow: /password_forgotten.php
Disallow: /shopping_cart.php
Disallow: /_vti_bin
Disallow: /_vti_cnf
Disallow: /_vti_log
Disallow: /_vti_pvt
Disallow: /_vti_txt

User-agent: Googlebot-Image

Disallow: /

Should I take out the "Disallow: /" or put "Allow: /images" under the Disallow?

I will be thankful for any replies.

Jen

**HalfDime47** · 09-18-2010, 04:29 PM

Re: The robots.txt file

Vasili, I am having a problem reaching either of the two links in your post. I am using Firefox/3.6.9. Please advise if these are available elsewhere.
Thanks.

**Vasili** · 09-25-2010, 11:23 PM

Re: The robots.txt file

JENVIN
You cannot have conflicting instructions between the files: the robots.txt file will need to clearly state any disallow, and in this case, you must specifically 'rule' that your images are disallowed to be cached.
Also, after auto-generating your sitemap.xml (I prefer not to use Google's version, as it is geared to the advantage of their overall scheme rather than being purely W3C compliant), you must carefully delete every "mention" of your image file/page from the code, leaving no gap or spacing behind and no sign that the file/page exists. The robots.txt file creates a Rule from a single stated disallow, and there is then no "affirmation" of the resource anywhere else (no mention of the file or page, since it was deleted from the XML sitemap, see?).
The above was in keeping with the context of the earlier article discussing "hiding" page views, but to answer your question directly, "Yes, in your case you would create a specific 'Agent' mention and a proper 'Allow' Rule, as you show above in your post."
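
To see what each option does to Googlebot-Image, here is a small Python sketch with urllib.robotparser. The rule list is shortened from the post above, and the two test URLs are hypothetical. One caveat: Python's parser applies the first matching rule in file order, so the Allow line is placed before the Disallow here; as far as I know, Google instead picks the most specific matching rule, so treat this as an approximation of how Google reads the file.

```python
from urllib.robotparser import RobotFileParser

def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

# Shortened version of the general group from the post above.
GENERAL_GROUP = """User-agent: *
Disallow: /cgi-bin
Disallow: /admin
Disallow: /login.php
"""

# As posted: the image crawler is shut out of the whole site.
AS_POSTED = GENERAL_GROUP + """
User-agent: Googlebot-Image
Disallow: /
"""

# One possible fix: allow /images first, then disallow everything else.
ALLOW_IMAGES = GENERAL_GROUP + """
User-agent: Googlebot-Image
Allow: /images
Disallow: /
"""

IMAGE_URL = "http://www.yoursitename.com/images/photo.jpg"  # hypothetical
ADMIN_URL = "http://www.yoursitename.com/admin/index.php"   # hypothetical

as_posted = load_rules(AS_POSTED)
allow_images = load_rules(ALLOW_IMAGES)
group_removed = load_rules(GENERAL_GROUP)  # Googlebot-Image group deleted

print(as_posted.can_fetch("Googlebot-Image", IMAGE_URL))      # False
print(allow_images.can_fetch("Googlebot-Image", IMAGE_URL))   # True
print(allow_images.can_fetch("Googlebot-Image", ADMIN_URL))   # False
print(group_removed.can_fetch("Googlebot-Image", IMAGE_URL))  # True
```

Deleting the Googlebot-Image group altogether makes that crawler fall back to the general "User-agent: *" rules, which is why the last check passes.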

HALFDIME
The links above were SAMPLES (note the word "YourSiteName" in them?)
Replace "yoursitename" with your domain name ....

You can generate a compliant robots.txt and a sitemap.xml both at this site.

**chriscartoons** · 06-28-2011, 11:02 AM

Re: The robots.txt file

i'm adding it now as we speak hahahahahahahaha!!

**sunrise2012** · 07-03-2011, 04:38 PM

Re: The robots.txt file

Hello,

I'm a total beginner at creating websites, so forgive the dumb questions, please.

Where should the "robots.txt" information (Disallow, Allow) be placed?
Somewhere in the html below and on every page in the website? (I have about 35 pages):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<TITLE>xxxxx</TITLE>
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
<META Name="Keywords" Content="xxxxx">
<META Name="Description" Content="xxxxx">
<META NAME="ROBOTS" CONTENT="ALL">
<META NAME="revisit-after" CONTENT="10 days">
<META NAME="author" content="xxxxx">
<META NAME="copyright" content="Copyright 1980-2007 by xxxxx. All Rights Reserved.">
<META NAME="resource-type" content="document">
<META NAME="distribution" content="global">

</HEAD>

Also, if there is another, better way to create the above, I would really appreciate knowing that.

Thank you so much!

L.N.

**VodaHost** · 07-03-2011, 05:50 PM

Re: The robots.txt file

Originally posted by sunrise2012

Where should the "robots.txt" information (Disallow, Allow) be placed?
Somewhere in the html below and on every page in the website? (I have about 35 pages)
Also, if there is another better way to create the above, I would really appreciate knowing that. Thank you so much!

L.N.

As I explained above, the robots.txt is created and saved as a file that is to be uploaded to your public_html folder (Root, or Home Directory) and is not part of or intended to be imported into any web page as coding whatsoever!
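
To illustrate the "one file at the root" point: whichever of your pages a crawler is looking at, it requests the same robots.txt address at the top of the site. A minimal Python sketch, where the helper name robots_txt_url and the sample page URL are my own inventions:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

location = robots_txt_url("http://www.yoursitename.com/shop/item.html?id=3")
print(location)  # http://www.yoursitename.com/robots.txt
```

So a 35-page site still needs only the one file, uploaded once.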

The robots.txt, sitemap.xml, and the sitemap.html files can all be auto-created for you without errors in the format required at the site Vasili suggested: www.xml-sitemaps.com
Don't forget that the robots.txt and the sitemap.xml files should agree with each other: anything disallowed in robots.txt should be left out of sitemap.xml. You can edit any of these files using Notepad, as I mentioned, being sure to save each one as plain text with the proper extension.

**flexworth** · 07-06-2011, 08:51 PM

Re: The robots.txt file

I don't have any pages I wanted to 'disallow', but I did notice an increase in organic traffic when I uploaded a blank robots.txt file to the folder.
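
For what it's worth, a blank robots.txt places no restrictions on crawlers at all, at least as Python's standard-library parser reads it. This tiny sketch only shows how the file is interpreted and says nothing about traffic; the sample URL is hypothetical.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([])  # an empty file: no groups, no rules

# With no rules present, any crawler may fetch any path.
blank_allows_all = parser.can_fetch("*", "http://www.yoursitename.com/any-page.html")
print(blank_allows_all)  # True
```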
