Robots Txt not there

  • coffeegourmet
    Second Lieutenant

    • Feb 2007
    • 124

    Robots Txt not there

    Hey all,

    As you can see from my signature below, my coffee site is pointed to VodaHost but has not switched over yet. However, my .biz site has been up for a while, and I was just looking at my AWStats report, which says there is no robots.txt there. I thought I had added it before.

    Now I really need to make sure it is there for my coffee site, as I think Google will spider it pretty darn quick. So what do I do to make sure it is there? I know you have to go into the HTML of the page and enter something there... do I just do it for my index page?

    I just want to make sure I get spidered correctly the first time!
    My Coffee Gourmet -I offer discounts to forum members, so pm me for the discount code!

    We Have Picnic Baskets - Your stop for picnic baskets, picnic backpacks, NCAA Logo Merchandise and more! PM me for a discount code!

    "What you not give up, No one can take from you!" - Our motto from our two month long strike this summer!
  • davidundalicia
    General

    • Mar 2006
    • 6294

    #2
    Re: Robots Txt not there

    Hi coffeegourmet:

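    If you do a quick search (top of this page) for robots.txt you should find quite a few threads with the appropriate details.

    I usually add a robots meta tag to EVERY index page, as I have quite a few different folders and addon sites. The robots.txt file itself is a separate plain-text file that sits in the root folder of the site, not something you paste into a page.

    To check if the meta tag is there, view the HTML of your index page (right click and view page source). To check for robots.txt itself, type your domain followed by /robots.txt into your browser and see whether the file comes up.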
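Checking for the file itself can also be scripted. The following is only a rough sketch: `fetch_robots_txt` and its `opener` parameter are names made up for this example, and it assumes the server answers with a 404 error when the file is missing, which is also what shows up in the stats report.

```python
import urllib.error
import urllib.request


def fetch_robots_txt(site, opener=urllib.request.urlopen):
    """Return the text of <site>/robots.txt, or None if the server reports 404.

    `opener` is a hypothetical hook so the function can be tried without
    a network connection; by default it is urllib.request.urlopen.
    """
    url = site.rstrip("/") + "/robots.txt"
    try:
        with opener(url) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # the file is not there
        raise  # any other HTTP error is a different problem


# Example call (needs network access, so it is left commented out):
# text = fetch_robots_txt("http://www.example.com")
# print("missing" if text is None else text)
```

If the function returns None, the file has not been uploaded to the root folder of that site yet.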


    Have fun
    Regards..... David

    Step by Step Visual Tutorials for the complete beginner
    Newbies / Beginners Forum
    FREE Membership Login Scripts: - Meta Tags Analyzer
    My Social Networking Site - Free Contact Forms
    Finished your New website!! Now get it noticed Here:


    • Bethers
      Major General & Forum Moderator

      • Feb 2006
      • 5224

      #3
      Re: Robots Txt not there

      A robots.txt file is important if you want to exclude pages, etc - otherwise, it won't hurt to not have one. I've yet to add one to any of my BV sites - although I keep saying I'll get around to it one day - if only to put in follow all LOL

      Don't sweat it - but you can add one.
      Beth
      A Child's Palace - Pinata Palace - Moxie Enterprises

      SEO and Marketing Tools
      SEO - The Basics


      • Mook25
        Brigadier General

        • Oct 2005
        • 1427

        #4
        Re: Robots Txt not there

        Search engines use robots to crawl, or spider, pages on the web. These robots or crawlers are simply programs written to read web page information, including text, links, graphics and headings.

        These crawlers or robots follow a special instruction file known as the robots.txt file. For example, if a search robot visits the site http://www.seopages.com, it first looks for the robots file at http://www.seopages.com/robots.txt. If the file is found, the robot follows its instructions about which pages of the site to read and which not to read. In other words, robots.txt tells the search robot which parts of a website it may crawl and which it should leave alone.
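As a small illustration of where a robot looks, here is a sketch that works out the robots.txt address from any page address on a site. The function name `robots_txt_url` and the sample page path are made up for this example; the rule it applies (same scheme and host, path replaced by /robots.txt) is the one described above.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path with /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.seopages.com/products/index.html"))
# prints http://www.seopages.com/robots.txt
```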

        From what I understand, the robots specification was developed in 1994 and came to be known as ‘The Robots Exclusion Standard’. It still remains the standard for directing robots, and almost all search engines follow it. You can learn how to define and place a robots file further on in this post.

        Basically robots.txt, as the file extension implies, is just a simple text file without any scripting or programming code in it. It can be created using a simple text editor like Notepad and consists of simple text directives.

        Word processors should not be used, because the formatting they add can make the file unreadable to robots, and a misread file can get pages dropped from the index. Almost every website has certain pages that are not intended for general users; robots can be asked not to read those pages with the robots file. Keep in mind that robots.txt is itself publicly readable and only well-behaved robots obey it, so it is not a substitute for password protection.

        The robots.txt file can be customized to allow only specific search robots to spider the site, and to disallow reading of specific directories or files. Let us create a simple robots.txt file here. Open a simple text editor, e.g. Notepad, write the following lines and save the file as robots.txt:

        #this is a typical example of robots file
        #comments are placed after hash.

        User-agent: *
        Disallow: /cgi-bin/

        This is a typical example of a robots.txt file. The User-agent directive specifies the name of the robot or spider that is visiting the website. For example, “User-agent: googlebot” specifies Google's robot, and the instructions that follow apply to that robot. A value of “User-agent: *” means all robots on the web.

        Next comes the “Disallow” directive. It specifies the file or folder that the named robot is not allowed to read. The Disallow field can also be left blank, which means all pages may be spidered. One point needs care: each file to be disallowed must be declared on its own line. In other words, multiple files must not be listed against a single Disallow directive. For example, to disallow multiple files we define robots.txt as:

        User-agent: Googlebot
        Disallow: /information.html
        Disallow: /private.html
        Disallow: /shipping.html

        User-agent: Architext
        Disallow: /

        In this example Googlebot is disallowed from crawling three pages, and Architext, the spider of Excite, is disallowed from all pages of the site. Any spider can be instructed in the same way if you know its name; otherwise use ‘*’. Each Disallow path starts with a slash and is relative to the root folder ( / ), so if the file to be excluded resides in another folder, specify its complete path.

        Now the question arises: where should robots.txt be placed on a website? The answer is the root directory ( / ), where the index file is placed. Remember that there should be just one robots.txt file on a website. URL paths are case-sensitive, and the file name "robots.txt" must be all lower-case and spelled exactly that way. Blank lines are not permitted within a single record, because a blank line ends the record, and each record must start with at least one “User-agent” line. If the robots file is placed in the wrong folder, it loses its functionality: spiders will not find it and will simply ignore it.
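To see how a robot would read the Googlebot/Architext example, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It assumes each Disallow path is written with a leading slash; the host `www.example.com`, the robot name `SomeOtherBot` and the helper `allowed` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-record example, with each path written from the root folder.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /information.html
Disallow: /private.html
Disallow: /shipping.html

User-agent: Architext
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if `agent` may fetch `path` on the (hypothetical) site."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


for agent, path in [
    ("Googlebot", "/index.html"),
    ("Googlebot", "/private.html"),
    ("Architext", "/index.html"),
    ("SomeOtherBot", "/private.html"),
]:
    print(agent, path, allowed(agent, path))
```

A robot that is not named in any record, and for which there is no ‘*’ record, is not restricted at all, which is why the last check comes back as allowed.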

        Advantages of having a Robots.txt

        It helps to keep sensitive and confidential pages out of search results by asking spiders not to index them (it does not stop a person, or a badly behaved robot, from opening those pages directly).

        It helps in search engine specific optimization of a website (making web pages for particular search engines).

        This file should be written very carefully, according to the format described above, before it is uploaded to a website, because a simple mistake such as a stray "Disallow: /" can result in the complete website being removed from search engine indexes. Don’t make too many copies of your web pages to optimize them for every search engine out there; be reasonable with the number and target the major five or seven engines. So now you know what a robots.txt file is, how to define it, how to use it, and where to place it.
        Arcade Ninja - Free Flash Arcade
        FreeGadget4me.Com - Learn how to get free gadgets delivered direct to your door for free


        • Mook25
          Brigadier General

          • Oct 2005
          • 1427

          #5
          Re: Robots Txt not there

          You can check that your robots.txt is valid by using free validator tools such as the one found here
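For a quick check before uploading, a few of the common mistakes can be caught with a short script. This is only a rough sketch and not a replacement for a real validator: the function name `lint_robots_txt`, the list of known fields and the wording of the messages are all choices made for this example.

```python
def lint_robots_txt(text):
    """Return a list of (line_number, message) for some common mistakes."""
    # Fields this sketch accepts; other fields are reported as unknown.
    known = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments after a hash
        if not line:
            continue  # blank or comment-only line
        field, sep, value = line.partition(":")
        field = field.strip().lower()
        value = value.strip()
        if not sep:
            problems.append((number, "missing ':' separator"))
        elif field not in known:
            problems.append((number, "unknown field '%s'" % field))
        elif field == "user-agent":
            seen_agent = True
        elif field in ("disallow", "allow"):
            if not seen_agent:
                problems.append((number, "rule appears before any User-agent line"))
            elif value and not value.startswith(("/", "*")):
                problems.append((number, "path should start with '/'"))
    return problems


sample = "User-agent: Googlebot\nDisallow: information.html\n"
for number, message in lint_robots_txt(sample):
    print("line %d: %s" % (number, message))
# prints line 2: path should start with '/'
```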
          Arcade Ninja - Free Flash Arcade
          FreeGadget4me.Com - Learn how to get free gadgets delivered direct to your door for free
