Keeping HTTPS pages out of the Google index

  • elvisyorkie
    Private First Class

    • Dec 2006
    • 5

    Keeping HTTPS pages out of the Google index

    Hi,

    What is the best way to keep search engines from indexing https pages? When they index both the http and https versions of a page, you get thrown into the supplemental results due to duplicate content.

    I was reading that you should have a robots.txt file for both http and https, but I don't know where to put the file for the https pages.

    Can anyone help?

    Thanks

    Larry
  • Collectors-info
    General

    • Feb 2006
    • 8703

    #2
    Re: Keeping HTTPS pages out of the Google index

    You can place tags like these in the head of your page; a Google search on "meta tags noindex" will turn up more detail.

    Do not index, but follow links

    <META name="ROBOTS" content="NOINDEX">
    Use this for pages with many links on them but not much useful data. Because "follow" is the default, you don't have to include it.

    Index, but do not follow links

    <META name="ROBOTS" content="NOFOLLOW">
    Use this for pages which have useful content but links which may be irrelevant or obsolete.

    Do not index or follow links

    <META name="ROBOTS" content="NOINDEX,NOFOLLOW">
    This is for pages which should not be indexed at all. If you put it in every page, the site should not be indexed.

    Index and follow links

    <META name="ROBOTS" content="INDEX,FOLLOW">
    This is the default behavior; you don't have to include this tag.

    Note: if you add Robots META tags to a framed site, be sure to include them on both the FRAMESET and the FRAME pages.
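For the HTTP/HTTPS duplicate problem in this thread, the tag only helps if the HTTPS copies of a page carry NOINDEX while the HTTP copies don't. A minimal sketch of that idea in Python (a hypothetical helper, not anything from this thread): choose the tag from the scheme the page was requested over, and have the page template drop the result into the head.

```python
def robots_meta(scheme: str) -> str:
    """Return a robots META tag based on the request scheme.

    HTTPS duplicates of ordinary pages get NOINDEX,NOFOLLOW so only
    the HTTP version is indexed; HTTP pages keep the indexable default.
    """
    if scheme.lower() == "https":
        return '<META name="ROBOTS" content="NOINDEX,NOFOLLOW">'
    return '<META name="ROBOTS" content="INDEX,FOLLOW">'
```

A server-side template would call this once per page render, so the same page source serves both protocols but only the HTTP version stays indexable.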
    Regards, Chris.

    Collectables, Collecting, collectors-info.com

    www.chrismorris.co.uk

    House build project


    • elvisyorkie
      Private First Class

      • Dec 2006
      • 5

      #3
      Re: Keeping HTTPS pages out of the Google index

      Chris,

      This will not work on a site that has an SSL cert. You want the search engines to index all your http URLs and not index your https URLs. When they index both, you get duplicate content and end up in the supplemental results.

      Larry


      • elvisyorkie
        Private First Class

        • Dec 2006
        • 5

        #4
        Re: Keeping HTTPS pages out of the Google index

        To remove only the https version of indexed pages from search engines, place the following in a robots.txt file in the folder which serves the secured pages of your site:
        User-agent: *
        Disallow: /

        The question is: where is the folder which serves the secured pages of my site?

        Thanks

        Larry


        • Karen Mac
          General

          • Apr 2006
          • 8332

          #5
          Re: Keeping HTTPS pages out of the Google index

          Larry

          You don't need it in a folder. Just add this to your robots.txt file:
          User-agent: * (this refers to all bots; you can name them specifically if you want)
          Disallow: https:/

          That would disallow all bots from your https pages, so you don't inadvertently get smacked for duplicate content - the bots haven't quite figured everything out with these new algorithms yet. :)

          This goes in your ROOT folder for the site in question. If it's your MAIN domain, then it goes in the public_html folder along with the other files you see listed there. If it's an addon, then put it inside the addon folder, e.g. elvisyorkshireterrior

          Karen

          VodaHost

          Your Website People!
          1-302-283-3777 North America / International
          02036089024 / United Kingdom
          291916438 / Australia

          ------------------------

          Top 3 Best Sellers

          Web Hosting - Unlimited disk space & bandwidth.

          Reseller Hosting - Start your own web hosting business.

          Search Engine & Directory Submission - 300 directories + (Google,Yahoo,Bing)



          • Karen Mac
            General

            • Apr 2006
            • 8332

            #6
            Re: Keeping HTTPS pages out of the Google index

            To clarify: when I said it didn't need a folder, I meant its own folder or an https folder. It goes in either the public_html folder or the domain folder in the root.

            Karen




            • elvisyorkie
              Private First Class

              • Dec 2006
              • 5

              #7
              Re: Keeping HTTPS pages out of the Google index

              Karen,

              I did what you said, but what concerns me is what Google says about this issue:

              Each port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols. For example, to allow Googlebot to index all http pages but no https pages, you'd use the robots.txt files below.
              For your http protocol (http://yourserver.com/robots.txt):
              User-agent: *
              Allow: /
              For the https protocol (https://yourserver.com/robots.txt):
              User-agent: *
              Disallow: /
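Since both protocols usually serve the same document root, one way to meet Google's two-file requirement is to generate /robots.txt dynamically based on the request scheme. A rough sketch in Python (a hypothetical approach, not anything VodaHost-specific) of the decision that a rewrite rule or small script answering /robots.txt would make:

```python
# robots.txt bodies matching Google's advice quoted above.
ROBOTS_HTTP = "User-agent: *\nAllow: /\n"      # index everything over http
ROBOTS_HTTPS = "User-agent: *\nDisallow: /\n"  # block everything over https

def robots_txt_for(scheme: str) -> str:
    """Return the robots.txt body for the scheme a request arrived on.

    Because http:// and https:// requests usually hit the same document
    root, serving this dynamically lets each protocol see a different
    robots.txt without needing a separate folder for the secure site.
    """
    return ROBOTS_HTTPS if scheme.lower() == "https" else ROBOTS_HTTP
```

The same effect is often achieved with a web-server rewrite rule that serves an alternate file, e.g. robots_ssl.txt, when the connection is secure.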

              The other concern I have is that I use relative link URLs instead of absolute URLs, which, from what I've read, may contribute to the problem; but if you use the two robots.txt files, you can solve it.

              I contacted customer service and had to wait over 8 hours for a response. They finally responded, and this was it:

              "You can not really have the https version of a page not list and the http version of a page listed. At the end of the day, they are the exact same page."

              Based upon this response, one really wonders how they even turn their computers on.

              Karen, if you know how I can do what Google says has to be done, it would be greatly appreciated. Or if you know someone I can call to resolve this problem, that would also be great.

              Thanks

              Larry


              • Bethers
                Major General & Forum Moderator

                • Feb 2006
                • 5224

                #8
                Re: Keeping HTTPS pages out of the Google index

                Larry,
                I got your phone message - but I'm on vacation.

                However, you shouldn't have ANY pages on https EXCEPT your checkout pages - and if that were the case, it wouldn't be a problem.

                So - what pages are both? NONE should be both. Checkout should be https; the rest should be http.

                Now, if for some reason you have made some pages accessible both ways, then the robots.txt disallow on the https side will work.

                Again - you should NOT have a problem if the only HTTPS pages are the pages that SHOULD be.

                And, YES, I'M YELLING.
                Beth
                A Child's Palace - Pinata Palace - Moxie Enterprises

                SEO and Marketing Tools
                SEO - The Basics


                • Karen Mac
                  General

                  • Apr 2006
                  • 8332

                  #9
                  Re: Keeping HTTPS pages out of the Google index

                  Larry

                  Define PORT, or what you think they mean by a PORT. Each website should have its own robots.txt file. You could install one in your admin area, but I don't know that it would really serve any purpose other than backing up the root one.

                  I would define a port as each website. Now, your https site is only virtual: you have a dedicated IP and encryption, but it's the same SITE as your http site; the only difference is when the encryption is called for. So one robots.txt should cover this. Now, there are only two ways for Google to get those URLs. Either your software isn't reverting back to http when a product is followed into the cart - so when "continue shopping" is hit, the user stays in https mode, and Google follows the same path - or, when you created your sitemap, you included those URLs instead of omitting them.

                  Now, if you create another domain or subdomain, then you would need another robots.txt file for that ROOT or PORT. You don't technically have an HTTPS root to install a robots.txt file on, so I think you are overcomplicating Google's intent. You can also set your preferences for this domain in Google's Webmaster Tools.

                  Karen




                  • Karen Mac
                    General

                    • Apr 2006
                    • 8332

                    #10
                    Re: Keeping HTTPS pages out of the Google index

                    OK, I just went and read up on this headache, and what you said, Larry, was true; however, they don't tell you HOW to do it, and when I looked at yours I think I put the slash in the wrong place.

                    Http is normally port 80, and https, I think, is port 443 or thereabouts. Https is hypertext transfer protocol over secure sockets layer. (Look all that up; it will keep you busy for 2 or 3 hours.)

                    Given that you can't SEE, or really have access to, either of the ports, I don't have the foggiest idea how Google would expect you to create two robots.txt files.

                    So I would stick with one robots.txt file per root. Your domain isn't changing; only the http or https is the culprit. So it would also stand to reason, at least to my tired brain, that the https only kicks in when you are IN the shopping cart preparing to TRANSFER info to the virtual terminal via port 443. Therefore, I would disallow whatever that cart's name is - generally that is in your INCLUDES or SCRIPTS and housed in your ADMIN folder. Therefore, my most learned and experienced caffeine-deprived brain says...
                    Disallow: /ADMIN AREA (whatever your cart may be called, e.g. SOHOADMIN, or in osCommerce: ADMIN)

                    And forget about Disallow: /https, which would most likely give a syntax error anyway, since it's NOT a folder but a virtual one, and you could only access it on WHM hosting by port number.
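Put concretely, Karen's suggestion amounts to a robots.txt along these lines (the admin path below is only an example - substitute whatever directory your own cart's admin area actually lives in):

```
User-agent: *
Disallow: /sohoadmin/
```

This keeps bots out of the cart/admin area over either protocol, without trying to name the protocol itself inside robots.txt.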

                    And by the way, I didn't generate your sitemap, but make sure there are no references in it to the https protocol or to the admin or SECURE area of your store.

                    Also, Larry, I went into your cart and ran a test order, and I never did get an HTTPS connection, which I should have gotten while putting in the card numbers; and when the fake card was declined, an include syntax error came up. So I'd say your cart isn't exactly up to par somewhere. The SSL is installed just fine - I checked that - but something is off in how your cart's admin area directs to https.

                    God, I HATE Google... LOL

                    OK, I'm going to bed now. I'll get Matt Cutts hate mail tomorrow! :)

                    Karen




                    • Karen Mac
                      General

                      • Apr 2006
                      • 8332

                      #11
                      Re: Keeping HTTPS pages out of the Google index

                      OH, one more thing: NO PHONE CALLS till 11am. I'm sleeping in! If my phone rings, somebody had better be bleeding... or else they will be!

                      Karen




                      • Bethers
                        Major General & Forum Moderator

                        • Feb 2006
                        • 5224

                        #12
                        Re: Keeping HTTPS pages out of the Google index

                        Larry,
                        I don't know what you've done - but I went through checkout on your site, and I never even hit the secure pages - so I would never give you my credit card info.

                        Somehow it's as if you're cloning your site for https instead of installing the certificate where it's needed. I can't tell you how to fix it - but it's definitely wrong - and no robots.txt file is going to fix this - you need to fix the pages.

                        Back to vacation :)
                        Beth

                        Comment
