
Google: Sitemap contains URLs which are blocked by robots.txt


KenSTJ


Anyone seen this before?

This is a new site and a first-time sitemap submission to Google. Given that index.php is involved, this can't be good for the crawler...

 

The warnings/examples given by Google are:

"Value: http://www.mydomain.com/en/index.php?controller=authentication"

"Value: http://www.mydomain.com/es/index.php?controller=authentication"

"Value: http://www.mydomain.com/fr/index.php?controller=authentication"

 

I noticed some other issues in this forum dealing with languages. Is this related?

Edited by KenSTJ

Are you using friendly URLs? With friendly URLs this does not happen, but that is another problem with PS 1.5, one that is fatal for shops that are already running. See here:

 

http://forge.prestas.../browse/PNM-971

 

We haven't released our shop upgraded to PS 1.5, because exactly one year ago we had the same problem with the sitemap. Google removed all our links from the index, and we lost all our good positions...


As a new shop, we went with the latest version, 1.5.3.1. Yes, we are using friendly URLs. We had set the friendly URL of the index page to "home", but I have just removed it so that it is now blank.

 

What should I look for in robots.txt to figure out what Google is warning about? Do you know?

 

Thanks for the quick response!


  • 2 months later...

So what is the answer to this? I am running into the same issue on 1.5.3. I have removed "Disallow: /*controller=authentication" from robots.txt and resubmitted the sitemap, but it still gives the same warnings.

 

Sitemap contains urls which are blocked by robots.txt.

https://shop.sunstat...=authentication

 

I have gone through the entire sitemap and compared it to robots.txt, and https://shop.sunstatetechnology.com/index.php?controller=authentication is the only URL in the sitemap that matches anything in robots.txt. So why am I still receiving the errors with that entry removed from robots.txt?
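To make that sitemap-versus-robots.txt comparison repeatable, here is a minimal Python sketch (not a PrestaShop or Google tool) that lists the sitemap URLs matched by robots.txt rules. It assumes Google-style matching: `*` matches any run of characters, a trailing `$` anchors the end, and the longest matching rule wins, with Allow winning a tie. It only reads groups that apply to `User-agent: *` and takes both files as strings; the function names and the example.com URLs are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def rule_to_regex(pattern):
    """Translate a Google-style robots.txt path pattern into a regex.

    Assumption: '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def parse_rules(robots_txt):
    """Collect (kind, pattern) pairs from groups that apply to 'User-agent: *'."""
    rules, agents, in_rules = [], [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (s.strip() for s in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if "*" in agents and value:  # an empty Disallow blocks nothing
                rules.append((field, value))
    return rules


def is_blocked(url, rules):
    """Longest matching pattern wins; on a tie, Allow beats Disallow."""
    parts = urlsplit(url)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    best_len, blocked = -1, False
    for kind, pattern in rules:
        if rule_to_regex(pattern).match(target):
            longer = len(pattern) > best_len
            tie_allow = len(pattern) == best_len and kind == "allow"
            if longer or tie_allow:
                best_len, blocked = len(pattern), kind == "disallow"
    return blocked


def blocked_sitemap_urls(sitemap_xml, robots_txt):
    """Return every <loc> in the sitemap that the robots.txt rules block."""
    rules = parse_rules(robots_txt)
    root = ET.fromstring(sitemap_xml)
    locs = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    return [url for url in locs if is_blocked(url, rules)]


if __name__ == "__main__":
    sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://shop.example.com/index.php?controller=authentication</loc></url>"
        "<url><loc>https://shop.example.com/en/some-category</loc></url>"
        "</urlset>"
    )
    robots = "User-agent: *\nDisallow: /*controller=authentication\n"
    print(blocked_sitemap_urls(sitemap, robots))
    # prints ['https://shop.example.com/index.php?controller=authentication']
```

If this returns an empty list for the robots.txt currently on the server while Google still reports the warning, Google is possibly still working from an older copy of the file.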

Edited by sscardefield

Hello sscardefield

 

I can help you with this. Please share the links to both your robots.txt and sitemap.xml, and mention the URLs for which you are receiving these errors.


Hello alastairbrian, thanks for the response. Here are my robots.txt and sitemap.xml.

 

http://shop.sunstate....com/robots.txt

 

http://shop.sunstate...com/sitemap.xml

 

And here are the errors I'm getting from Google.

 

[Screenshot of the errors reported by Google Webmaster Tools]

 

The robots.txt used to contain "Disallow: /*controller=authentication", but I removed it. I still get the same error.

 

I wouldn't care if those URLs don't get indexed, but this is preventing Google from moving forward with the indexing. The indexing just stays "pending".

Edited by sscardefield

Look for the robots.txt entries in Webmaster Tools, under Health > Blocked URLs. There you will see your robots.txt as fetched by Google. Is there still a "Disallow: /*controller=authentication" line? If yes, wait 2 to 3 days, because Google has not yet fetched your updated robots.txt. The error will go away once Google re-fetches your updated robots.txt file.
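One way to do this check is to paste the copy of robots.txt that Webmaster Tools shows and the copy currently on your server into a small script and list the rules that only the older copy still has. This is a hypothetical Python helper, not part of Webmaster Tools; it compares Allow/Disallow lines only and ignores which User-agent group they belong to.

```python
def rule_lines(robots_txt):
    """Normalise the Allow/Disallow lines of a robots.txt into a set of strings."""
    lines = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        field = field.strip().lower()
        if sep and field in ("allow", "disallow"):
            lines.add(f"{field.capitalize()}: {value.strip()}")
    return lines


def stale_rules(cached_robots, live_robots):
    """Rules present in the copy Google shows but missing from the live file."""
    return sorted(rule_lines(cached_robots) - rule_lines(live_robots))


if __name__ == "__main__":
    cached = "User-agent: *\nDisallow: /*controller=authentication\nDisallow: /*controller=cart\n"
    live = "User-agent: *\nDisallow: /*controller=cart\n"
    print(stale_rules(cached, live))  # prints ['Disallow: /*controller=authentication']
```

An empty result means Google's copy already matches the live file, so the warning would have some other cause.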

 

Let me know when you are done. If the problem persists, I will spend some more time reviewing it more deeply, as I can't see any obvious culprit in robots.txt.


  • 1 month later...

Maybe you can try installing the Google sitemap module. This resolved my robots.txt errors.

 

But in my Google Webmaster Tools, I can see 1260 URLs indexed, yet a total of 1222 are blocked by robots.txt...

Is that normal?

 

User-agent: *

# Private pages

Disallow: /*orderby=

Disallow: /*orderway=

Disallow: /*tag=

Disallow: /*id_currency=

Disallow: /*search_query=

Disallow: /*back=

Disallow: /*utm_source=

Disallow: /*utm_medium=

Disallow: /*utm_campaign=

Disallow: /*n=

Disallow: /*controller=addresses

Disallow: /*controller=address

Disallow: /*controller=authentication

Disallow: /*controller=cart

Disallow: /*controller=discount

Disallow: /*controller=footer

Disallow: /*controller=get-file

Disallow: /*controller=header

Disallow: /*controller=history

Disallow: /*controller=identity

Disallow: /*controller=images.inc

Disallow: /*controller=init

Disallow: /*controller=my-account

Disallow: /*controller=order

Disallow: /*controller=order-opc

Disallow: /*controller=order-slip

Disallow: /*controller=order-detail

Disallow: /*controller=order-follow

Disallow: /*controller=order-return

Disallow: /*controller=order-confirmation

Disallow: /*controller=pagination

Disallow: /*controller=password

Disallow: /*controller=pdf-invoice

Disallow: /*controller=pdf-order-return

Disallow: /*controller=pdf-order-slip

Disallow: /*controller=product-sort

Disallow: /*controller=search

Disallow: /*controller=statistics

Disallow: /*controller=attachment

Disallow: /*controller=guest-tracking

# Directories

Disallow: /*classes/

Disallow: /*config/

Disallow: /*download/

Disallow: /*mails/

Disallow: /*modules/

Disallow: /*translations/

Disallow: /*tools/

# Files

Disallow: /*en/password-recovery

Disallow: /*en/address

Disallow: /*en/addresses

Disallow: /*en/authentication

Disallow: /*en/cart

Disallow: /*en/discount

Disallow: /*en/order-history

Disallow: /*en/identity

Disallow: /*en/my-account

Disallow: /*en/order-follow

Disallow: /*en/order-slip

Disallow: /*en/order

Disallow: /*en/search

Disallow: /*en/quick-order

Disallow: /*en/guest-tracking

# Sitemap

Sitemap: http://site.com/sitemap.xml
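To judge whether a figure like 1222 blocked URLs is expected, it can help to tally which Disallow pattern is responsible for each blocked URL. A minimal Python sketch, assuming Google-style wildcard matching (`*` for any characters, a trailing `$` to anchor the end) and a URL list taken from your own sitemap or logs; `count_by_rule` and the site.com URLs are hypothetical, and Allow rules are ignored because the robots.txt above has none.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def rule_to_regex(pattern):
    # Assumption: '*' matches any run of characters, a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def count_by_rule(urls, disallow_patterns):
    """Count, for each Disallow pattern, how many URLs it matches.

    A URL matched by several patterns is counted under each of them.
    """
    compiled = [(p, rule_to_regex(p)) for p in disallow_patterns]
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
        for pattern, regex in compiled:
            if regex.match(target):
                counts[pattern] += 1
    return counts


if __name__ == "__main__":
    urls = [
        "http://site.com/en/3-music?orderby=price&orderway=asc",
        "http://site.com/en/3-music?n=20",
        "http://site.com/index.php?controller=authentication",
        "http://site.com/index.php?controller=product&id_product=1&token=abc",
        "http://site.com/en/3-music",
    ]
    patterns = ["/*orderby=", "/*n=", "/*controller=authentication"]
    for pattern, hits in count_by_rule(urls, patterns).most_common():
        print(pattern, hits)
```

Note that a pattern such as `/*n=` matches any URL containing `n=` anywhere in the path or query, so in this example it also catches the `token=` parameter; broad patterns like this are worth checking first when the blocked count looks high.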
