
How to generate robots.txt to block crawling?


byoung85


Hi all,

 

I want to block all bots from crawling my site until the design is finished. I generated a robots.txt file from the back office (BO) and confirmed via FTP that the file is on my server, but the test in Google Webmaster Tools came back as "allowed".

 

Here's my robots.txt:

 

# robots.txt automaticaly generated by PrestaShop e-commerce open-source solution

# http://www.prestashop.com - http://www.prestashop.com/forums

# This file is to prevent the crawling and indexing of certain parts

# of your site by web crawlers and spiders run by sites like Yahoo!

# and Google. By telling these "robots" where not to go on your site,

# you save bandwidth and server resources.

# For more information about the robots.txt standard, see:

# http://www.robotstxt.org/wc/robots.html

# All bots

User-agent: *

Disallow: /

# Private pages

Disallow: /*orderby=

Disallow: /*orderway=

Disallow: /*tag=

Disallow: /*id_currency=

Disallow: /*search_query=

Disallow: /*back=

Disallow: /*utm_source=

Disallow: /*utm_medium=

Disallow: /*utm_campaign=

Disallow: /*n=

Disallow: /*controller=addresses

Disallow: /*controller=address

Disallow: /*controller=authentication

Disallow: /*controller=cart

Disallow: /*controller=discount

Disallow: /*controller=footer

Disallow: /*controller=get-file

Disallow: /*controller=header

Disallow: /*controller=history

Disallow: /*controller=identity

Disallow: /*controller=images.inc

Disallow: /*controller=init

Disallow: /*controller=my-account

Disallow: /*controller=order

Disallow: /*controller=order-opc

Disallow: /*controller=order-slip

Disallow: /*controller=order-detail

Disallow: /*controller=order-follow

Disallow: /*controller=order-return

Disallow: /*controller=order-confirmation

Disallow: /*controller=pagination

Disallow: /*controller=password

Disallow: /*controller=pdf-invoice

Disallow: /*controller=pdf-order-return

Disallow: /*controller=pdf-order-slip

Disallow: /*controller=product-sort

Disallow: /*controller=search

Disallow: /*controller=statistics

Disallow: /*controller=attachment

Disallow: /*controller=guest-tracking

# Directories

Disallow: /*classes/

Disallow: /*config/

Disallow: /*download/

Disallow: /*mails/

Disallow: /*modules/

Disallow: /*translations/

Disallow: /*tools/

# Files

Disallow: /*en/password-recovery

Disallow: /*en/address

Disallow: /*en/addresses

Disallow: /*en/authentication

Disallow: /*en/cart

Disallow: /*en/discount

Disallow: /*en/order-history

Disallow: /*en/identity

Disallow: /*en/my-account

Disallow: /*en/order-follow

Disallow: /*en/order-slip

Disallow: /*en/order

Disallow: /*en/search

Disallow: /*en/quick-order

Disallow: /*en/guest-tracking

Disallow: /*en/order-confirmation

 

I'm worried that if my site is crawled and I then edit or delete certain CMS, product, or category pages, I'll end up with too many 404s before the site actually launches.

 

I attached the exact test result message that Google Webmaster Tools returns when testing the robots.txt file's configuration.

 

Any advice? I'd appreciate an expert Prestashop SEO's opinion about this issue.

 

Thanks!

 

 

 

-Alex

[Attached image: post-581170-0-04373000-1372274374_thumb.jpg]

Edited by byoung85 (see edit history)

Let me clarify.

 

I edited the PrestaShop-generated robots.txt file in the root directory of my website and replaced its entire contents with only this:

 

User-agent: *

Disallow: /

 

 

Even after doing this, I'm still getting the same message from Google Webmaster Tools stating that Googlebot is "allowed" to crawl the website (the same message I attached to the original post of this thread).
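
One way to rule out a syntax problem is to test the file contents locally. The sketch below uses Python's standard-library urllib.robotparser, which is not Google's parser and may differ from it in edge cases. The helper name is_allowed and the example.com URL are placeholders, not anything taken from this thread.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two-line file, with the rule directly under the User-agent line.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# The same rules separated by an empty line. Python's parser treats an
# empty line as the end of a group, so the Disallow rule is dropped here.
# Other parsers, Google's included, may be more lenient about this.
SPACED_OUT = "User-agent: *\n\nDisallow: /\n"

URL = "https://example.com/"  # placeholder URL
print("adjacent lines:", is_allowed(BLOCK_ALL, "Googlebot", URL))
print("blank line between:", is_allowed(SPACED_OUT, "Googlebot", URL))
```

If the adjacent-lines check reports that crawling is not allowed while Google's tester still says "allowed", the rules themselves are probably fine. It may then be worth checking that the tester is reading the same file you edited, for example an older cached copy or a different host name (www versus non-www).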


This is what I see in your site's robots.txt:

 

# robots.txt automaticaly generated by PrestaShop e-commerce open-source solution
# http://www.prestashop.com - http://www.prestashop.com/forums
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
# Googlebot specific
User-agent: Googlebot
Disallow: /
# Private pages
Disallow: /*orderby=
Disallow: /*orderway=
Disallow: /*tag=
Disallow: /*id_currency=
Disallow: /*search_query=
Disallow: /*back=
Disallow: /*utm_source=
Disallow: /*utm_medium=
Disallow: /*utm_campaign=
Disallow: /*n=
Disallow: /*controller=addresses
Disallow: /*controller=address
Disallow: /*controller=authentication
Disallow: /*controller=cart
Disallow: /*controller=discount
Disallow: /*controller=footer
Disallow: /*controller=get-file
Disallow: /*controller=header
Disallow: /*controller=history
Disallow: /*controller=identity
Disallow: /*controller=images.inc
Disallow: /*controller=init
Disallow: /*controller=my-account
Disallow: /*controller=order
Disallow: /*controller=order-opc
Disallow: /*controller=order-slip
Disallow: /*controller=order-detail
Disallow: /*controller=order-follow
Disallow: /*controller=order-return
Disallow: /*controller=order-confirmation
Disallow: /*controller=pagination
Disallow: /*controller=password
Disallow: /*controller=pdf-invoice
Disallow: /*controller=pdf-order-return
Disallow: /*controller=pdf-order-slip
Disallow: /*controller=product-sort
Disallow: /*controller=search
Disallow: /*controller=statistics
Disallow: /*controller=attachment
Disallow: /*controller=guest-tracking
# Directories
Disallow: /*classes/
Disallow: /*config/
Disallow: /*download/
Disallow: /*mails/
Disallow: /*modules/
Disallow: /*translations/
Disallow: /*tools/
# Files
Disallow: /*en/password-recovery
Disallow: /*en/address
Disallow: /*en/addresses
Disallow: /*en/authentication
Disallow: /*en/cart
Disallow: /*en/discount
Disallow: /*en/order-history
Disallow: /*en/identity
Disallow: /*en/my-account
Disallow: /*en/order-follow
Disallow: /*en/order-slip
Disallow: /*en/order
Disallow: /*en/search
Disallow: /*en/quick-order
Disallow: /*en/guest-tracking
Disallow: /*en/order-confirmation


  • 3 weeks later...

Hi all,

 

I have a different issue. Google has indexed more than 1500 pages on my website, but more than 1100 are blocked by robots.txt. Any suggestions? My website is www.rebelism.ro, with the standard robots.txt and sitemap.xml in their respective directories.

 

Thank you,


The data from Google Webmaster Tools is:

 

Sitemaps:

942 URLs submitted

863 URLs indexed

 

URL Errors 26 Not found

 

Google Index - index status

 

Total indexed 1,554

Blocked by robots 1,074

I am a novice at interpreting these numbers, and I'm wondering why so many URLs are blocked. Will my site appear in Google search for all of the indexed URLs, or will some of them (even though indexed) not be shown because they are blocked by robots.txt?

 

I've checked with several free online SERP checkers, and most of the results show 0 (none), except for one search string, even though my store has hundreds of products. This could also be because my store is on google.ro and the SERP checkers took their data from .com sites.

 

I have never made any manual changes to robots.txt, and I only noticed the high number of blocked URLs today.

Edited by milea_sorin (see edit history)

Hi!

On the subject of crawlers: my products keep getting many unrelated comments. Is this caused by web crawlers?

Although I set "all comments must be validated by an employee" to yes, I don't want to allow this kind of post at all, because I don't know where it comes from. In fact, the latest comment ID is now above 1000, but only two are real comments about my products.


While searching online for your issue, I found this advice: "To remove your site from search engines and prevent all robots from crawling it in the future, place the following robots.txt file in your server root:

User-agent: *

Disallow: /

To remove your site from Search Engine only and prevent just Googlebot from crawling your site in the future, place the following robots.txt file in your server root:

User-agent: Googlebot

Disallow: /

Doing this and submitting via the automatic URL removal system will cause a temporary, 180 day removal of your site from the Search Engine index, regardless of whether you remove the robots.txt file after processing your request."
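
The difference between the two files quoted above comes down to which User-agent group a crawler matches. Here is a minimal sketch using Python's standard-library urllib.robotparser (not Google's own parser, so treat it as an approximation). The can_crawl helper, the Bingbot comparison, and example.com are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str,
              url: str = "https://example.com/") -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ALL_BOTS = "User-agent: *\nDisallow: /\n"                # applies to every crawler
GOOGLEBOT_ONLY = "User-agent: Googlebot\nDisallow: /\n"  # applies to Googlebot only

for label, rules in (("all bots", ALL_BOTS), ("Googlebot only", GOOGLEBOT_ONLY)):
    for agent in ("Googlebot", "Bingbot"):
        verdict = "allowed" if can_crawl(rules, agent) else "blocked"
        print(f"{label:15} {agent:10} {verdict}")
```

With the Googlebot-only file, crawlers that do not match the Googlebot group have no rules applied to them, so they remain free to crawl.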

Edited by aaronweb (see edit history)

  • 8 months later...
  • 1 year later...

you could just replace the existing contents of robots.txt with:

 

User-agent: *
Disallow: /

Can you confirm whether adding this code blocks the whole domain from being crawled?

What should I do if I want to disallow only certain pages?

What would the code look like then?
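
To block only certain pages, each one gets its own Disallow line under the User-agent line, and everything not listed stays crawlable. The sketch below checks such a file with Python's standard-library urllib.robotparser. The paths /cart and /my-account, the blocked helper, and example.com are hypothetical. The sketch sticks to plain path prefixes, because this parser does not interpret the * wildcards used in the PrestaShop-generated file the way Google does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block two sections, leave the rest of the site open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Disallow: /my-account
"""


def blocked(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the path is disallowed for user_agent."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, "https://example.com" + path)


for path in ("/cart", "/my-account/history", "/category/shoes", "/cartoons"):
    print(path, "blocked" if blocked(path) else "allowed")
```

Rules match by prefix, so "Disallow: /cart" also catches an unrelated path such as /cartoons. Writing the rule as "Disallow: /cart/" narrows it to that directory.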


  • 1 year later...

Hi. I have the cloud version of PrestaShop 1.6.1.3.

 

In my Google Webmaster Tools account I see this:

 

# robots.txt automaticaly generated by PrestaShop e-commerce open-source solution
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
# For more information about the robots.txt standard, see:
User-agent: *
# Private pages
Disallow: /*?orderby=
Disallow: /*?orderway=
Disallow: /*?tag=
Disallow: /*?id_currency=
Disallow: /*?search_query=
Disallow: /*?back=
Disallow: /*?n=
Disallow: /*&orderby=
Disallow: /*&orderway=
Disallow: /*&tag=
Disallow: /*&id_currency=
Disallow: /*&search_query=
Disallow: /*&back=
Disallow: /*&n=
Disallow: /*controller=addresses
Disallow: /*controller=address
Disallow: /*controller=authentication
Disallow: /*controller=cart
Disallow: /*controller=discount
Disallow: /*controller=footer
Disallow: /*controller=get-file
Disallow: /*controller=header
Disallow: /*controller=history
Disallow: /*controller=identity
Disallow: /*controller=images.inc
Disallow: /*controller=init
Disallow: /*controller=my-account
Disallow: /*controller=order
Disallow: /*controller=order-opc
Disallow: /*controller=order-slip
Disallow: /*controller=order-detail
Disallow: /*controller=order-follow
Disallow: /*controller=order-return
Disallow: /*controller=order-confirmation
Disallow: /*controller=pagination
Disallow: /*controller=password
Disallow: /*controller=pdf-invoice
Disallow: /*controller=pdf-order-return
Disallow: /*controller=pdf-order-slip
Disallow: /*controller=product-sort
Disallow: /*controller=search
Disallow: /*controller=statistics
Disallow: /*controller=attachment
Disallow: /*controller=guest-tracking
# Directories
Disallow: */classes/
Disallow: */config/
Disallow: */download/
Disallow: */mails/
Disallow: */modules/
Disallow: */translations/
Disallow: */tools/
# Files
Disallow: /*en/password-recovery
Disallow: /*en/address
Disallow: /*en/addresses
Disallow: /*en/login
Disallow: /*en/cart
Disallow: /*en/discount
Disallow: /*en/order-history
Disallow: /*en/identity
Disallow: /*en/my-account
Disallow: /*en/order-follow
Disallow: /*en/credit-slip
Disallow: /*en/order
Disallow: /*en/search
Disallow: /*en/quick-order
Disallow: /*en/guest-tracking
Disallow: /*en/order-confirmation
Disallow: /*es/password-recovery
Disallow: /*es/address
Disallow: /*es/addresses
Disallow: /*es/login
Disallow: /*es/caja
Disallow: /*es/discount
Disallow: /*es/order-history
Disallow: /*es/identity
Disallow: /*es/my-account
Disallow: /*es/order-follow
Disallow: /*es/credit-slip
Disallow: /*es/order
Disallow: /*es/search
Disallow: /*es/quick-order
Disallow: /*es/guest-tracking
Disallow: /*es/order-confirmation
# Sitemap
# Directories Allowed By Module Blocked Resources for PrestaShop 
Allow: */modules/socialsharing/
Allow: */modules/blockbanner/
Allow: */modules/bankwire/
Allow: */modules/blockbestsellers/
Allow: */modules/blockcart/
Allow: */modules/blocksocial/
Allow: */modules/blockcurrencies/
Allow: */modules/blockfacebook/
Allow: */modules/blocklayered/
Allow: */modules/blockcms/
Allow: */modules/blockcmsinfo/
Allow: */modules/blockcontact/
Allow: */modules/blockcontactinfos/
Allow: */modules/blockmanufacturer/
Allow: */modules/blockmyaccount/
Allow: */modules/blockmyaccountfooter/
Allow: */modules/blocknewproducts/
Allow: */modules/blocknewsletter/
Allow: */modules/blockpaymentlogo/
Allow: */modules/blocksearch/
Allow: */modules/blockspecials/
Allow: */modules/blockstore/
Allow: */modules/blocksupplier/
Allow: */modules/blocktags/
Allow: */modules/blocktopmenu/
Allow: */modules/blockuserinfo/
Allow: */modules/blockviewed/
Allow: */modules//
Allow: */modules/dashactivity/
Allow: */modules/dashtrends/
Allow: */modules/dashgoals/
Allow: */modules/dashproducts/
Allow: */modules/graphnvd3/
Allow: */modules/gridhtml/
Allow: */modules/homeslider/
Allow: */modules/homefeatured/
Allow: */modules/productpaymentlogos/
Allow: */modules/pagesnotfound/
Allow: */modules/sekeywords/
Allow: */modules/statsbestcategories/
Allow: */modules/statsbestcustomers/
Allow: */modules/statsbestproducts/
Allow: */modules/statsbestsuppliers/
Allow: */modules/statsbestvouchers/
Allow: */modules/statscatalog/
Allow: */modules/statscheckup/
Allow: */modules/statsdata/
Allow: */modules/statsequipment/
Allow: */modules/statsforecast/
Allow: */modules/statslive/
Allow: */modules/statsnewsletter/
Allow: */modules/statsorigin/
Allow: */modules/statspersonalinfos/
Allow: */modules/statsregistrations/
Allow: */modules/statssales/
Allow: */modules/statssearch/
Allow: */modules/statsstock/
Allow: */modules/statsvisits/
Allow: */modules/themeconfigurator/
Allow: */modules/gamification/
Allow: */modules/blockwishlist/
Allow: */modules/sendtoafriend/
Allow: */modules/cronjobs/
Allow: */modules/pssupport/
Allow: */modules/blockcategories/
Allow: */modules/followup/
Allow: */modules/blocklink/
Allow: */modules/blockpermanentlinks/
Allow: */modules/carriercompare/
Allow: */modules/ganalytics/
Allow: */modules/newsletter/
Allow: */modules/productscategory/
Allow: */modules/referralprogram/
Allow: */modules/gsitemap/
Allow: */modules/powatag/
Allow: */modules//
Allow: */modules/sarbacanedesktop/
Allow: */modules/blockcustomerprivacy/
Allow: */modules/shopgate/
Allow: */modules/yotpo/
Allow: */modules/cashondelivery/
Allow: */modules/producttooltip/
Allow: */modules/crossselling/
Allow: */modules/favoriteproducts/
Allow: */modules/statsproduct/
Allow: */modules/statscarrier/
Allow: */modules/blocksharefb/
Allow: */modules/gapi/
Allow: */modules/getresponse/
Allow: */modules/dateofdelivery/
Allow: */modules/magnalister/
Allow: */modules/blocklanguages/
Allow: */modules/vipadvancedurl/
Allow: */modules/brinkscheckout/
Allow: */modules/fsadconversion/
Allow: */modules/newsletterpopupli/
Allow: */modules/mailalerts/
Allow: */modules/loyalty/
Allow: */modules/productcomments/
Allow: */modules//
Allow: */modules/livechatpro/
Allow: */modules/mailchimp/
Allow: */modules/blockedresources/
Allow: */modules/lgsitemaps/
Allow: */modules/pdgooglehreflangtag/
 
I want to ALLOW everything. I don't want any Disallow rules at all.
I have changed the file to:
 

# robots.txt automaticaly generated by PrestaShop e-commerce open-source solution

# http://www.prestashop.com - http://www.prestashop.com/forums

# This file is to prevent the crawling and indexing of certain parts

# of your site by web crawlers and spiders run by sites like Yahoo!

# and Google. By telling these "robots" where not to go on your site,

# you save bandwidth and server resources.

# For more information about the robots.txt standard, see:

# http://www.robotstxt.org/robotstxt.html

User-agent: *

# Private pages

allow: /*?orderby=

allow: /*?orderway=

allow: /*?tag=

Allow: /*?id_currency=

allow: /*?search_query=

Allow: /*?back=

Allow: /*?n=

Allow: /*&orderby=

Allow: /*&orderway=

Allow: /*&tag=

Allow: /*&id_currency=

Allow: /*&search_query=

Allow: /*&back=

Allow: /*&n=

Allow: /*controller=addresses

Allow: /*controller=address

Allow: /*controller=authentication

Allow: /*controller=cart

Allow: /*controller=discount

Allow: /*controller=footer

Allow: /*controller=get-file

Allow: /*controller=header

Allow: /*controller=history

Allow: /*controller=identity

Allow: /*controller=images.inc

Allow: /*controller=init

Allow: /*controller=my-account

Allow: /*controller=order

Allow: /*controller=order-opc

Allow: /*controller=order-slip

Allow: /*controller=order-detail

Allow: /*controller=order-follow

Allow: /*controller=order-return

Allow: /*controller=order-confirmation

Allow: /*controller=pagination

Allow: /*controller=password

Allow: /*controller=pdf-invoice

Allow: /*controller=pdf-order-return

Allow: /*controller=pdf-order-slip

Allow: /*controller=product-sort

Allow: /*controller=search

Allow: /*controller=statistics

Allow: /*controller=attachment

Allow: /*controller=guest-tracking

# Directories

Allow: */classes/

Allow: */config/

Allow: */download/

Allow: */mails/

Allow: */modules/

Allow: */translations/

Allow: */tools/

# Files

Allow: /*en/password-recovery

Allow: /*en/address

Allow: /*en/addresses

Allow: /*en/login

Allow: /*en/cart

Allow: /*en/discount

Allow: /*en/order-history

Allow: /*en/identity

Allow: /*en/my-account

Allow: /*en/order-follow

Allow: /*en/credit-slip

Allow: /*en/order

Allow: /*en/search

Allow: /*en/quick-order

Allow: /*en/guest-tracking

Allow: /*en/order-confirmation

Allow: /*es/password-recovery

Allow: /*es/address

Allow: /*es/addresses

Allow: /*es/login

Allow: /*es/caja

Allow: /*es/discount

Allow: /*es/order-history

Allow: /*es/identity

Allow: /*es/my-account

Allow: /*es/order-follow

Allow: /*es/credit-slip

Allow: /*es/order

Allow: /*es/search

Allow: /*es/quick-order

Allow: /*es/guest-tracking

Allow: /*es/order-confirmation

# Sitemap

Sitemap: https://plum.pswebstore.com/1_index_sitemap.xml

# Directories Allowed By Module Blocked Resources for PrestaShop

Allow: */modules/socialsharing/

Allow: */modules/blockbanner/

Allow: */modules/bankwire/

Allow: */modules/blockbestsellers/

Allow: */modules/blockcart/

Allow: */modules/blocksocial/

Allow: */modules/blockcurrencies/

Allow: */modules/blockfacebook/

Allow: */modules/blocklayered/

Allow: */modules/blockcms/

Allow: */modules/blockcmsinfo/

Allow: */modules/blockcontact/

Allow: */modules/blockcontactinfos/

Allow: */modules/blockmanufacturer/

Allow: */modules/blockmyaccount/

Allow: */modules/blockmyaccountfooter/

Allow: */modules/blocknewproducts/

Allow: */modules/blocknewsletter/

Allow: */modules/blockpaymentlogo/

Allow: */modules/blocksearch/

Allow: */modules/blockspecials/

Allow: */modules/blockstore/

Allow: */modules/blocksupplier/

Allow: */modules/blocktags/

Allow: */modules/blocktopmenu/

Allow: */modules/blockuserinfo/

Allow: */modules/blockviewed/

Allow: */modules/cheque/

Allow: */modules/dashactivity/

Allow: */modules/dashtrends/

Allow: */modules/dashgoals/

Allow: */modules/dashproducts/

Allow: */modules/graphnvd3/

Allow: */modules/gridhtml/

Allow: */modules/homeslider/

Allow: */modules/homefeatured/

Allow: */modules/productpaymentlogos/

Allow: */modules/pagesnotfound/

Allow: */modules/sekeywords/

Allow: */modules/statsbestcategories/

Allow: */modules/statsbestcustomers/

Allow: */modules/statsbestproducts/

Allow: */modules/statsbestsuppliers/

Allow: */modules/statsbestvouchers/

Allow: */modules/statscatalog/

Allow: */modules/statscheckup/

Allow: */modules/statsdata/

Allow: */modules/statsequipment/

Allow: */modules/statsforecast/

Allow: */modules/statslive/

Allow: */modules/statsnewsletter/

Allow: */modules/statsorigin/

Allow: */modules/statspersonalinfos/

Allow: */modules/statsregistrations/

Allow: */modules/statssales/

Allow: */modules/statssearch/

Allow: */modules/statsstock/

Allow: */modules/statsvisits/

Allow: */modules/themeconfigurator/

Allow: */modules/gamification/

Allow: */modules/blockwishlist/

Allow: */modules/sendtoafriend/

Allow: */modules/cronjobs/

Allow: */modules/pssupport/

Allow: */modules/blockcategories/

Allow: */modules/followup/

Allow: */modules/blocklink/

Allow: */modules/blockpermanentlinks/

Allow: */modules/carriercompare/

Allow: */modules/ganalytics/

Allow: */modules/newsletter/

Allow: */modules/productscategory/

Allow: */modules/referralprogram/

Allow: */modules/gsitemap/

Allow: */modules/powatag/

Allow: */modules/sendinblue/

Allow: */modules/sarbacanedesktop/

Allow: */modules/blockcustomerprivacy/

Allow: */modules/shopgate/

Allow: */modules/yotpo/

Allow: */modules/cashondelivery/

Allow: */modules/producttooltip/

Allow: */modules/crossselling/

Allow: */modules/favoriteproducts/

Allow: */modules/statsproduct/

Allow: */modules/statscarrier/

Allow: */modules/blocksharefb/

Allow: */modules/gapi/

Allow: */modules/getresponse/

Allow: */modules/dateofdelivery/

Allow: */modules/magnalister/

Allow: */modules/blocklanguages/

Allow: */modules/vipadvancedurl/

Allow: */modules/brinkscheckout/

Allow: */modules/fsadconversion/

Allow: */modules/newsletterpopupli/

Allow: */modules/mailalerts/

Allow: */modules/loyalty/

Allow: */modules/productcomments/

Allow: */modules//

Allow: */modules/livechatpro/

Allow: */modules/mailchimp/

Allow: */modules/blockedresources/

Allow: */modules/lgsitemaps/

Allow: */modules/pdgooglehreflangtag/

 

My problem is that I don't know where to find the root directory of my shop in order to upload the new robots.txt file.

I have logged in via FTP and I cannot find it anywhere.

Can someone please help me?

 

My store runs the cloud version of PrestaShop 1.6.1.3.

www.plumshoponline.com
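
For the "allow everything" goal in this post, rewriting every Disallow line as Allow should not be necessary: under the robots.txt convention, an empty Disallow value (or a file with no rules at all) leaves the whole site open. The sketch below checks both cases with Python's standard-library urllib.robotparser, which is not Google's parser. The allows helper, the sample paths, and example.com are illustrative assumptions, and the sketch does not address where the file lives on a cloud-hosted shop.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_txt: str, path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if user_agent may fetch the path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, "https://example.com" + path)


ALLOW_ALL = "User-agent: *\nDisallow:\n"  # empty Disallow: nothing is blocked
EMPTY_FILE = ""                           # no rules at all

for label, rules in (("empty Disallow", ALLOW_ALL), ("empty file", EMPTY_FILE)):
    for path in ("/", "/modules/blockcart/", "/en/order"):
        print(label, path, "allowed" if allows(rules, path) else "blocked")
```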
