Robots / Spiders creating shopping carts


Recently upgraded to 1.5.3.1 and noticed we are getting a new shopping cart created every minute or so, always with one item in it, and then abandoned.

 

I guess it is search engine bots, but how do I stop it?

I have generated the robots.txt file and it is in the root of the website:

 

 

# robots.txt automaticaly generated by PrestaShop e-commerce open-source solution
# http://www.prestashop.com - http://www.prestashop.com/forums
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
User-agent: *
# Private pages
Disallow: /*orderby=
Disallow: /*orderway=
Disallow: /*tag=
Disallow: /*id_currency=
Disallow: /*search_query=
Disallow: /*back=
Disallow: /*utm_source=
Disallow: /*utm_medium=
Disallow: /*utm_campaign=
Disallow: /*n=
Disallow: /*controller=addresses
Disallow: /*controller=address
Disallow: /*controller=authentication
Disallow: /*controller=cart
Disallow: /*controller=discount
Disallow: /*controller=footer
Disallow: /*controller=get-file
Disallow: /*controller=header
Disallow: /*controller=history
Disallow: /*controller=identity
Disallow: /*controller=images.inc
Disallow: /*controller=init
Disallow: /*controller=my-account
Disallow: /*controller=order
Disallow: /*controller=order-opc
Disallow: /*controller=order-slip
Disallow: /*controller=order-detail
Disallow: /*controller=order-follow
Disallow: /*controller=order-return
Disallow: /*controller=order-confirmation
Disallow: /*controller=pagination
Disallow: /*controller=password
Disallow: /*controller=pdf-invoice
Disallow: /*controller=pdf-order-return
Disallow: /*controller=pdf-order-slip
Disallow: /*controller=product-sort
Disallow: /*controller=search
Disallow: /*controller=statistics
Disallow: /*controller=attachment
Disallow: /*controller=guest-tracking
# Directories
Disallow: /*classes/
Disallow: /*config/
Disallow: /*download/
Disallow: /*mails/
Disallow: /*modules/
Disallow: /*translations/
Disallow: /*tools/
# Files
Disallow: /*en/password-recovery
Disallow: /*en/address
Disallow: /*en/addresses
Disallow: /*en/authentication
Disallow: /*en/cart
Disallow: /*en/discount
Disallow: /*en/order-history
Disallow: /*en/identity
Disallow: /*en/my-account
Disallow: /*en/order-follow
Disallow: /*en/order-slip
Disallow: /*en/order
Disallow: /*en/search
Disallow: /*en/quick-order
Disallow: /*en/guest-tracking

 

 

 

This website checks robots.txt files:

http://tool.motoricerca.info/robots-checker.phtml

 

And it reports this kind of error:

 

 

Line 74 Disallow: /*en/guest-tracking

The "*" wildchar in file names is not supported by (all) the user-agents addressed by this block of code. You should use the wildchar "*" in a block of code exclusively addressed to spiders that support the wildchar (Eg. Googlebot). Line 75

 

WARNING: The tool has found some directory paths that don't include a trailing slash character.

Since a missing trailing slash can be both a deliberate decision or an error, and since this tool can't ipotize the real intentions of the webmaster, here follow some clarifications that could prevent a potential problem:

The following command will disable just the directory "private" and all its contents:

Disallow: /private/

...while the following command will disable both the "private" directory and any file or directory path starting with the text "/private" (so "/private-eye.html", "/privateroom/page.html", etc.):

Disallow: /private

 

 

So does this mean the robots.txt file is not working correctly (hence all the extra shopping carts), or is it some other issue?


Really, there is nothing you can do about it; the spiders are going to do what they want.

Really? This is hundreds a day.

Ah well, I will just delete them in the table :(


If it is hundreds a day, start doing reverse IP lookups on them. I know Google and Bing have a tool where you can enter the IP address and it will tell you whether it was their crawler. I would try to find out where the requests are coming from. If you only sell locally (for example, just in the US or just in the EU), you might consider blocking Baidu if it is them; their bot does things like that.


  • 1 month later...

I found the "bots" were putting 1 item in each cart every few minutes. I also have a live chat / tracking system running (e.g. LiveZilla or Zopim), which shows no one on the site at the time the carts are being made.

Anyway, I delete all unconverted carts after a few days, just to try and keep the database size down.


  • 1 month later...
  • 3 months later...

How can I block an IP?

Thanks in advance!

 

PS 1.3.2.3.

 

You will need access to your site logs to see the IP addresses of visitors, then look up each IP address to see whether it belongs to a bot.

Use a site such as this:

http://whatismyipaddress.com/ip-lookup

 

Then, to block that IP address, you can add the code to your .htaccess file. This site will generate the code for you:

http://whatismyipaddress.com/ip-lookup
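
For orientation, rules of the kind such a generator produces tend to look like the sketch below. The two addresses are documentation placeholders (192.0.2.10 and 203.0.113.0/24), not real offenders, and the IfModule split is an assumption that lets one file cover both the Apache 2.2 and the Apache 2.4 access-control syntax; check which version your host runs before relying on it.

```apache
# Apache 2.4 and later: access control through mod_authz_core.
<IfModule mod_authz_core.c>
    <RequireAll>
        # Allow everyone...
        Require all granted
        # ...except these clients (placeholder addresses).
        Require not ip 192.0.2.10
        Require not ip 203.0.113.0/24
    </RequireAll>
</IfModule>

# Apache 2.2: mod_authz_core is absent, so use Order/Allow/Deny instead.
<IfModule !mod_authz_core.c>
    # Allow is evaluated first, then Deny; a matching Deny wins.
    Order Allow,Deny
    Allow from all
    Deny from 192.0.2.10
    Deny from 203.0.113.0/24
</IfModule>
```

Keep a backup of the file before editing: a syntax error in .htaccess usually makes the whole shop answer with a 500 error, and PrestaShop can overwrite the file when it regenerates it.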


Thanks haylau and dh42.

I checked the IP addresses to obtain a list of them I would like to block. My problem now is that I don't know the code to do it. Could you help me? This is my .htaccess code:

 

# .htaccess automaticaly generated by PrestaShop e-commerce open-source solution
# http://www.prestashop.com - http://www.prestashop.com/forums

# URL rewriting module activation
RewriteEngine on
RewriteCond %{HTTP_HOST} ^myshop.com
RewriteRule ^(.*)$ http://www.myshop.com/tienda/$1 [R=301,L]

# URL rewriting rules
RewriteRule ^([a-z0-9]+)\-([a-z0-9]+)(\-[_a-zA-Z0-9-]*)/([_a-zA-Z0-9-]*)\.jpg$ /tienda/img/p/$1-$2$3.jpg [QSA,L,E]
RewriteRule ^([0-9]+)\-([0-9]+)/([_a-zA-Z0-9-]*)\.jpg$ /tienda/img/p/$1-$2.jpg [QSA,L,E]
RewriteRule ^([0-9]+)(\-[_a-zA-Z0-9-]*)/([_a-zA-Z0-9-]*)\.jpg$ /tienda/img/c/$1$2.jpg [QSA,L,E]
RewriteRule ^lang-([a-z]{2})/([a-zA-Z0-9-]*)/([0-9]+)\-([a-zA-Z0-9-]*)\.html(.*)$ /tienda/product.php?id_product=$3&isolang=$1$5 [QSA,L,E]
RewriteRule ^lang-([a-z]{2})/([0-9]+)\-([a-zA-Z0-9-]*)\.html(.*)$ /tienda/product.php?id_product=$2&isolang=$1$4 [QSA,L,E]
RewriteRule ^lang-([a-z]{2})/([0-9]+)\-([a-zA-Z0-9-]*)(.*)$ /tienda/category.php?id_category=$2&isolang=$1 [QSA,L,E]
RewriteRule ^([a-zA-Z0-9-]*)/([0-9]+)\-([a-zA-Z0-9-]*)\.html(.*)$ /tienda/product.php?id_product=$2$4 [QSA,L,E]
RewriteRule ^([0-9]+)\-([a-zA-Z0-9-]*)\.html(.*)$ /tienda/product.php?id_product=$1$3 [QSA,L,E]
RewriteRule ^([0-9]+)\-([a-zA-Z0-9-]*)(.*)$ /tienda/category.php?id_category=$1 [QSA,L,E]
RewriteRule ^content/([0-9]+)\-([a-zA-Z0-9-]*)(.*)$ /tienda/cms.php?id_cms=$1 [QSA,L,E]
RewriteRule ^([0-9]+)__([a-zA-Z0-9-]*)(.*)$ /tienda/supplier.php?id_supplier=$1$3 [QSA,L,E]
RewriteRule ^([0-9]+)_([a-zA-Z0-9-]*)(.*)$ /tienda/manufacturer.php?id_manufacturer=$1$3 [QSA,L,E]
RewriteRule ^lang-([a-z]{2})/(.*)$ /tienda/$2?isolang=$1 [QSA,L,E]

# Catch 404 errors
ErrorDocument 404 /tienda/404.php


Why not start here rather than with the .htaccess?

http://www.prestashop.com/forums/topic/245300-module-free-ban-ip-address-block-any-user-you-want/


Because that module only blocks one IP address and I need to block several IP addresses, so I think the best way is to modify the .htaccess file.

Thanks anyway.

 

Point being, if you like the module, you can upgrade it to unlimited. Modifying the .htaccess apparently isn't the best way, since you don't know how, and rules put in the wrong place can be overwritten by PS.

But if you pursue the .htaccess method, then you should be searching outside of PS, as you will find better answers and resources.


Well, I was going to upgrade the module you recommended (http://www.prestashop.com/forums/topic/245300-module-free-ban-ip-address-block-any-user-you-want/), but I found that it doesn't work in my PS 1.3.2.3. Yesterday I entered the IP address 173.199.114.235 (the ahrefs robot), which creates a lot of phantom carts in my shop, but it is still doing it! It is not being blocked.

 

Could anybody help me to modify my .htaccess or tell me where to search for any solution? This is awful...

Thanks.


Read my post #1: it gives you a link to the code required, but as El patron says, you do have to be careful.


Thank you, but post #1 shows a link to check the robots.txt, nothing about .htaccess. (No errors were found in my robots.txt file. However, the file doesn't work properly, as it doesn't block the robots that are creating carts in my shop, sometimes several false carts per minute!)


To be sure you place the Apache directives in the correct place, read this:

http://www.prestasho...ustom-htaccess/

 

Thanks patron, I've read your link, but as you can see in post #13, my .htaccess doesn't include the "start" and "end" lines (I have PS 1.3.2.3).

Anyway, my problem is that I don't know what code I have to include in my .htaccess to block several IP addresses.

Any help with this?

Thanks!
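
One way to do this without leaving the mod_rewrite setup already in that file, offered as a sketch rather than a tested drop-in: test the client address with RewriteCond and refuse matching requests. 173.199.114.235 is the address mentioned earlier in the thread; 192.0.2.10 is a made-up placeholder standing in for a second offender.

```apache
# Place directly after "RewriteEngine on", before the existing
# RewriteCond/RewriteRule lines, so blocked clients are refused first.
# Each RewriteCond tests one client address; [OR] chains them so that
# any single match triggers the rule. Dots are escaped because the
# pattern is a regular expression.
RewriteCond %{REMOTE_ADDR} ^173\.199\.114\.235$ [OR]
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.10$
# "-" leaves the URL untouched; [F] answers with 403 Forbidden.
RewriteRule ^ - [F]
```

To add another address, append [OR] to the current last RewriteCond and add a new RewriteCond line below it; the final condition in the chain must not carry [OR].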


Sorry, post #10.

It is always good to review the whole thread when asking questions, to see if it has already been answered.


Hi haylau, you are absolutely right, but your post #10 doesn't have such a link (or I'm not able to find the htaccess generation service in it).

Thanks.


Thanks patron and haylau. I have used both links (http://www.toshop.com/htaccess-generator.cfm and http://www.htaccesstools.com/block-ips/) to create the .htaccess rules (they gave me different code). With the new .htaccess I have fewer phantom carts, but it still happens (now I have around 250 phantom carts for 200 visitors per day). The Baidu spider and the ahrefs robot are causing this.

 

Any idea?

Thanks again.


Well, I managed to block the Baidu spider and the ahrefs bot, but not the Bing bot, which is also creating several false carts... (I wouldn't like to block Bing from accessing my site, so I guess I have to accept this.)

Here I found an excellent article that explains very clearly how to block bots: http://www.thesitewizard.com/apache/block-bots-with-htaccess.shtml

Thanks community!
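
A common way to block crawlers by name rather than by address is to match the User-Agent header. The sketch below is a minimal version and not necessarily the method the linked article uses. It assumes the two crawlers identify themselves as AhrefsBot and Baiduspider, which is worth confirming in your own access logs, and it deliberately leaves Bing alone.

```apache
# Needed once per .htaccess; harmless if the file already has it.
RewriteEngine on
# Match either crawler name anywhere in the User-Agent header.
# [NC] makes the comparison case-insensitive.
RewriteCond %{HTTP_USER_AGENT} (AhrefsBot|Baiduspider) [NC]
# "-" leaves the URL untouched; [F] answers with 403 Forbidden.
RewriteRule ^ - [F]
```

Because a crawler can change or fake its User-Agent, this only stops bots that identify themselves honestly; address-based rules remain the fallback for the rest.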

