DataKick

[Free Module] Blackhole for Bad Bots

Blackhole For Bad Bots

automagically ban bad bots that don't follow robots.txt instructions

This free module is based on a very simple idea:

  1. you instruct all robots visiting your website NOT to open a specific URL

  2. this module adds a hidden link to this forbidden page on every page of your website. The link is perfectly visible to all robots, but normal visitors will not notice it at all (unless they look at the page source code)

  3. when anyone accesses this forbidden page, their IP address is immediately added to the blacklist

  4. blacklisted visitors are forbidden from viewing any content on your website

  5. the shop administrator is notified about new blacklist entries. They receive an email with WHOIS information about the visitor: IP address, location, network, etc.

And that’s it. This trap will not affect good robots, which follow robots.txt directives. On the other hand, all bad bots and crawlers will eventually be trapped and forbidden from ever collecting information from your site again.
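
The idea can be modelled in a few lines. This is only an illustrative Python sketch, not the module's actual PHP code: the in-memory set, the function name and the 403 status are assumptions made for the example (the real module keeps its blacklist in a database table).

```python
# Minimal in-memory model of the trap (illustration only, not the module's code).
TRAP_PATH = "/blackhole/"   # the URL that robots.txt tells robots not to open

blacklist = set()           # assumption: the real module stores this in the database


def handle_request(ip, path):
    """Return an HTTP-like status code for a request from `ip` to `path`."""
    if ip in blacklist:
        return 403          # blacklisted visitors are refused everywhere
    if path.startswith(TRAP_PATH):
        blacklist.add(ip)   # opening the forbidden page bans the IP
        return 403
    return 200              # everyone else is served normally


# A well-behaved crawler never requests the trap, so it is never banned.
print(handle_request("198.51.100.7", "/en/home"))
# A bad bot follows the hidden link and is refused from then on.
print(handle_request("203.0.113.9", "/blackhole/"))
print(handle_request("203.0.113.9", "/en/home"))
```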

 

Module activation

Before you install this module, you need to edit your robots.txt file and add the following two lines:

User-agent: *
Disallow: /blackhole/

Now install the module, and that's all.

Optionally, you can test it by navigating to www.yourdomain.com/blackhole/. You should be banned from your own site. To lift the ban, reset the module from your back office.
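
If you would rather check the robots.txt rules without banning yourself, something like the following Python sketch could be used. It relies on the standard library's urllib.robotparser; the user agent name "ExampleBot" and the helper name is_allowed are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The two lines from the instructions above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blackhole/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, user_agent="ExampleBot"):
    """True if a robot that obeys robots.txt may fetch `path`."""
    return parser.can_fetch(user_agent, path)


print(is_allowed("/blackhole/"))   # the trap must be disallowed
print(is_allowed("/en/home"))      # normal pages stay allowed
```

To test a live shop instead, the parser can be pointed at the real file with set_url() followed by read().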

 

Moderation

At the moment there isn’t any UI to manage the blacklist. If you want to remove an IP address from the blacklist, you have to delete it manually from the database table called ps_blackholebots_blacklist.

 

Compatibility

  • The module works on both PS 1.6 and PS 1.7
  • The module can also be installed on the thirtybees fork

 

Download module

 

 

 


I apologize for the link; that was a copy-and-paste mistake, and it has been removed from my original post. However, I refuse to remove the reference from the compatibility section. I see thirtybees as a dialect of PrestaShop, and I built this FREE module for all users of the PrestaShop platform.


Very interesting module, I installed it on one shop and we'll see how many bots will fall into the trap, thanks!


Found this after looking at the original blackhole for bad bots. Seems to work a treat to block access. Thanks Petr.

Think it will also give me a solution to the constant requests for a wordpress admin login page.

Only issue I've found is no email of the whois data..

 

10 hours ago, gritstonecycles said:

Found this after looking at the original blackhole for bad bots. Seems to work a treat to block access. Thanks Petr.

Think it will also give me a solution to the constant requests for a wordpress admin login page.

Only issue I've found is no email of the whois data..

 

 

Yeah, I forgot to implement the email sending in a generic way: language id 1 is hardcoded, and the code expects it to be English. Stupid me. I'll fix this in the next version.

I'm not sure if this will catch requests for wp admin. These bots just try some predefined URLs, and I guess none of them is /blackhole/ :) But I can add this option to the next version of the module: you could specify (multiple) blacklisted URLs, and access to any one of them would block you.

 

18 hours ago, DataKick said:

 

Yeah, I forgot to implement the email sending in a generic way: language id 1 is hardcoded, and the code expects it to be English. Stupid me. I'll fix this in the next version.

I'm not sure if this will catch requests for wp admin. These bots just try some predefined URLs, and I guess none of them is /blackhole/ :) But I can add this option to the next version of the module: you could specify (multiple) blacklisted URLs, and access to any one of them would block you.

 

 

I'm using English, but I thought its language id was 2. Looking forward to the next version.

Regarding wp-admin, I was thinking of using .htaccess to redirect requests to the blackhole dir.

 

11 hours ago, gritstonecycles said:

 

I'm using English, but I thought its language id was 2. Looking forward to the next version.

Regarding wp-admin, I was thinking of using .htaccess to redirect requests to the blackhole dir.

 

 

Hi friend, what do I have to put in the .htaccess of my site?

1 hour ago, Soyons zen said:

 

Hi friend, what do I have to put in the .htaccess of my site?

 

Still got to work that out; I'm trying to solve a different .htaccess problem first. I think you should be able to do a redirect of /wp-admin to /blackhole/.


Hi,

I put this code in robots.txt:

User-agent: *
Disallow: /blackhole/

------------------------------------

In .htaccess, on which line do I put this code?

User-agent: *
Disallow: /blackhole/
 

4 minutes ago, Soyons zen said:

Hi,

I put this code in robots.txt:

User-agent: *
Disallow: /blackhole/

------------------------------------

In .htaccess, on which line do I put this code?

User-agent: *
Disallow: /blackhole/
 

 

If you are just implementing the blackholebots module you only need the robots.txt entry as per the instructions at the top of this topic. Do not add anything to .htaccess as you described.

1 hour ago, gritstonecycles said:

 

If you are just implementing the blackholebots module you only need the robots.txt entry as per the instructions at the top of this topic. Do not add anything to .htaccess as you described.

 

ok Thank you...! 

On 04/04/2018 at 11:33 AM, selectshop.at said:

Why advertising for TB on Prestashop Forum ?

 

it is not serious :)

5 minutes ago, Soyons zen said:

Since I installed the module I have more spam. I usually had 20 to 30 spam messages a day

 

This module will not protect you from mail spam. It is designed to protect you from bots that do not follow robots.txt instructions.

If there's a dedicated bot that accesses (for example) only the contact form page, then such a bot obviously won't be affected by this mechanism, since it never enters the /blackhole/ trap. To protect yourself from mail spam, use some sort of captcha.

This blackhole mechanism is useful, for example, to detect and blacklist bots that are generating abandoned carts in your shop.

 

1 hour ago, DataKick said:

 

This module will not protect you from mail spam. It is designed to protect you from bots that do not follow robots.txt instructions.

If there's a dedicated bot that accesses (for example) only the contact form page, then such a bot obviously won't be affected by this mechanism, since it never enters the /blackhole/ trap. To protect yourself from mail spam, use some sort of captcha.

This blackhole mechanism is useful, for example, to detect and blacklist bots that are generating abandoned carts in your shop.

 

 

 

OK, thank you. Since I installed this module I get more Russian spam?

Just now, Soyons zen said:

OK, thank you. Since I installed this module I get more Russian spam?

 

That's a coincidence, my friend. Do you have some sort of captcha set up on your contact form?

1 hour ago, DataKick said:

 

That's a coincidence, my friend. Do you have some sort of captcha set up on your contact form?

 

Yes, a coincidence, my friend. Yes, I have a captcha, and despite that I received spam?

34 minutes ago, Soyons zen said:

 

Yes, I have a captcha, and despite that I received spam?

 

If you receive email spam even when you have a captcha installed, then it is either a real-world spammer (doubtful), or you don't have the captcha set up correctly. I've been investigating such a problem for one of my clients recently, and I found out that the free captcha module he was using didn't actually check the captcha on the backend. It only rendered the captcha dialog on the frontend. Completely useless.
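
To show what "checking the captcha on the backend" means, here is a rough Python sketch. It assumes Google reCAPTCHA, whose siteverify endpoint takes the secret key and the token submitted with the form and replies with JSON containing a "success" field. The helper names post_form and captcha_passed are made up for the example, and a PrestaShop module would do the same thing in PHP.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def post_form(url, fields):
    """POST form fields and return the decoded JSON reply."""
    data = urllib.parse.urlencode(fields).encode()
    with urllib.request.urlopen(url, data=data, timeout=10) as reply:
        return json.load(reply)


def captcha_passed(token, secret, post=post_form):
    """Server-side check: never trust that the browser solved the captcha."""
    if not token:
        return False  # the form was submitted without any captcha answer
    result = post(VERIFY_URL, {"secret": secret, "response": token})
    return result.get("success") is True
```

The contact form handler would call captcha_passed() before sending any email and reject the submission when it returns False. A bot that posts directly to the form never gets a valid token, so a captcha that is only rendered in the browser stops nothing.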


I've just released new version 1.0.1. If you are upgrading from 1.0.0, you'll have to RESET the module!

 

This release:

- fixes the issue with the email language: the default shop language is now used. So do not forget to translate the email template to your default language!

- adds support for shops with friendly URLs disabled. If you have disabled friendly URLs, you'll have to add this URL to your robots.txt: /modules/blackholebots/blackhole/ Or, to be sure, you can add both:

 

User-agent: *
Disallow: /blackhole/
Disallow: /modules/blackholebots/blackhole/

 

@gritstonecycles - with this new version, you can add this line to your .htaccess file to catch access to /wp_admin 

 

RewriteRule ^wp_admin(.*)$ %{ENV:REWRITEBASE}index.php?fc=module&module=blackholebots&controller=blackhole [NC,L]

 

Put it somewhere near the beginning of your rewrite rules; it must be above the rule for the dispatcher.
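
To see which request paths this pattern would catch, the regular expression can be tried out in Python. This is only an approximation of mod_rewrite: in .htaccess context the path is matched without its leading slash, and [NC] corresponds to case-insensitive matching. Note that WordPress itself uses /wp-admin (with a hyphen), which the pattern as posted does not match. The second pattern is a hypothetical variant, not something from the module's author.

```python
import re

# Same pattern as in the RewriteRule above; [NC] means case-insensitive.
rule = re.compile(r"^wp_admin(.*)$", re.IGNORECASE)

# Hypothetical variant that accepts either a hyphen or an underscore.
variant = re.compile(r"^wp[-_]admin(.*)$", re.IGNORECASE)

for path in ["wp_admin", "WP_Admin/index.php", "wp-admin/", "shop/wp_admin"]:
    print(path, bool(rule.match(path)), bool(variant.match(path)))
```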

 

1 hour ago, DataKick said:

I've just released new version 1.0.1. If you are upgrading from 1.0.0, you'll have to RESET the module!

 


ok, thanks my friend i'm testing..!
 

 

1 hour ago, DataKick said:

I've just released new version 1.0.1. If you are upgrading from 1.0.0, you'll have to RESET the module!

 


Thank You, Great job everything works fine..! :) https://www.getdatakick.com/extras/blackholebots/

1 hour ago, DataKick said:

I've just released new version 1.0.1. If you are upgrading from 1.0.0, you'll have to RESET the module!

 


Works very well.

I had to create a "gb" folder in the mails folder for the translated emails, as the translation function in PrestaShop couldn't find them. But it was sorted after that.

Thank you for the update and the rewrite rule.

 


Hi. I recently started getting a lot of Russian spam mail too on my test shop. Before that I got a cPanel security warning that I normally only get when I do heavy file modification. The test shop was completely idle in the meantime; I have two extra domains on which I test stuff. I have very hardcore passwords.

This seems like a useful app. I don't know any programming, though I'm learning things as I go. Do you have any resources or reading material on this topic (security, spam, bad bots) that helped you and that you liked?

1 hour ago, vedrana said:

Hi. I recently started getting a lot of Russian spam mail too on my test shop. Before that I got a cPanel security warning that I normally only get when I do heavy file modification. The test shop was completely idle in the meantime; I have two extra domains on which I test stuff. I have very hardcore passwords.

This seems like a useful app. I don't know any programming, though I'm learning things as I go. Do you have any resources or reading material on this topic (security, spam, bad bots) that helped you and that you liked?

 

 

Hi, you need to install reCAPTCHA to block spam: https://developers.google.com/recaptcha/


 

2 minutes ago, Soyons zen said:

 

 

Hi, you need to install a captcha to block spam

 

Yes, I am doing that, but something also got into my server and I have no idea where to start checking or what to do about it. I guess I'll go through the logs to find unusual IP addresses, and maybe I'll find clues to what has been modified. But even after spending so much time finding the cause, I don't see how I could block others in the future. The problem with IP blocking is that dynamic IP addresses are a thing, and I'm sure bots take advantage of them.

I live in Romania. We have superfast, supercheap internet compared to a lot of more advanced and modern countries, and every time I disconnect and reconnect my internet at home I get a new IP assigned. There's an urban legend going around that there are lots of spies living/operating here. That would explain the cool internet. The location of an IP is not really precise either: sometimes it's within a 50-70 km radius, other times it's close enough. So I might be missing something obvious, but I don't see how blocking IPs would work other than blocking a few malicious individuals.

17 hours ago, vedrana said:

 

 

Yes, I am doing that, but something also got into my server and I have no idea where to start checking or what to do about it. I guess I'll go through the logs to find unusual IP addresses, and maybe I'll find clues to what has been modified. But even after spending so much time finding the cause, I don't see how I could block others in the future. The problem with IP blocking is that dynamic IP addresses are a thing, and I'm sure bots take advantage of them.

I live in Romania. We have superfast, supercheap internet compared to a lot of more advanced and modern countries, and every time I disconnect and reconnect my internet at home I get a new IP assigned. There's an urban legend going around that there are lots of spies living/operating here. That would explain the cool internet. The location of an IP is not really precise either: sometimes it's within a 50-70 km radius, other times it's close enough. So I might be missing something obvious, but I don't see how blocking IPs would work other than blocking a few malicious individuals.

 

 

Hi, first remove any visible email address (for example, in the footer), then ask your host what they can do.

On 4/10/2018 at 6:44 PM, vedrana said:

 

 

Yes, I am doing that, but something also got into my server and I have no idea where to start checking or what to do about it. I guess I'll go through the logs to find unusual IP addresses, and maybe I'll find clues to what has been modified. But even after spending so much time finding the cause, I don't see how I could block others in the future. The problem with IP blocking is that dynamic IP addresses are a thing, and I'm sure bots take advantage of them.

I live in Romania. We have superfast, supercheap internet compared to a lot of more advanced and modern countries, and every time I disconnect and reconnect my internet at home I get a new IP assigned. There's an urban legend going around that there are lots of spies living/operating here. That would explain the cool internet. The location of an IP is not really precise either: sometimes it's within a 50-70 km radius, other times it's close enough. So I might be missing something obvious, but I don't see how blocking IPs would work other than blocking a few malicious individuals.

 

You are right, blocking IP addresses will not help you against targeted attacks. That's just not possible. And this module is not designed to help you with that problem.

You are talking about bots that intentionally try to attack and take over your server. These programs (usually) try to exploit known vulnerabilities. They often check whether you have installed some module that lets them upload a PHP script onto your server. If they find one, they upload a PHP script and have complete control over your server. They can read and modify your database, send emails, or even use your server to attack other servers.

 

Can my module help you with this?

 

Not really. My module only detects when someone accesses the /blackhole/ URL, and in that case it adds the visitor to the blacklist. These bots have no reason to visit this URL, so they won't be affected.

You could add some .htaccess rules and redirect some of these known vulnerable URLs to the blackhole. That could help a bit. For example, if a bot tries to access /modules/attributewizardpro/file_upload.php, it will be trapped, and other attacks from this particular IP address will be prevented. But if the attacker uses some sort of IP anonymizer, that won't really help.

 

Detecting attack vector

 

When your server has been infected, you obviously need to remove any PHP script that was uploaded to your server.

But you also need to find the attack vector ASAP, or the infection will reappear.

To find the vulnerability in your system, look for POST requests in your access log, and eliminate those requests that are OK. If you have shell access, you can use the grep program to quickly search through the access log.

For example, start with this (fix the path to your access log):

grep POST /var/log/apache2/access.log | head -n 100

This will get you the first 100 POST requests. Not really helpful, but it's a start. First of all, get rid of harmless requests (POSTs to admin, login, and /?rand=).

grep POST /var/log/apache2/access.log | grep -v "/admin561wkvz9k" | grep -v "/?rand=" | grep -v "/en/login" | head -n 100

Now you need to go over the result, detect requests that are known to be harmless, and remove them from the output by adding another

.... | grep -v "/url/to/harmless" | head -n 100

to the pipeline.  Soon, there will be only a handful of post requests, and you'll shortly spot the attack. 
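
If you don't have grep at hand, the same elimination can be sketched in Python. This is just an illustration of the pipeline above: the function name is made up, the sample log lines are invented, and the HARMLESS list has to be adapted to your own shop (your admin folder name will differ).

```python
from itertools import islice

# Same exclusions as the grep -v steps above; extend the list as you go.
HARMLESS = ["/admin561wkvz9k", "/?rand=", "/en/login"]


def suspicious_posts(lines, harmless=HARMLESS):
    """Yield log lines that contain POST and none of the harmless fragments."""
    for line in lines:
        if "POST" in line and not any(fragment in line for fragment in harmless):
            yield line.rstrip("\n")


# Invented sample lines, only to show the filtering.
sample = [
    '203.0.113.5 - - [02/Oct/2017:01:00:00 -0400] "GET /en/ HTTP/1.1" 200 512 "-" "-"',
    '203.0.113.5 - - [02/Oct/2017:01:00:01 -0400] "POST /en/login HTTP/1.1" 200 512 "-" "-"',
    '203.0.113.9 - - [02/Oct/2017:01:00:02 -0400] "POST /modules/example/upload.php HTTP/1.1" 404 512 "-" "-"',
]

# islice plays the role of "head -n 100".
for line in islice(suspicious_posts(sample), 100):
    print(line)
```

For a real log, pass an open file instead of the sample list, for example open("/var/log/apache2/access.log").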

 

Shortcut


You can take a shortcut, if you wish. You can directly search for (failed) attacks by adding | grep " 404 " to the pipeline. That will filter the result to only those requests that were NOT found on your server. 

Example from my demo account:

grep POST /var/log/apache2/access.log | grep " 404 " | head -n 10

result:

172.68.62.77 - - [02/Oct/2017:01:16:25 -0400] "POST //modules/columnadverts/uploadimage.php HTTP/1.1" 404 25160 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31"
172.68.62.77 - - [02/Oct/2017:01:16:26 -0400] "POST //modules/simpleslideshow/uploadimage.php HTTP/1.1" 404 25164 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31"
172.68.62.77 - - [02/Oct/2017:01:16:27 -0400] "POST //modules/productpageadverts/uploadimage.php HTTP/1.1" 404 25167 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31"
172.68.62.77 - - [02/Oct/2017:01:16:28 -0400] "POST //modules/homepageadvertise/uploadimage.php HTTP/1.1" 404 25164 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31"
172.68.62.77 - - [02/Oct/2017:01:16:29 -0400] "POST //modules/soopamobile/uploadimage.php HTTP/1.1" 404 25158 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31"
172.68.62.77 - - [02/Oct/2017:01:16:29 -0400] "POST //modules/homepageadvertise2/uploadimage.php HTTP/1.1" 404 25169 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31"
172.68.62.77 - - [02/Oct/2017:01:16:30 -0400] "POST //modules/jro_homepageadvertise/uploadimage.php HTTP/1.1" 404 25172 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31"
172.68.62.77 - - [02/Oct/2017:01:16:31 -0400] "POST //modules/attributewizardpro/file_upload.php HTTP/1.1" 404 25166 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31"
172.68.62.77 - - [02/Oct/2017:01:16:32 -0400] "POST //modules/attributewizardpro.OLD/file_upload.php HTTP/1.1" 404 25167 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31"
172.68.62.77 - - [02/Oct/2017:01:16:33 -0400] "POST //modules/advancedslider/ajax_advancedsliderUpload.php?action=submitUploadImage%26id_slide=php HTTP/1.1" 404 25391 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31"

As you can see, there was an attack attempt from IP address 172.68.62.77. The attacker was looking for some way to upload a PHP script, using vulnerabilities in modules.

Now you can test whether this attack was successful by filtering the access log for successful POST requests from this IP address:

grep POST /var/log/apache2/access.log | grep "172.68.62.77" | grep " 200 " | head -n 10

Thankfully, on my server this returned empty result.

If there's a 200 response, you have probably found a vulnerability in your system. If it's a module, DELETE it immediately (or at least rename the module directory). Disabling the module will not help: the PHP file would still be accessible. Then contact the module's creator and notify them about the issue. Chances are that it has already been fixed, and you were just using an old version of the module...
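
The shortcut can also be automated. The following Python sketch assumes the Apache combined log format shown above. It counts failed (404) POST requests per IP address and then lists any POST requests from a given IP that were answered with 200. The function names are made up, and the two sample lines are taken from the log above with the user-agent string shortened.

```python
import re
from collections import Counter

# ip ident user [time] "METHOD path PROTOCOL" status size ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) ')


def parse(line):
    """Return (ip, method, path, status) or None if the line does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    ip, method, path, status = match.groups()
    return ip, method, path, int(status)


def probing_ips(lines):
    """Count POST requests that ended in 404, grouped by IP address."""
    hits = Counter()
    for ip, method, _path, status in filter(None, map(parse, lines)):
        if method == "POST" and status == 404:
            hits[ip] += 1
    return hits


def successful_posts(lines, ip):
    """Return the paths of POST requests from `ip` that were answered with 200."""
    return [
        path
        for rec_ip, method, path, status in filter(None, map(parse, lines))
        if rec_ip == ip and method == "POST" and status == 200
    ]


sample = [
    '172.68.62.77 - - [02/Oct/2017:01:16:31 -0400] "POST //modules/attributewizardpro/file_upload.php HTTP/1.1" 404 25166 "-" "-"',
    '172.68.62.77 - - [02/Oct/2017:01:16:32 -0400] "POST //modules/attributewizardpro.OLD/file_upload.php HTTP/1.1" 404 25167 "-" "-"',
]

print(probing_ips(sample).most_common(5))
print(successful_posts(sample, "172.68.62.77"))
```

When reading a real log, load the lines into a list first (for example with readlines()), because both functions need to iterate over them.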

 

 

 

 

 


Hi, do you have a solution? Despite a captcha plus your module, I still receive spam. I noticed that the messages are sent through the Webmaster option. Thank you, my friend.

19 hours ago, Soyons zen said:

Hi, do you have a solution? Despite a captcha plus your module, I still receive spam. I noticed that the messages are sent through the Webmaster option. Thank you, my friend.

 

Send me a link to your site, and I'll verify that the captcha is installed correctly.


Hi, how do I whitelist good bots like Googlebot and Bingbot? Bingbot keeps falling into that trap.

2 hours ago, bananaguy said:

Hi, how do I whitelist good bots like Googlebot and Bingbot? Bingbot keeps falling into that trap.

 

In version 1.0.1 there wasn't any way to whitelist the 'good' ones; it simply catches everything that doesn't follow robots.txt rules. It seems like bingbot is one of the bad guys :) Now, seriously.

I've just released new version 1.0.2, which whitelists some common good bots.


Hi DataKick,

congratulations on your module, it works like a charm.

How could I pass the $ip variable (with cURL, perhaps?) to abuseipdb, which uses an API with this syntax? https://www.abuseipdb.com/report/json?key=[API_KEY]&category=[CATEGORIES]&comment=[COMMENT]&ip=[IP]

And then: is there a way to add some IPs to a whitelist too?

Thank you!
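
For illustration, filling in the URL quoted above could look like this in Python. It is a sketch of the URL construction only, not part of the module (which would do this in PHP, for example with cURL). The function name, the placeholder API key and the category value "19" are assumptions made for the example; check the current AbuseIPDB documentation for valid categories and for whether this endpoint is still supported.

```python
from urllib.parse import urlencode

# Endpoint exactly as quoted in the post above.
REPORT_URL = "https://www.abuseipdb.com/report/json"


def build_report_url(ip, api_key, categories, comment):
    """Fill the [API_KEY], [CATEGORIES], [COMMENT] and [IP] slots of the quoted URL."""
    query = urlencode({
        "key": api_key,
        "category": categories,
        "comment": comment,   # urlencode escapes spaces and punctuation
        "ip": ip,
    })
    return f"{REPORT_URL}?{query}"


# "19" is an assumed category id and YOUR_API_KEY is a placeholder.
url = build_report_url("203.0.113.9", "YOUR_API_KEY", "19", "Fell into the blackhole trap")
print(url)
```

The resulting URL could then be requested with urllib.request.urlopen(url), or with cURL from PHP.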

 

 


@Nakatika no, it cannot; this has been asked in the past. You need to install a working reCAPTCHA and upgrade to 1.7.5.2, as account creation using a URL has been fixed there.

44 minutes ago, lukash4 said:

How can I add IP addresses to the white list?

There's no white list

6 hours ago, philipp.haugg said:

What are the pros & cons of using this solution? Can it affect the SEO of the website?

All search engine robots adhere to the robots.txt standard, so they will not be affected by this 'trap' at all, and you won't see any negative (or positive) impact on your ranking.

This will catch bad bots: those that ignore robots.txt files. For example, if your competitor uses the wget command to copy your entire website, he will not succeed...

I have caught plenty of bots, and I always check the WHOIS information. I haven't seen any 'legit' bots in there yet.

