
Bad Bots Blocking – Apache, Nginx & CSF – Tutorial

Over the past few years, bad bots have become a significant problem for many server administrators and website operators. These bad bots will often relentlessly target servers, performing thousands of requests and retrieving huge amounts of data in a short period of time.

These activities can cause a spike in server resource usage, which affects the performance of a server and makes it much slower for normal visitors. In some cases, high load from aggressive bots can make a server less stable, causing websites to crash or become unresponsive.

Fortunately, there are several approaches available for blocking bad bots, crawlers, and scrapers. In this post, I am going to explain what aggressive bots, scrapers, and crawlers are and the risks they pose. Then, I’ll share the best techniques for dealing with aggressive Chinese bots.

What Are Scraper and Crawler Bots?

Internet bots are software applications that perform automated tasks over the Internet. The two most common types of bots operating online are crawlers and scrapers.

Crawlers will visit websites to read and assess content, including XML sitemaps, images, links, and HTML documents. Crawling is mostly performed by search engines to assess the content on websites. Although crawling isn’t usually performed with malicious intent, overly active crawlers can use a substantial amount of server resources.

Scrapers are programmed to extract data from websites. They are often quite sophisticated, using AI-like techniques to complete web forms and access the information they require. In many cases, scrapers use websites in unintended ways, exploiting the services that are provided to normal users.

What Risks Are Posed By Bad Bots?

Aggressive bots can perform hundreds of requests per second, eating up valuable server resources including RAM, CPU, and hard drive space. This can dramatically affect server response time and performance.

A server inundated with aggressive bots may experience:

  • Software more prone to crashing
  • Error codes
  • Overheating
  • A slow-down of routine operations like server backups
  • Slow or unresponsive websites
  • Partial delivery of content
  • Complete denial of service
  • Additional bandwidth and energy costs

On top of these technical problems, aggressive bots are often malicious, actively looking for ways to exploit server resources.

Identifying Bad Bots – Crawlers and Scrapers

So, how do you know that you have a bot problem? There are several warning signs, including:

  • High server resource usage
    Your website suddenly becomes slow, or you receive notifications about high CPU, RAM, or network usage on your server; the log-analysis sketch after this list shows how to investigate.
  • Unusually high page views
    If you notice that a server has a sudden and unusual increase in page views, bad bots may be responsible.
  • High bounce rate
    The bounce rate is the percentage of visitors who view a single page and leave without interacting further. This behaviour is often the result of bots that are programmed to scrape or index a single page.
  • Very long or short session durations
    The amount of time human users spend on a website is relatively uniform, since people take roughly the same amount of time to assess or consume content. If you notice that many sessions are extremely short or extremely long, bots may be in operation.
  • Spike in traffic from China
    If a server suddenly experiences a massive influx of Chinese IPs, a bot is likely responsible.
  • Junk content
    Some bots will fill forms and submit content in an effort to spam your email box. This content often looks like gibberish.
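
A quick way to confirm several of these warning signs is to inspect your access logs directly. The following sketch assumes a combined-format log at /var/log/nginx/access.log – adjust the path for your own setup – and lists the most frequent user agents and client IPs:

# Top 20 user agents by request count (the user agent is field 6 when splitting on double quotes)
awk -F'"' '{print $6}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20

# Top 20 client IPs by request count
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20

If a handful of user agents or IPs dominate these lists, you most likely have a bot problem.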

Blocking Bad Bot User Agents For A Single Site (Nginx & Apache)

Certain strings appear again and again in the User-Agent header sent by bad bots, which is why one of the most effective ways of blocking them is to blacklist those strings. A few examples would be:

  • Mb2345Browser (Chinese web crawler)
  • Kinza (user-agent string commonly seen in spam bot traffic)
  • Sogou (Chinese web crawler for the Sogou search engine)
  • Baiduspider (Chinese web crawler for Baidu)
  • LieBaoFast (Chinese web crawler)
  • MicroMessenger (Bot associated with WeChat)
  • zh_CN (refers to Chinese-specific localisation settings)

In fact, we have recently observed four different bad bots aggressively crawling websites on our cPanel Server Management customers’ servers. We’ll use them as the example below, and we would also suggest keeping them on your list.

If you don’t know where to find your server logs, check the following paths, depending on your main web server. Look for large numbers of requests with suspicious user-agent strings and block them.

For cPanel servers, the Apache per-domain log files are located at the following path:

/etc/apache2/logs/domlogs

Apache Web Server Log Files

/var/log/apache2 or /var/log/httpd

Nginx Web Server Log Files

/var/log/nginx

You may check the log files in real time by issuing the following command (adjust the path based on your install and domain name):

tail -f /var/log/apache2/example.com.access.log
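
To see whether the four user agents used in the blocking rules below already appear in a log, a quick grep over the same file (path and domain assumed, as above) counts the matching requests:

# Count requests whose user agent matches any of the four bad bots (case-insensitive)
grep -icE 'LieBaoFast|UCBrowser|MQQBrowser|Mb2345Browser' /var/log/apache2/example.com.access.log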

Option 1. Apache Bad Bot User Agent Blocking Through .htaccess

Append the following lines to your .htaccess file:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (LieBaoFast|UCBrowser|MQQBrowser|Mb2345Browser) [NC]
RewriteRule .* - [F,L]

As soon as the lines above are placed in your .htaccess file, any user agent matching “LieBaoFast”, “UCBrowser”, “MQQBrowser”, or “Mb2345Browser” will receive a 403 response from your web server.
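
You can verify the rule from a shell with curl by spoofing one of the blocked user agents (example.com stands in for your own domain):

# Should return a 403 Forbidden response once the rule is active
curl -I -A "Mb2345Browser" https://example.com/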

Option 2. Nginx Bad Bot User Agent Blocking Using Nginx Configurations

To block bad user agents in Nginx, you will need to edit the Nginx vhost file for the respective website and then reload Nginx.

Edit /etc/nginx/sites-enabled/<yoursite>.conf and place the following configuration inside your server { } block.

# The ~* operator makes the match case-insensitive, mirroring Apache's [NC] flag
if ($http_user_agent ~* (LieBaoFast|UCBrowser|MQQBrowser|Mb2345Browser)) {
    return 403;
}

Save and exit the configuration file. Before reloading Nginx for the changes to take effect, make sure the configuration is valid by issuing the following command:

nginx -t

If no errors show up, you can reload your configuration using the following command:

service nginx reload
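
As a side note, many Nginx administrators prefer to centralise the user-agent list with a map block in the http { } context rather than repeating the if in every vhost. A minimal sketch of that alternative, assuming a drop-in file such as /etc/nginx/conf.d/badbots.conf:

# Sets $bad_bot to 1 for any matching user agent, 0 otherwise
map $http_user_agent $bad_bot {
    default 0;
    "~*(LieBaoFast|UCBrowser|MQQBrowser|Mb2345Browser)" 1;
}

Then, inside each server { } block you want protected:

if ($bad_bot) {
    return 403;
}

The map is evaluated once per request and keeps the bot list in a single place.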

While this solution will stop some of the latest aggressive bad bots, you may come across more bad bots with unusual user agents. Check your server logs to discover them, then block or rate limit each one. You can also find more bad bots by visiting BotReports.com.
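
If you would rather throttle suspicious clients than block them outright, Nginx’s limit_req module is one option. A minimal sketch, with an illustrative zone name and rates rather than recommended values:

# In the http { } context: allow each client IP about 2 requests per second on average
limit_req_zone $binary_remote_addr zone=perip:10m rate=2r/s;

# Inside the relevant server { } or location { } block: allow short bursts, then reject
limit_req zone=perip burst=10 nodelay;

Legitimate visitors rarely exceed a few requests per second, so a modest limit mostly affects bots.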

Option 3. Blocking Bad Bots, Crawlers, and Scrapers IP Addresses and/or Countries

If only a very small number of bots are aggressively targeting your server, you can block their IP addresses with a firewall rule. However, many people find that blocking a bad bot in this way can cause it to become more aggressive, as it switches to additional IPs to quickly retrieve more data.

Bad bots appear on such a large number of IPs most likely because they operate from huge bot farms or from the malware-compromised computers of Internet users. The IP network of bad bots seems to be ever-expanding, so it can be difficult to keep up when manually blocking IPs.

To block a single aggressive bot IP address using ConfigServer Security & Firewall (CSF):

csf -d IPHERE
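
CSF also accepts an optional comment after the address, which makes /etc/csf/csf.deny easier to review later, and csf -dr removes an entry again. For example (203.0.113.45 is a documentation address, not a real bot):

# Block an address with a note for future reference
csf -d 203.0.113.45 "Aggressive scraper bot"

# Remove the block later if needed
csf -dr 203.0.113.45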

If you have observed that bad bots are coming from IPs belonging to a specific country, and you are certain that you are not expecting any legitimate traffic on your website from that country, you can block the entire country.

To block an entire country using ConfigServer Security & Firewall (CSF):

1. Edit the file /etc/csf/csf.conf

nano /etc/csf/csf.conf

2. For the country block to work, you’ll need to sign up for a free account on maxmind.com and add your license key to csf.conf:

MM_LICENSE_KEY = "YOUR LICENSE HERE"

3. Navigate a few lines down and look for the following option:

CC_DENY_PORTS = ""

4. To block a country, find its two-letter country code on maxmind.com and place it in the setting below. In our example, using CN blocks China.

CC_DENY_PORTS = "CN"

5. Next, we adjust a setting so that only the web ports (80 and 443) are blocked. That way your server can still communicate with email servers, DNS resolvers, etc.:

CC_DENY_PORTS_TCP = "80,443"

6. After adjusting the settings, save and close the configuration file. Then restart CSF using the following command:

csf -r
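
After the restart, the three settings changed above should read as follows in /etc/csf/csf.conf:

MM_LICENSE_KEY = "YOUR LICENSE HERE"
CC_DENY_PORTS = "CN"
CC_DENY_PORTS_TCP = "80,443"

You can also spot-check how the firewall treats a given address with csf -g (203.0.113.45 again being just an example):

# Search the active iptables rules for a match on this IP
csf -g 203.0.113.45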

Thanks for reading Bad Bots Blocking – Apache, Nginx & CSF. For more server admin hints and tips, subscribe to our blog.
