Malicious Bot Detection Through A Complex Proxy Network

PUBLISHED ON April 17, 2019
LAST UPDATED August 27, 2021

A malicious entity operating a botnet to execute credential stuffing or password spraying attacks will frequently be stopped after a series of application login attempts by an app security solution, such as a WAF or a dedicated bot protection product. As a result of these attempts, IP addresses used by the attacker will often end up in IP reputation lists and will be blacklisted by many sites.

A more advanced attacker evades detection by rotating their IP addresses every few tries, staying below the radar of application authentication processes and security teams. It is not unusual for us to see a credential stuffing or comment spam botnet using as many as 10,000 different IPs consisting of hacked servers, workstations, and, increasingly, IoT devices.

However, it is not easy to manage over 10,000 hacked devices, let alone acquire that many shell accounts. So some clever attackers use a combination of TOR exit nodes, open proxies, and VPN services to mask their true IP addresses. These techniques enable an attacker to use far fewer bots to have the same impact as a larger network.

During our investigation, we found attackers using an unusual approach to amplify their impact and evade detection: a custom network of private proxy servers distributed across multiple networks and Virtual Private Server (VPS) providers. This custom network of proxies gave the offenders the advantage of relatively clean, un-blacklisted IP space from which to launch their attacks.

In this blog, we describe how we detected the bot activity and the network of private proxy servers, the network’s composition, and our approach for protecting our customers from its malicious behavior.

Detecting the Botnet

ThreatX Labs first noticed the bot activity when a customer received a high volume of requests with “proxy_ip:X.X.X.X:60000” in the “User-Agent” header. These requests were coming from multiple IPs and had a very similar user agent string:

 GET /path/to/resource.css HTTP/1.1
 User-Agent: proxy_ip:X.X.X.X:60000 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36

We track the prevalence of User-Agents observed across our platform and flag unusual characteristics in these for further analysis. While there is nothing terribly unusual about a Chrome 70 browser (October 2018) running on Mac OS X 10.13.6 (July 2018), the proxy_ip: portion is not frequently seen, especially at the high volume we observed for this particular customer.
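One way to flag this kind of oddity is to look only at the first token of each User-Agent value. The sketch below is a heuristic written for illustration, not the ThreatX implementation: it assumes a normal header starts with a `product/version` token (such as `Mozilla/5.0`) and counts any leading token that does not. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: a well-formed User-Agent begins with a product/version token
# made of HTTP "token" characters, e.g. "Mozilla/5.0".
_PRODUCT = re.compile(r"^[A-Za-z0-9!#$%&'*+.^_`|~-]+/\S+$")


def leading_token(user_agent: str) -> str:
    """Return the first whitespace-delimited token of a User-Agent value."""
    parts = user_agent.split(None, 1)
    return parts[0] if parts else ""


def unusual_prefixes(user_agents):
    """Count leading tokens that are not in product/version form."""
    counts = Counter()
    for ua in user_agents:
        token = leading_token(ua)
        if not _PRODUCT.match(token):
            counts[token] += 1
    return counts
```

A token such as `proxy_ip:X.X.X.X:60000` contains colons and no slash, so it lands in the returned counter, while `Mozilla/5.0` does not. Each distinct proxy address shows up as its own key, so grouping on the text before the first colon may be a useful refinement.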

After investigating a few of the entities making these requests, we determined the “proxy_ip:” User-Agent was only being sent when they requested static resources like .css. To confirm, we pulled all requests made from the offending entities and checked:

$ awk -F'"' '{print $6}' botnet-ip-requests | grep -oP "Mozilla.+" | sort -u
  Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
  proxy_ip:X.X.X.X:60000 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36

All requests from these IPs were using the same Mac OS X 10_13_6/Chrome/70.0.3538 User-Agent, but only a small fraction included a “proxy_ip:X.X.X.X:60000” string before the rest of the header.
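The same check can be sketched in Python. This is an illustration under assumptions: it treats the log as combined-format lines where the User-Agent is the sixth field when splitting on double quotes (mirroring the awk field above), and the function names and marker pattern are hypothetical.

```python
import re

# Marker shape described in the article: "proxy_ip:<IPv4>:<port> " ahead of
# the regular browser string.
MARKER = re.compile(
    r"^proxy_ip:(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d{1,5})\s+(?P<rest>.+)$"
)


def user_agent_field(log_line: str):
    """Return the 6th double-quote-delimited field, or None if absent."""
    fields = log_line.split('"')
    return fields[5] if len(fields) > 5 else None


def split_marker(user_agent: str):
    """Return ((ip, port), remaining_ua), or (None, user_agent) if no marker."""
    m = MARKER.match(user_agent)
    if m is None:
        return None, user_agent
    return (m["ip"], int(m["port"])), m["rest"]


def marker_fraction(log_lines):
    """Fraction of parseable requests whose User-Agent carries the marker."""
    total = marked = 0
    for line in log_lines:
        ua = user_agent_field(line)
        if ua is None:
            continue
        total += 1
        if split_marker(ua)[0] is not None:
            marked += 1
    return marked / total if total else 0.0
```

Stripping the marker before comparing User-Agent strings makes it easy to confirm that marked and unmarked requests share the same underlying browser string.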

[Figure: Proxy IP Requests. Requests including proxy_ip: in the User-Agent header, 2019-03-26 – 2019-04-12]

The ThreatX behavioral analytics engine stores attacker metadata collected across all of our customers. Correlating that metadata for the IPs that sent “proxy_ip:” in their User-Agent header revealed malicious behavior targeting the sites of multiple customers. The behavior included form/comment spam and attempts to scrape large amounts of content from customer sites.
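A correlation of this kind can be sketched as a simple grouping step. The event shape `(customer_id, source_ip, user_agent)` and the function name below are assumptions made for illustration; they do not describe the ThreatX data model.

```python
from collections import defaultdict


def ips_hitting_multiple_customers(events, min_customers=2):
    """Map each marker-sending IP to the customers it touched.

    events: iterable of (customer_id, source_ip, user_agent) tuples
    (a hypothetical record shape).
    """
    customers_by_ip = defaultdict(set)
    for customer_id, source_ip, user_agent in events:
        if user_agent.startswith("proxy_ip:"):
            customers_by_ip[source_ip].add(customer_id)
    return {
        ip: sorted(customers)
        for ip, customers in customers_by_ip.items()
        if len(customers) >= min_customers
    }
```

The IPs returned here are the ones worth pulling full request histories for, since they sent the marker to more than one customer.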

In total, we observed about 7400 IP addresses used in the activity:

$ wc -l ~/observed-botnet-ips
  7419 observed-botnet-ips
 

After reviewing the data and determining several customers were affected, we deployed updated WAF signatures to block the entities that had sent the “proxy_ip:” marker alongside their malicious traffic, as well as to stop this behavior in the future. This resulted in upwards of 7000 malicious IP addresses blocked, with the potential to detect and block many more.

The proxy_ip:X.X.X.X:60000 User-Agent characteristic of this attack was different and intriguing, so we conducted additional investigation into the attacking IPs.

Analyzing the Proxies

We started by sending a few requests to one of the entities involved in the attack. Sending an HTTP request to the suspicious IP address on tcp/60000 got us the following error from a Squid proxy server:

$ curl -v http://X.X.X.X:60000
  ERROR
  The requested URL could not be retrieved
  ...
  Generated Wed, 10 Apr 2019 10:09:23 GMT by localhost (squid)

Ok, so we knew this IP was running a proxy server. Using open proxies is a pretty well-known method to conceal your true IP address and evade detection. So we tried proxying a request:

$ curl --proxy http://X.X.X.X:60000 http://blog.threatx.com -v
  ERROR
  Cache Access Denied.
  ...
  Generated Wed, 10 Apr 2019 10:16:49 GMT by localhost (squid)

Great! The server responded as expected, but required authentication in order to proxy traffic. This means the entity obfuscating its traffic was not just leveraging public open proxies; it was using a customized set of private proxy servers to pull off this attack. The likely reason is that open proxies forward traffic for anyone and therefore often end up in IP reputation blacklists. TOR operates in a similar way, obfuscating the client’s true IP address and only exposing that of the final (exit) node. Deploying your proxies with a provider that does not much mind what is sent through its network also reduces the likelihood you will lose access to your tools unexpectedly.
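The distinction between an open proxy and a private one can be expressed as a small classifier over the probe response. This is a sketch under assumptions: the article does not show the HTTP status codes it received, so the code relies on the standard meaning of `407 Proxy Authentication Required` and the `Proxy-Authenticate` header. The function name and return labels are hypothetical.

```python
def classify_proxy_probe(status: int, headers: dict) -> str:
    """Classify the response to a request sent *through* a suspected proxy.

    status: HTTP status code of the proxied request.
    headers: response headers as a plain dict (names compared case-insensitively).
    """
    names = {name.lower() for name in headers}
    if status == 407 or "proxy-authenticate" in names:
        # The server speaks proxy semantics but wants credentials first.
        return "auth_required"
    if 200 <= status < 400:
        # The request appears to have been forwarded without credentials.
        # This can also be a plain web server, so treat it as a hint only.
        return "possibly_open"
    return "inconclusive"
```

A result of `auth_required` matches the private-proxy behavior described above; `possibly_open` still needs a follow-up check that the response really came from the requested origin.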

We looked into the networks where the attack originated, using Team Cymru’s bulk IP to ASN lookup tool to summarize the proxy IPs by BGP network prefix:

$ netcat whois.cymru.com 43 ... | uniq -c | sort -nr
  1651 104.164.0.0/15
  ...
 

From this, we found our observed proxy IP addresses were concentrated in a few networks. This raised the question: how many IPs in these networks not observed (or not yet observed as part of the botnet attack) were also hosting these private squid proxy servers?
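A per-prefix summary like this can be sketched in Python. The code below builds the request body for Team Cymru's bulk whois interface (`begin`, `verbose`, one IP per line, `end`) and counts rows per BGP prefix in the reply. It assumes a pipe-delimited response with the prefix in the third column; the function names are hypothetical, and the network call itself is left out.

```python
from collections import Counter


def build_bulk_query(ips):
    """Return the text to send to whois.cymru.com on tcp/43 in bulk mode."""
    return "\n".join(["begin", "verbose", *ips, "end"]) + "\n"


def count_by_prefix(response_text, prefix_column=2):
    """Count response rows per BGP prefix, most frequent first.

    Assumption: rows are pipe-delimited and column `prefix_column`
    (0-based) holds a CIDR prefix. Banner, header, and rows without a
    prefix are skipped.
    """
    counts = Counter()
    for line in response_text.splitlines():
        cols = [col.strip() for col in line.split("|")]
        if len(cols) <= prefix_column or "/" not in cols[prefix_column]:
            continue
        counts[cols[prefix_column]] += 1
    return counts.most_common()
```

The output is a list of `(prefix, count)` pairs, which is the same shape as the `uniq -c | sort -nr` summary.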

Identifying the Botnet Scale

Finding 7400 IP addresses proxying malicious traffic is cool, but our research found these addresses mostly fell into a few different networks. What if there was more to uncover here? We decided to search for other potential proxy servers running nearby.

We started by searching Shodan for a few of our observed IP addresses and network prefixes, as well as for servers listening on tcp port 60000. This returned very little information, so we turned to masscan to try to detect additional neighboring proxies from our own lab:

$ cat ~/proxy-network-list | while read -r network ; do
  # replace the "/" in the CIDR so it can be used in a filename
  network_=$(echo "${network}" | sed 's/\//_/')
  masscan -p60000 "${network}" -Pn \
  --max-rate 10000 \
  --output-format json \
  --output-filename "masscan-${network_}.json"
 done
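
The JSON files produced by a scan like this can be reduced to a list of live hosts with a few lines of Python. The sketch assumes masscan's JSON output has one record per line, each with an `ip` field and a `ports` list whose entries carry `port` and `status`. It parses line by line so that trailing commas or a non-JSON footer do not abort the run. The function name is hypothetical.

```python
import json


def open_hosts(masscan_json_text, port=60000):
    """Return sorted IPs reported with `port` open.

    Assumption: one JSON object per line, e.g.
    {"ip": "...", "ports": [{"port": 60000, "status": "open", ...}]}
    """
    hosts = set()
    for raw in masscan_json_text.splitlines():
        line = raw.strip().rstrip(",")
        if not line.startswith("{"):
            continue  # skips "[", "]" and blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate malformed lines
        ip = record.get("ip")
        if not ip:
            continue
        for entry in record.get("ports", []):
            if entry.get("port") == port and entry.get("status") == "open":
                hosts.add(ip)
    return sorted(hosts)
```

Writing the returned list to a file, one address per line, gives an input suitable for a per-network count.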
 

A few minutes(!) later we had scanned around 400000 neighboring addresses and discovered over 28000 hosts were listening on tcp/60000:

$ wc -l ~/proxy-network-live-ips
  28140 proxy-network-live-ips
 

After spot checking a few IPs per network from our results (not in the original botnet list), we were able to confirm these addresses were running the same private squid proxy and could be used as part of the same attack. We then summarized again by network prefix to get a count of live proxies per network:

$ netcat whois.cymru.com 43 ... | uniq -c | sort -nr
  5504 104.164.0.0/15
  ...
 

We then calculated proxy density per network prefix: high density (e.g. 254 proxy addresses in a /24 network) would confirm the network was controlled by our attackers.
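The density calculation is simply the live proxy count divided by the number of addresses in the prefix. The sketch below uses Python's standard `ipaddress` module. The 0.9 cut-off between "confirmed" and "suspicious" is an assumption chosen for illustration; the article does not state the threshold used.

```python
import ipaddress


def proxy_density(live_count: int, cidr: str) -> float:
    """Share of addresses in `cidr` that answered as proxies."""
    return live_count / ipaddress.ip_network(cidr).num_addresses


def classify_network(live_count: int, cidr: str, threshold: float = 0.9) -> str:
    """Label a prefix by density; `threshold` is an illustrative assumption."""
    if proxy_density(live_count, cidr) >= threshold:
        return "confirmed"
    return "suspicious"
```

For example, 254 proxies in a /24 gives 254/256, about 0.99, while 5504 proxies in a /15 gives 5504/131072, about 0.04. Note that `num_addresses` includes the network and broadcast addresses, so a fully populated /24 tops out just under 1.0.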

Our final results:

  # confirmed networks (high proxy density)
  Count  Network/CIDR       Organization
  4048   38.79.208.0/20     PSINet, Inc. / Cogent
  4035   38.128.48.0/20     PSINet, Inc. / Cogent
  2019   216.173.64.0/21    Network Layer Technologies Inc
  254    166.88.170.0/24    YHSRV / EGIHosting
  249    206.246.67.0/24    Nomurad SSE (C04726371) / NuNet Inc.
  248    206.246.115.0/24   NuNet Inc.
  248    104.171.148.0/24   DedFiberCo
  247    216.198.86.0/24    CloudRoute, LLC
  247    206.246.89.0/24    Expressway Agriculture (C04648072) / NuNet Inc.
  247    206.246.73.0/24    INTERNETCRM, INC. (C04860575) / NuNet Inc.
  247    206.246.102.0/24   ? / NuNet Inc.
  247    104.171.158.0/24   DedFiberCo
  246    104.237.244.0/24   DedFiberCo
  246    104.171.150.0/24   DedFiberCo
  244    104.171.157.0/24   DedFiberCo

  # suspicious networks (low proxy density)
  Count  Network/CIDR       Organization
  5504   104.164.0.0/15     EGIHosting
  2741   181.177.64.0/18    My Tech BZ
  1014   107.164.0.0/16     EGIHosting
  1011   107.164.0.0/17     EGIHosting
  505    166.88.160.0/19    EGIHosting
  251    142.111.128.0/19   EGIHosting
 

Many of these networks were made up almost entirely of proxy servers. What’s more, we saw substantial overlap in the Organization assigned for each prefix: some of these providers were not just hosting a few malicious services but were instead hubs of malicious and suspicious activity.

ThreatX Responsive Actions

The IPs/networks identified as part of this attack have been blocked from accessing the sites of all ThreatX customers.

We reported the proxy IP addresses as well as sample log data to the abuse contacts on record for each of these networks. We also published the proxy IP addresses we observed, as well as confirmed and suspicious networks used in this attack, through our ThreatX Labs GitHub.

Summary

Many hosting providers turn a blind eye to VPN and proxy infrastructure set up on their networks, relying primarily on abuse reports to detect any malicious activity from their IP space. While a few proxy server instances are not necessarily cause for alarm, entire network blocks and thousands of servers configured and actively in use for this activity should give providers some pause.

With ever-increasing attacks coming from China, Russia, and other regions, some companies will turn to Geo-IP blocking of traffic destined for their websites, banning all requests that originate outside their own country. We found Geo-IP databases to be ineffective in blocking this kind of botnet/proxy network activity. In fact, most of the malicious traffic observed was identified by these sources as “United States” in origin, but further investigation by the ThreatX Labs team determined the networks involved in the attack had been assigned or delegated to Chinese entities.

There are similar issues with common IP reputation lists. We found very little public information regarding malicious activity from these networks, though they appear to have been in use for this activity for some time. Proxy infrastructure like this can be redeployed with some effort — we expect that once this article is published and these IP addresses become public knowledge, a new infrastructure with new IP addresses will be spawned.

ThreatX WAAP goes beyond basic Geo-IP and IP reputation-based blocking to track attacker behavior against our customers’ individual sites. This enables us to develop a custom risk score for each potential attacker and block, interrogate, or tarpit malicious traffic regardless of its supposed origin country/region, and before taking any known IP reputation into account.

Based on this attack, and others we observe against our customers every day, ThreatX has developed advanced active interrogation and deception capabilities which enable us to see past obfuscation techniques like the proxy network above and stop malicious entities before they’re successful.
