I need to check 200 million domains for availability and for the presence of one CMS or another.

I am using PHP 7.1, and there is a lot of checking to do.

Hardware and settings

  1. Server: multi-core CPU, 64 GB RAM, SSD disks, 500 Mbit/s dedicated bandwidth (OVH server).
  2. Google DNS in resolv.conf: 8.8.8.8 / 8.8.4.4
  3. ulimit -n is set to 655350
  4. I use nload to monitor channel utilization

Testing

I checked the first 1 million domains from the database with different numbers of parallel processes. I ran into a problem: as the number of processes grows, the number of domains that do not respond within 30 seconds increases sharply. Here are the results.

1. 1000 processes

Test: 1,000,000 domains, 1000 parallel processes, average channel load of 85 Mbit/s, total scan time 1 hour. Result: 65% of domains were successfully resolved, 35% were not resolved due to a timeout.

2. 300 processes

Test: 1,000,000 domains, 300 parallel processes, average channel load of 70 Mbit/s, total scan time 2 hours. Result: 85% of domains were successfully resolved, 15% were not resolved due to a timeout.
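
For reference, here is a rough sketch of the kind of per-process check this test assumes: fork N worker processes, each resolving its own slice of the domain list through the system resolver from resolv.conf. The file name, chunking, and use of checkdnsrr() are illustrative assumptions, not my exact code.

    <?php
    // Illustrative worker model: fork N processes, each resolving its own
    // slice of the domain list via the system resolver (resolv.conf -> 8.8.8.8).
    $domains = file('domains.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $workers = 300; // or 1000
    $chunks  = array_chunk($domains, (int)ceil(count($domains) / $workers));

    foreach ($chunks as $chunk) {
        $pid = pcntl_fork();
        if ($pid === 0) {                       // child: check its slice and exit
            foreach ($chunk as $domain) {
                // checkdnsrr() blocks on the system resolver, so every child
                // competes for the same upstream DNS servers.
                $ok = checkdnsrr($domain . '.', 'A');
                echo $domain, "\t", $ok ? 'resolved' : 'failed', PHP_EOL;
            }
            exit(0);
        }
    }

    while (pcntl_wait($status) > 0) {
        // parent: wait for all children to finish
    }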

Findings

As you can see, tripling the number of processes does not triple the channel load. The number of domains that were unavailable/unresolved increases sharply, while the scan speed only doubles.

Question

Where is the bottleneck in this test? How can I use the full 500 Mbit/s of bandwidth? Should I run my own DNS server, and if so, how should it be configured?

I would be glad for any ideas and advice!

  • Running many more threads than you have processor cores is inadvisable and leads to enormous context-switching overhead; in theory, some of the timeouts may even be caused by a process not getting CPU time soon enough. In addition, the standard OS resolver library is not designed for this. You should use a separate asynchronous resolver library (see the sketch below these comments), and it is highly desirable to perform the actions that follow resolution with a small number of threads in asynchronous mode as well. - Mike
  • After that, look at exactly which stage the timeouts occur at: during name resolution or during the subsequent connection to the host. It is also worth paying attention to the entire intermediate infrastructure on the way to the Internet, including the provider's side. Many home routers cannot cope with that many connections. Many providers advertising 500 Mbit/s also cap the number of simultaneous connections, and even if they do not, their NAT tables may overflow. - Mike
  • In addition, I have not seen any data on rate limits for the 8.8.8.8 DNS servers, but they may well exist. However, running your own DNS server can put an even greater load on the NAT table, so ideally you should make sure there is no NAT anywhere on the route at all. - Mike
  • Also note that measuring channel load in megabits only makes sense when downloading large amounts of data; with your workload of small requests to a huge number of hosts, megabits are practically meaningless. The bottleneck may, for example, be in your network card's interrupt handling. At such rates a server-grade network card is strongly recommended, and interrupt handling can be tuned in the OS (up to pinning a CPU core to interrupt processing and moving all other load off it). - Mike
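
Following up on the suggestion to use an asynchronous resolver library: below is a minimal sketch using react/dns, which is just one possible library choice (the package, nameserver, and domain list here are assumptions, not something prescribed in the comments). A single process keeps many DNS queries in flight over non-blocking UDP instead of forking hundreds of blocking workers.

    <?php
    // Minimal async-resolution sketch with react/dns
    // (assumed dependency: composer require react/dns).
    require 'vendor/autoload.php';

    $loop     = React\EventLoop\Factory::create();
    $factory  = new React\Dns\Resolver\Factory();
    $resolver = $factory->create('8.8.8.8', $loop);   // or a local caching resolver

    $domains = ['example.com', 'example.org'];        // hypothetical input slice

    foreach ($domains as $domain) {
        $resolver->resolve($domain)->then(
            function ($ip) use ($domain) { echo "$domain -> $ip\n"; },
            function ($e)  use ($domain) { echo "$domain failed: {$e->getMessage()}\n"; }
        );
    }

    $loop->run();   // one process, many in-flight DNS queries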
