I have a Java application whose main job is automated work on the Internet from many accounts at the same time (NOT spam or any other abusive activity). It uses Apache HttpClient 4.5.2 for networking. The application reaches the Internet through different HTTP proxies (two accounts per proxy) with login/password authentication, and it also uses cookies.
Initially, because of my very poor knowledge of Java, of the development environment and, most importantly, of English, I built the multithreaded part of the application as follows:
An account is selected from a CSV file (a primitive substitute for a database) and a separate thread is created for it. In that thread I create an object of class Task, which implements the main functionality of the program (parsing the received pages, making decisions based on the data, and updating the CSV file). That object in turn creates an object of class Network, which does all the work with the network (the data it obtains is used in Task).
Thus, a separate HttpClient instance is created for each thread. When I did this, I already knew roughly that HttpClient has some peculiarities in a multithreaded environment, but it seemed to me that creating a separate instance for each thread was the best solution to that problem.
And this code even worked tolerably well while there were not many threads. Either by some miracle, or because the errors were too few for me to notice.
But when there were many threads, the code stopped working correctly, which is quite natural. Namely, all the threads start at the beginning of the program, but most of them quickly end up waiting on a lock inside HttpClient and stop doing any work (roughly speaking, the program hangs).
Naturally, I turned to Google and the documentation, and almost immediately realized that you should not create a separate HttpClient instance for each thread. Instead, you create one instance for the entire application and use PoolingHttpClientConnectionManager, while each thread gets its own HttpContext (which, if I am not mistaken, stores the cookies and, in my case, the proxy authorization; I use the preemptive authentication method).
You can read more about this in the HttpClient documentation, which also contains an example. I recommend looking at it, because it makes what I ask below easier to follow.
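As far as I understand the documentation, that setup would look roughly like the sketch below, written against the HttpClient 4.5 API. It builds one pooled client for the whole application and a separate HttpClientContext per account that carries that account's cookies and proxy credentials. The class and method names (SharedClientSketch, buildSharedClient, newAccountContext, newProxiedGet), the pool limits, and the proxy host, port, and credentials are placeholders made up for illustration. Preemptive proxy authentication is left out to keep the sketch short.

```java
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class SharedClientSketch {

    // One pooled, thread-safe client for the whole application.
    // The limits are placeholders; tune them to the number of threads and proxies.
    static CloseableHttpClient buildSharedClient(int maxTotal, int maxPerRoute) {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(maxTotal);
        cm.setDefaultMaxPerRoute(maxPerRoute);
        return HttpClients.custom().setConnectionManager(cm).build();
    }

    // One context per account/thread: its own cookie store and proxy credentials.
    static HttpClientContext newAccountContext(HttpHost proxy, String user, String password) {
        CredentialsProvider credentials = new BasicCredentialsProvider();
        credentials.setCredentials(new AuthScope(proxy),
                new UsernamePasswordCredentials(user, password));

        HttpClientContext context = HttpClientContext.create();
        context.setCookieStore(new BasicCookieStore());
        context.setCredentialsProvider(credentials);
        return context;
    }

    // The proxy is chosen per request, so accounts can use different proxies
    // while sharing the same client.
    static HttpGet newProxiedGet(String url, HttpHost proxy) {
        HttpGet get = new HttpGet(url);
        get.setConfig(RequestConfig.custom().setProxy(proxy).build());
        return get;
    }

    public static void main(String[] args) throws Exception {
        HttpHost proxy = new HttpHost("proxy.example", 8080); // placeholder proxy
        try (CloseableHttpClient client = buildSharedClient(200, 20)) {
            HttpClientContext first = newAccountContext(proxy, "user1", "secret1");
            HttpClientContext second = newAccountContext(proxy, "user2", "secret2");
            // Each account keeps its cookies separately even though the client is shared.
            System.out.println(first.getCookieStore() != second.getCookieStore());
            // A worker thread would then call: client.execute(newProxiedGet(url, proxy), first)
            // and close the response to return the connection to the pool.
        }
    }
}
```

By default PoolingHttpClientConnectionManager allows only 2 connections per route and 20 in total, so with many threads the limits probably need to be raised as above. A response that is never closed keeps its connection leased, and other threads then block waiting for a free one.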
In general, everything is clear. But how do I do this in my application specifically, given the architecture described above? It was deliberately designed to be at least somewhat correct from an OOP point of view, with the idea of reusing individual classes in other applications.
Of course, I could do everything in one class: choosing an account (and creating a thread if it is suitable), the main logic of the application, and the work with the network, creating the HTTP client once. But the result would be a huge class that is very inconvenient to read and edit and almost impossible to reuse in other applications. It seems to me that this is categorically wrong from an OOP point of view and must not be done.
How, then, should this problem be solved correctly?
While writing my question, I got the idea that I could create the HTTP client once in the class where the account is selected, and then simply pass it to each thread through the constructor, like this:
public GetThread(CloseableHttpClient httpClient, HttpGet httpget)
From there it would be passed on to the Network class in the same way and used there.
Then, in theory, the structure of the program is preserved, and there should be no multithreading problems with the HTTP client.
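To make the idea concrete, here is a minimal sketch of the wiring I have in mind. So that it runs without any dependencies, a hypothetical SharedClient interface stands in for CloseableHttpClient, and the account names and URL are placeholders. Task and Network mirror my classes in name only. The point is that both receive the shared client through their constructors and never create one themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WiringSketch {

    /** Hypothetical stand-in for the single shared, thread-safe HTTP client. */
    interface SharedClient {
        String fetch(String url, String account);
    }

    /** Network layer: uses the client it is given, never creates its own. */
    static class Network {
        private final SharedClient client;
        private final String account; // per-account state (context, cookies) would live here

        Network(SharedClient client, String account) {
            this.client = client;
            this.account = account;
        }

        String get(String url) {
            return client.fetch(url, account);
        }
    }

    /** One Task per account; it passes the shared client on to its Network. */
    static class Task implements Callable<String> {
        private final Network network;

        Task(SharedClient client, String account) {
            this.network = new Network(client, account);
        }

        @Override
        public String call() {
            // Parsing and decision logic would go here.
            return network.get("http://example.com/");
        }
    }

    /** The class that selects accounts creates the client once and hands it out. */
    static List<String> runAll(SharedClient client, List<String> accounts) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String account : accounts) {
                futures.add(pool.submit(new Task(client, account)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // results come back in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        SharedClient fake = (url, account) -> account + " -> " + url;
        System.out.println(runAll(fake, Arrays.asList("alice", "bob")));
    }
}
```

With this shape, Task and Network stay separate and reusable, and only the top-level class knows how the client is built.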
Is my reasoning correct? If not, what would be the right way to do it?
P.S. This is my first serious Java application, so please do not judge too harshly; everything comes with experience. Thank you in advance for your answers and help.