Here is a complete example. Consider a bulletin board. I want to be the first to learn about a particular product. Say I need an Acer Aspire 5742G. I choose the CPU, the amount of RAM, the video card, and the price. As soon as a listing appears that matches my filters, I receive an SMS. In other words, ads that were posted earlier do not interest me. There may be 50 such filters. The algorithm is as follows:
1. Take a filter and fetch its results page with curl.
2. Collect the IDs of the listed items.
3. Compare them with the IDs already in the database.
4. If an ID is not in the database, send an SMS and add the ID to the database.
5. Otherwise, do nothing.
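The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the `seen` table, the `fetch_ids` callable (which would wrap the curl request and the ID parsing), and the `notify` callable (which would send the SMS) are all hypothetical names, not part of any real service.

```python
import sqlite3
from typing import Callable, Iterable, List


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) table of seen IDs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        "filter_id TEXT, ad_id TEXT, PRIMARY KEY (filter_id, ad_id))"
    )
    return conn


def check_filter(
    conn: sqlite3.Connection,
    filter_id: str,
    fetch_ids: Callable[[str], Iterable[str]],  # assumed: fetches the page, returns ad IDs
    notify: Callable[[str, str], None],         # assumed: sends the SMS
) -> List[str]:
    """Run one polling pass for one filter and return the newly seen IDs."""
    # IDs already stored for this filter.
    known = {
        row[0]
        for row in conn.execute(
            "SELECT ad_id FROM seen WHERE filter_id = ?", (filter_id,)
        )
    }
    # Keep page order, drop duplicates, keep only IDs not yet in the database.
    fetched = list(dict.fromkeys(fetch_ids(filter_id)))
    new_ids = [ad_id for ad_id in fetched if ad_id not in known]

    for ad_id in new_ids:
        notify(filter_id, ad_id)
        conn.execute(
            "INSERT INTO seen (filter_id, ad_id) VALUES (?, ?)", (filter_id, ad_id)
        )
    conn.commit()
    return new_ids  # empty list means "do nothing"
```

Since ads posted before the filter was created are not interesting, the very first pass for a new filter would probably be run with a `notify` that does nothing, so that existing IDs are stored without sending any SMS.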
The problem is that as soon as N users appear (say 100), each with 50 filters, cron will send 5,000 requests to this board every minute (and with 1,000 users it is 50,000 =_=). The only way to reduce this is to look for identical filters across all users and make a single request for each. But the savings will be crumbs.
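The idea of merging identical filters can be sketched like this. It is a minimal illustration, assuming each filter is a flat dictionary of parameters. The names `canonical_key` and `group_subscribers` and the sample users are made up for the example. How much it saves depends entirely on how often users really pick the same filters.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

FilterKey = Tuple[Tuple[str, str], ...]


def canonical_key(filter_params: dict) -> FilterKey:
    """Normalize a filter so that equal filters produce equal keys."""
    return tuple(
        sorted((name, str(value).strip().lower()) for name, value in filter_params.items())
    )


def group_subscribers(
    subscriptions: Iterable[Tuple[str, dict]]
) -> Dict[FilterKey, Set[str]]:
    """Map each unique filter to the set of users who subscribed to it."""
    groups: Dict[FilterKey, Set[str]] = defaultdict(set)
    for user_id, filter_params in subscriptions:
        groups[canonical_key(filter_params)].add(user_id)
    return dict(groups)


# Hypothetical sample data: two users share a filter, one differs by price.
subs = [
    ("alice", {"model": "Acer Aspire 5742G", "max_price": 500}),
    ("bob", {"max_price": 500, "model": "acer aspire 5742g"}),
    ("carol", {"model": "Acer Aspire 5742G", "max_price": 450}),
]
groups = group_subscribers(subs)
print(len(subs), "subscriptions ->", len(groups), "requests per pass")
# prints "3 subscriptions -> 2 requests per pass"
```

With this grouping, each pass makes one request per unique filter and then notifies every user in that filter's set, so the request count is bounded by the number of distinct filters rather than by users times filters.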
And naturally, the bulletin board's server will not be happy with that kind of activity... Buy proxies? How many? Roughly one per user? Is that the only solution? I would be glad to hear your thoughts on this!