Has anyone built a parser? Please tell me approximately how long it takes to parse 40,000 products.
Closed as too broad by Alexey Shimansky, D-side, Streletz, zRrr, Nick Volynkin ♦ Jun 22 '16 at 4:03.
Please edit the question so that it describes a specific problem in enough detail to determine an appropriate answer. Do not ask several questions at once. See “How to ask a good question?” for clarification. If the question can be reworded to fit the rules set out in the help center, edit it.
- The question needs more detail! Try to write more detailed questions. To get an answer, explain exactly what you see as the problem, how to reproduce it, what result you want, and so on. Give sample code. - Anton Komyshan 7:09 pm
- Well, I don't know how to spell it out: how long does it take to parse 40,000 products from another site? What code sample would there be? I just want to know whether it is equally fast for everyone or not - misha11
- It depends on the system and on the volume being parsed, but I think about 10 seconds - pnp2000
- @misha11, then you want a different kind of site, a polling site :) - Anton Komyshan
- Is 4-5 seconds per item very slow? - misha11
2 answers
There are many nuances, so the answer is abstract.
- How many threads will do the parsing? The more workers, the faster they fetch the 40,000 products. With too many requests, though, we can bring the server down or get banned.
- How long does the server take to respond? That depends on the server's performance and on the physical locations of the server and the parsers.
- Does the server ban? Will we send requests directly or through a proxy? Going through a proxy will almost certainly be slower.
- Sometimes requests will fail, so retries will be needed.
Suppose the server comfortably handles 50 threads, does not ban, and responds in 4 s on average. Then 40000 * 4 / 50 = 3200 s. Roughly speaking, one hour.
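The same back-of-the-envelope estimate can be written as a small function. This is only a sketch: the name `estimate_total_seconds` and the optional `retry_rate` parameter (the fraction of requests that fail once and are repeated once) are assumptions made for illustration and are not part of the calculation above.

```python
def estimate_total_seconds(items, avg_response_s, workers, retry_rate=0.0):
    """Rough wall-clock estimate for fetching `items` pages.

    Assumes every worker is always busy and each request takes
    `avg_response_s` seconds on average. `retry_rate` is a hypothetical
    fraction of requests that fail once and are retried once.
    """
    if workers <= 0:
        raise ValueError("workers must be positive")
    total_requests = items * (1.0 + retry_rate)
    return total_requests * avg_response_s / workers


# Numbers from the example: 40000 items, 4 s per response, 50 threads.
seconds = estimate_total_seconds(40000, 4, 50)
print(f"{seconds:.0f} s, about {seconds / 60:.0f} min")  # 3200 s, about 53 min
```

The estimate is only as good as its inputs: if the server slows down under load, the average response time grows and the real total will be longer.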
- Thanks for the informative answer! - misha11
You can estimate it roughly:
time curl "http://url/of/one/item.html"
This gives you the time needed to fetch a single item.
With the siege utility you can check how many items the source server can serve per second.
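For repeated measurements, the `time curl` check can be approximated in a script. The sketch below times several fetches of one page using only Python's standard library. The helper names `fetch_with_urllib`, `measure_fetch_times` and `median_fetch_time` are hypothetical, and the URL in the final comment is the placeholder from the command above, so it has to be replaced with a real item page.

```python
import statistics
import time
import urllib.request


def fetch_with_urllib(url, timeout=30):
    """Download one page and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def measure_fetch_times(fetch, url, samples=5):
    """Call fetch(url) `samples` times and return each duration in seconds."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        fetch(url)
        durations.append(time.perf_counter() - start)
    return durations


def median_fetch_time(fetch, url, samples=5):
    """Median duration of `samples` fetches, less sensitive to one slow request."""
    return statistics.median(measure_fetch_times(fetch, url, samples))


# Example (needs network access; the URL is the placeholder from the answer):
# print(median_fetch_time(fetch_with_urllib, "http://url/of/one/item.html"))
```

Like `time curl`, this measures only the download, not the time spent parsing the page afterwards.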
- Parsing takes from about 2 to 5-6 seconds per item. Is the parser badly written, or does it really take that long? - misha11
- That is, in fact, normal. Also keep in mind that when you run your parser in many threads, you need to monitor the source server's response time: if it grows significantly, reduce the number of threads until the response time stops increasing - Alexander
- Well, it seems to me that this parser is written incorrectly: there are no threads, everything runs from one script. But one thing is unclear. If the PHP script starts several objects that run and parse the data in parallel (some kind of multithreading) and sends the requests through internal proxy IPs, then the target host will still see all those requests coming from one external IP, and it will look like a DDoS. So will they most likely block access anyway, or does a proxy work differently? - misha11 4:56