I know how to fetch a document, and I have found libraries for processing it, but I don't know how to tell the program where to stop so that everything that has already been parsed is not parsed again. I also don't know how to implement moving through pagination.

Please tell me where to dig, what to look for, and what to read. Or share an example algorithm.

Closed because the essence of the question is not clear to the participants: aleksandr barakin, Nicolas Chabanovsky, Nov 16 '16 at 7:18.

Try to write more detailed questions. To get an answer, explain exactly what the problem is, how to reproduce it, and what result you want. Give an example that clearly demonstrates the problem. If the question can be reformulated according to the rules set out in the help center, edit it.

    2 answers

    I honestly do not understand the essence of the question... but if you need to parse properly in PHP, then you need:

    1) knowledge of curl (and multicurl);

    2) knowledge of PHP string functions (at least strlen, strpos, substr, str_replace) and loops (while and do-while);

    3) the ability to write regular expressions (https://regex101.com is a convenient place to test them).

    You should also have an understanding of working with arrays and strings in general. Everything else is a combination of the three elements listed above.
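As a rough illustration of items 2 and 3, here is a minimal sketch that pulls values out of an already downloaded page, first with strpos/substr in a while loop and then with a regular expression. The helper name `extractBetween`, the sample markup, and the `item` class are all made up for the example; the download step (curl) is left out so the snippet runs on its own.

```php
<?php
// Hypothetical helper: returns every substring found between $open and $close.
function extractBetween(string $html, string $open, string $close): array
{
    $found = [];
    $offset = 0;
    // Walk the string marker by marker until no opening marker is left.
    while (($start = strpos($html, $open, $offset)) !== false) {
        $start += strlen($open);
        $end = strpos($html, $close, $start);
        if ($end === false) {
            break; // opening marker without a closing one: stop scanning
        }
        $found[] = substr($html, $start, $end - $start);
        $offset = $end + strlen($close);
    }
    return $found;
}

// Sample markup standing in for a page fetched with curl.
$html = '<ul><li class="item">First</li><li class="item">Second</li></ul>';

$titles = extractBetween($html, '<li class="item">', '</li>');

// The same extraction with a regular expression.
preg_match_all('~<li class="item">(.*?)</li>~s', $html, $matches);
$titlesRegex = $matches[1];
```

Both variants collect the same list here. The string-function version is easier to debug step by step, while the regular expression is shorter once the pattern is right.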

      To keep the program from parsing what has already been parsed, give it a condition: "if this has not been parsed yet, parse it; otherwise, skip it" :)
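One possible way to write that condition down, combined with walking through pagination, is sketched below. It assumes the site lists the newest items first, so a page with nothing new means the rest was parsed earlier. The names `crawlNew` and `$fetchPage`, and the `/post/N` URLs, are hypothetical; `$fetchPage` stands in for "download page N with curl and extract the item URLs" and is simulated with an array so the snippet runs without a network.

```php
<?php
// Simulated site: page number => item URLs, newest first (assumption).
$pages = [
    1 => ['/post/5', '/post/4'],
    2 => ['/post/3', '/post/2'],
    3 => ['/post/1'],
];

// Hypothetical stand-in for the real download-and-extract step.
$fetchPage = function (int $page) use ($pages): array {
    return $pages[$page] ?? []; // a page past the end is empty
};

// Returns the URLs that are not yet in $seen, page by page.
function crawlNew(callable $fetchPage, array $seen): array
{
    $new = [];
    $page = 1;
    do {
        $freshOnPage = 0;
        foreach ($fetchPage($page) as $url) {
            if (isset($seen[$url])) {
                continue; // already parsed: skip it
            }
            $seen[$url] = true;
            $new[] = $url;
            $freshOnPage++;
        }
        $page++;
        // Stop on the first page that is empty or has nothing new.
    } while ($freshOnPage > 0);
    return $new;
}

// URLs remembered from a previous run, stored as array keys for fast lookup.
$seen = ['/post/3' => true, '/post/2' => true, '/post/1' => true];
$new = crawlNew($fetchPage, $seen);
```

In a real parser the `$seen` list would have to be saved between runs, for example in a file or a database table, and merged with `$new` after each run.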