I attach the script for parsing the site data. Runs from eksel. It performs its function, but in the process of work it hangs and lasts longer than necessary. (I measured the time on a small volume). How can you optimize this script using masivvy for example? Prompt pliz arrays go tight - I do not understand which side to take the bull by the horns?

Public Sub parse_kiev() Dim http As Object, html As New HTMLDocument, user_items As Object, titleElem As Object, _ detailsElem As Object, user_item As HTMLHtmlElement Dim i As Integer Dim strT As String Dim j As Integer Dim d As Integer Dim startT As Variant Dim finishT As Variant startT = Timer For j = 1 To 1244 Set http = CreateObject("MSXML2.XMLHTTP") http.Open "GET", "https://rieltor.ua/users/?page=" & j & "", False http.send html.body.innerHTML = http.responseText Set user_items = html.getElementsByClassName("user-item") If j = 1 Then i = 1 Else: i = d For Each user_item In user_items Set titleElem = user_item.getElementsByTagName("div")(1) Sheets(2).Cells(i, 1).Value = titleElem.getElementsByTagName("a")(0).innerText Sheets(2).Cells(i, 2).Value = titleElem.getElementsByTagName("a")(0).href Sheets(2).Cells(i, 3).Value = titleElem.getElementsByTagName("strong")(0).innerText Sheets(2).Cells(i, 4).Value = titleElem.getElementsByTagName("div")(5).innerText Sheets(2).Cells(i, 4).Value = titleElem.getElementsByTagName("div")(5).innerText Sheets(2).Cells(i, 5).Value = titleElem.getElementsByTagName("span")(0).innerText strT = titleElem.getElementsByTagName("span")(1).innerText Sheets(2).Cells(i, 6).Value = titleElem.getElementsByTagName("span")(1).innerText i = i + 1 d = i Next Next finishT = Timer MsgBox "Время выполнения макроса " & (finishT - startT) / 60 & " мин." End Sub 

    2 answers 2

    In this case, the "bottleneck" of the task is not so much the work with COM objects (sheets, ranges, etc.), as the following 2 points:

    1. Parsing the text of a web page (html.body.innerHTML = .responseText)
    2. Network requests to a remote server (http.Open / http.Send)

    On the 1st point, we can not really affect. But we can cope with the 2nd. In the "Tools-> References" dialog, it is necessary to additionally enable the "Microsoft WinHTTP Services" library and enable the asynchronous operation of the WinHttpRequest object.

    One thing: you should not run too many asynchronous requests at the same time, otherwise this could be perceived by the remote server as a network attack. Get the error "Bad gateway" or something else. I limited myself to the 10th simultaneous queries, but if you want, you can experiment with this value.

    As a result, I had a macro that worked in 3 minutes and a little (03:05)

     Sub parse_v_2() Const request_count As Long = 1244 Const simultaneous_count As Long = 10 Dim http() As WinHttp.WinHttpRequest Dim html As New MSHTML.HTMLDocument Dim user_items As MSHTML.IHTMLElementCollection Dim user_item As MSHTML.HTMLHtmlElement Dim i As Long, j As Long Dim startT As Double ReDim http(1 To request_count) ' создаём сразу все запросы, но не спешим выполнять их For j = 1 To request_count Set http(j) = New WinHttp.WinHttpRequest Call http(j).Open("GET", "https://rieltor.ua/users/?page=" & CStr(j), True) Next j Let startT = Now Let i = 0 ' запускаем на выполнение первые 10 запросов For j = 1 To simultaneous_count Call http(j).send Next j For j = 1 To request_count ' ожидаем выполнения j-го запроса Call http(j).WaitForResponse(5000) ' запускаем новый запрос взамен "выбывшего" If ((j + simultaneous_count) <= request_count) Then _ Call http(j + simultaneous_count).send ' обработка результатов If (http(j).Status = 200) Then Let html.body.innerHTML = http(j).responseText Set user_items = html.getElementsByClassName("user-item") For Each user_item In user_items Let i = i + 1 With user_item.getElementsByTagName("div")(1) Лист2.Cells(i, 1) = .getElementsByTagName("a")(0).innerText Лист2.Cells(i, 2) = .getElementsByTagName("a")(0).href Лист2.Cells(i, 3) = .getElementsByTagName("strong")(0).innerText Лист2.Cells(i, 4) = .getElementsByTagName("div")(5).innerText Лист2.Cells(i, 5) = .getElementsByTagName("span")(0).innerText Лист2.Cells(i, 6) = .getElementsByTagName("span")(1).innerText End With Next user_item Else Лист2.Cells(i + 2, 1).Value = "Были успешно обработаны " & CStr(j - 1) & " страниц" Лист2.Cells(i + 3, 1).Value = "Ошибка при обработке " & CStr(j) & "-й страницы: " & http(j).statusText Exit For End If Next j For j = 1 To request_count Set http(j) = Nothing Next j Erase http Set html = Nothing Set user_items = Nothing Set user_item = Nothing MsgBox "Время выполнения макроса " & Format((Now - startT), "hh:mm:ss") End Sub 
    • The author will have to combine the tips in one code) - vikttur
    • Yes, it is worth trying) But, it seems to me, a noticeable profit will be only on weak hardware, where the code will rest on the processor speed (the main Excel thread will load one of the cores completely, 100%) - velial

    Yes, work with leaf objects is slow, you need to switch to arrays. If possible. determine the number of rows to write data, if not, set the array dimension with a margin ( ReDim in the example - 5000).

    With each change on the screen it is redrawn. Slow operation. Application.ScreenUpdating - disable / enable screen refresh. This will speed up the work of the macro, and paired with arrays - dozens of times.

    The rest is in detail.

    Integer variables will steal for a few moments, because still converted to Long . You can bite off a bit of memory for the sake of time.

    The With statement not only makes the code more readable, it also speeds up access to the object.

    The use of the strT string variable is strT . Why write something in it (and at each step of the cycle) if it is not used anywhere?

    The variable d duplicates the variable i , superfluous.

    Do not forget to free memory ( Set переменная = Nothing ). The speed will not affect and usually the memory itself is released upon completion of the macro, but there are cases when the memory registers remain occupied. Therefore, it is better to clean by force.

    It turns out something like this:

     Sub parse_kiev2() Dim aData() Dim http As Object, user_items As Object, titleElem As Object, detailsElem As Object Dim html As New HTMLDocument, user_item As HTMLHtmlElement Dim i As Long, j As Long Dim startT As Double startT = Now ReDim aData(1 To 5000, 1 To 6) Application.ScreenUpdating = False For j = 1 To 1244 Set http = CreateObject("MSXML2.XMLHTTP") With http .Open "GET", "https://rieltor.ua/users/?page=" & j & "", False .send html.body.innerHTML = .responseText End With Set user_items = html.getElementsByClassName("user-item") If j = 1 Then i = 1 Else i = d For Each user_item In user_items Set titleElem = user_item.getElementsByTagName("div")(1) i = i + 1 With titleElem aData(i, 1) = .getElementsByTagName("a")(0).innerText aData(i, 2) = .getElementsByTagName("a")(0).href aData(i, 3) = .getElementsByTagName("strong")(0).innerText aData(i, 4) = .getElementsByTagName("div")(5).innerText aData(i, 5) = .getElementsByTagName("span")(0).innerText aData(i, 6) = .getElementsByTagName("span")(1).innerText End With Next user_item Next j Sheets(2).Cells(1, 1).Resize(i, 6).Value = aData Application.ScreenUpdating = True MsgBox "Время выполнения макроса " & Format((Now - startT), "hh:mm:ss") Set user_items = Nothing: Set http = Nothing End Sub