There is a site called Aviasales.

I need to find the cheapest price for a particular route on a given date.

They have a free API.

There is a method that suits me completely (it returns the list of prices found by their users in the last 48 hours, in accordance with the filters set). However, if nobody has searched for a given route, this method returns nothing for it. In other words, for a route to appear in this method you just need to search for it (open a link like https://www.aviasales.ru/search/KZN2601MOW30012). After that I can already get the price through the API.
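For reference, roughly how I call that method now; the endpoint, parameter names and response fields are written from memory of the Travelpayouts docs, so they may be slightly off, and the token is a placeholder:

```php
<?php
// Sketch only: endpoint, parameter names and response shape are assumptions
// based on how I remember the Travelpayouts "latest prices" method.
$params = http_build_query([
    'origin'      => 'KZN',
    'destination' => 'MOW',
    'limit'       => 1,
    'token'       => 'MY_API_TOKEN', // placeholder
]);

$json = file_get_contents('https://api.travelpayouts.com/v2/prices/latest?' . $params);
$data = json_decode($json, true);

if (!empty($data['data'])) {
    // Prices found by other users in the last 48 hours for this route.
    echo $data['data'][0]['value'], PHP_EOL;
} else {
    echo "Nobody has searched this route recently, so nothing is cached yet", PHP_EOL;
}
```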

Right now I have simply set up ZennoPoster, which visits the given URL every day. But I don't like this approach, because the computer has to be running constantly, and renting a Windows machine somewhere is expensive.

With PHP, simply calling file_get_contents will not work, because the content is loaded by JavaScript. Reverse-engineering the sequence, i.e. figuring out what aviasales itself does internally and which URLs it calls, is long and unreliable, since that algorithm can change unpredictably at any time.

What other ways are there to solve this? Please advise.

  • It's not quite clear: you don't like the solution with a constantly running computer, but for the same thing to work in PHP you need exactly the same constantly running computer or server. If you are interested in how to do this in PHP, use curl with CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE. With curl it is then very easy to emulate walking through pages as in a browser, because cookies are handled for you (a rough sketch follows right after these comments). It may not work if the site has additional protection against this: for example, a one-pixel beacon image is loaded, or a beacon request is made from heavily obfuscated JS. - Goncharov Alexander
  • He wrote "It is expensive to rent a car somewhere." There are tools only for Linux - user2244523 February

3 answers

Selenium can help you here. For PHP there is php-webdriver from Facebook.

In a similar question I described a recipe for using php-webdriver with PhantomJS. Development of PhantomJS seems to have been discontinued, and nowadays headless Chrome is used instead (the approach is roughly the same).
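A rough sketch of that recipe with php-webdriver and headless Chrome; it assumes a Selenium server (or chromedriver behind it) listening on localhost:4444, and the fixed wait is deliberately crude:

```php
<?php
// Sketch: drive headless Chrome through php-webdriver so the page's JavaScript
// actually runs and the search gets registered on the site.
use Facebook\WebDriver\Chrome\ChromeOptions;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;

require_once 'vendor/autoload.php';

$options = new ChromeOptions();
$options->addArguments(['--headless', '--disable-gpu', '--window-size=1280,1024']);

$capabilities = DesiredCapabilities::chrome();
$capabilities->setCapability(ChromeOptions::CAPABILITY, $options);

// Assumption: a Selenium server is already running at this address.
$driver = RemoteWebDriver::create('http://localhost:4444/wd/hub', $capabilities);

$driver->get('https://www.aviasales.ru/search/KZN2601MOW30012');
sleep(30); // crude: give the client-side search time to fire its requests
$driver->quit();
```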

    It seems to me that a convenient solution in this situation would be to get regular PHP hosting with cron support and schedule a script on it to run at the required frequency and perform the necessary actions (a sample cron entry is sketched after this answer). If simply opening the URL is enough to generate the required data, then curl will do for that; afterwards you can fetch the data and do whatever manipulations you need with it.

    For more sophisticated work with sites you can use php-webdriver, as mentioned above.

    But in any case, be prepared for the fact that, whatever solution you choose, you may have to adjust the data-gathering logic from time to time, since both the site's markup and the way it works can change, and you will have to adapt your solution to those changes.
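As an illustration of the cron part mentioned in this answer, an entry like the following would run a warm-up script once a day; the PHP binary path, script path, and schedule are placeholders:

```
# Placeholders: adjust the PHP binary path, script path, and schedule as needed.
0 6 * * * /usr/bin/php /home/user/warm_up_route.php >> /home/user/warm_up.log 2>&1
```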

    • I don't think php-webdriver will work on "regular PHP hosting": for it to work, the system needs a GUI to launch the browsers in which the work will take place. - kandellak
    • If it comes to using Selenium, then yes, you would have to look for another option, such as a virtual server or a cloud; but if you can get by with just curl, then shared hosting with curl and cron support will be more than enough. - MWS
    • What is the point of cron? The content there is loaded by JavaScript. You can go to the aviasales site and see for yourself. - user2244523

    As far as I know, working with sockets in PHP is tears and pain (I tried to parse Yandex real-estate listings), so I advise you to look towards Python and Scrapy instead. Otherwise, emulate a browser, as colleagues wrote above.