There is a page on a site that is available only after logging in.

I need to parse this page. How can this be done? Specifically, how do I log in programmatically? The parsing itself I can figure out :)

I would prefer to use Python or Java. But if this needs some other specific language or framework, I'll learn it, no big deal. I need a recommendation and would be grateful for one.

  • See how authorization works on the site: ctrl + shift + i -> Network. Go to the login page -> log in -> click on the first entry in the Network tab. Look at what the user sends (the Form Data tab), with which headers (Request Headers; pay attention to the Cookie line, it may be useful), and what the server responds (Response Headers; focus on Set-Cookie, since these are the cookies, or some of them, that we will use in subsequent requests). - XxX
  • Next, write an authorization function that sends a POST request to log in (to the URL the browser posted to: the Request URL of that entry in the Network tab, which is often but not always the page in the Referer field). If everything went well, the program should find and save (in a database, for example) all key/value pairs from Set-Cookie in the response. In subsequent requests, just add these cookies to the header, and you will be happy :) If at some point the server starts returning 403 errors, call the authorization function again and update the cookie values. - XxX

2 answers

Look at how authorization works on the site: ctrl + shift + i (Chrome) -> Network. Go to the login page -> log in -> click on the first entry in the Network tab.

Look at:

  1. what the user sends (the Form Data tab)
  2. with which headers (Request Headers); pay attention to the Cookie line, it can be useful
  3. what the server responds (Response Headers); pay attention to Set-Cookie, since these are the cookies (or some of them) that the program will or can use in its requests. A token can also be used for authentication, so check the response body as well.
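To make the Set-Cookie step concrete, here is a minimal Python sketch using only the standard library. It turns the Set-Cookie values copied from the Response Headers into name/value pairs and builds the Cookie request header from them. The function names and the sample cookie names are hypothetical; a real site will use its own.

```python
from http.cookies import SimpleCookie


def cookies_from_set_cookie(header_values):
    """Collect name -> value pairs from a list of Set-Cookie header values."""
    pairs = {}
    for raw in header_values:
        parsed = SimpleCookie()
        parsed.load(raw)  # attributes such as Path or HttpOnly are parsed but not stored below
        for name, morsel in parsed.items():
            pairs[name] = morsel.value
    return pairs


def cookie_header(pairs):
    """Build the value of the Cookie request header from stored pairs."""
    return "; ".join(f"{name}={value}" for name, value in pairs.items())


if __name__ == "__main__":
    # Hypothetical values, as they might appear in the Response Headers tab.
    seen = ["sessionid=abc123; Path=/; HttpOnly", "csrftoken=xyz; Path=/"]
    print(cookie_header(cookies_from_set_cookie(seen)))
```

This is only for inspecting and storing cookies by hand; it ignores expiry, domain and path rules, which a real cookie jar would honor.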

Next, write an authorization function that sends a POST request to log in (to the URL the browser posted to: the Request URL of that entry in the Network tab, which is often but not always the page in the Referer field). If everything went well, the program should find and save (in a database, for example) all the Set-Cookie values, or a token if one is returned, from the server's response. In subsequent requests, simply add these cookies to the request header.
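An authorization function along these lines could look roughly like the sketch below. It uses only the Python standard library and lets a cookie jar collect the Set-Cookie values instead of copying them by hand. The URL and the form field names (`username`, `password`) are placeholders; take the real ones from the Form Data tab, including any hidden fields the browser sent.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_login_request(login_url, form_fields):
    """Build the POST request that repeats what the browser sent at login."""
    body = urllib.parse.urlencode(form_fields).encode("utf-8")
    return urllib.request.Request(
        login_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def log_in(login_url, form_fields):
    """Send the login request; return an opener that replays the cookies."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(build_login_request(login_url, form_fields), timeout=30) as response:
        response.read()  # Set-Cookie values from the response are now in `jar`
    return opener, jar


if __name__ == "__main__":
    # Placeholder URL and field names: replace with what the Network tab shows.
    opener, jar = log_in(
        "https://example.invalid/login",
        {"username": "alice", "password": "secret"},
    )
    for cookie in jar:
        print(cookie.name, cookie.value)
```

Any later `opener.open(...)` call made with the returned opener sends the stored cookies automatically, so the private page can be fetched with the same object.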

If at some point the server starts returning 403 errors, call the authorization function again and update the cookie values.
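The retry on 403 can be sketched as a small wrapper in Python. Both `fetch` and `relogin` are hypothetical callables supplied by the caller: the first requests the private page, the second repeats the authorization and refreshes the stored cookies. The sketch assumes requests are made with `urllib`, which raises `HTTPError` for a 403 response.

```python
import urllib.error


def fetch_with_relogin(fetch, relogin, retry_statuses=(403,), max_attempts=2):
    """Call fetch(); on a listed HTTP status, call relogin() and try again."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except urllib.error.HTTPError as error:
            last_attempt = attempt == max_attempts - 1
            if error.code not in retry_statuses or last_attempt:
                raise  # a different error, or re-login did not help
            relogin()
```

Limiting the number of attempts keeps the program from looping forever when the credentials themselves have stopped working.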

    In the browser's developer console you can see the parameters of the request the browser sends when logging in. Repeat this request programmatically and take the authorization data from the response. It may be a cookie, for example, or a token in the response body.

    Next, use the same browser console to see which request the site sends for the private resource. The authorization data must be attached to that request, usually as a request header.

    Then repeat this request programmatically. Any language will do; there is nothing new or special to learn. The main thing is to find out the request parameters.
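As an illustration of attaching the authorization data to the request for the private resource, here is a short Python sketch with the standard library. The `Bearer` scheme in the Authorization header is an assumption; use whatever header and format the browser console shows for the real request. The URL, token and cookie names are placeholders.

```python
import urllib.request


def build_private_request(url, token=None, cookies=None):
    """Build a GET request carrying the authorization data as headers."""
    headers = {}
    if token is not None:
        # Assumption: the site expects a bearer token; check the real request.
        headers["Authorization"] = f"Bearer {token}"
    if cookies:
        headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    return urllib.request.Request(url, headers=headers)


def fetch_private_page(url, token=None, cookies=None):
    """Request the private page and return its body as text."""
    request = build_private_request(url, token=token, cookies=cookies)
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Placeholder values: replace with the data obtained at login.
    html = fetch_private_page(
        "https://example.invalid/private",
        cookies={"sessionid": "abc123"},
    )
    print(html[:200])
```

The returned text can then be handed to whatever parser you prefer.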