How to parse vacancies with the Zarplata.ru API in R

Question

The job site "Zarplata.ru" has an API:

https://api.zp.ru/v1/

How can I parse vacancies from it in R with jsonlite? That is, how exactly should the API be called? For example, with the HeadHunter API this works as follows:

    library(jsonlite)

    string <- "https://api.hh.ru/vacancies?text=\"machine+learning\"&page="
    vacanciesdf <- data.frame()  # accumulator, created before the first rbind()

    for (pageNum in 0:5) {  # total number of pages
      data <- fromJSON(paste0(string, pageNum))
      vacanciesdf <- rbind(vacanciesdf, data.frame(
        data$items$area$name,            # city
        data$items$salary$currency,      # currency
        data$items$salary$from,          # minimum salary
        data$items$employer$name,        # company name
        data$items$name,                 # job title
        data$items$snippet$requirement)) # required skills
      print(paste0("Upload pages:", pageNum + 1))
      Sys.sleep(3)
    }

How can I solve the same problem with the "Zarplata.ru" API, so that I can set a keyword and get the data sorted into the columns of a data.frame?

Answer 1 · 2018-02-01T03:17:44

Sample code to execute a query using the crul package.

    # HTTP client
    cl <- crul::HttpClient$new(url = "https://api.zp.ru")

    # API request
    resp <- cl$get(path = "v1/vacancies",
                   query = list(scope = "public",
                                q = "machine+learning",
                                limit = 100L))

    # Parse the response
    ans <- jsonlite::fromJSON(resp$parse(encoding = "UTF-8"))

    # Number of records in the result set
    cat(ans$metadata$resultset$count)
    #> 2

    # Extract the required fields
    res <- ans$vacancies
    data.frame(
      header = res$header,
      published_at = as.Date(res$publication$published_at),
      salary = res$salary,
      education = res$education$title,
      experience_length = res$experience_length$title,
      schedule = res$schedule$title,
      working_type = res$working_type$title,
      requirements = res$requirements,
      url = paste0("https://www.zp.ru", res$url),
      company = res$company$title,
      address = paste(res$address$city$title,
                      res$address$street,
                      res$address$building)
    )
    #>                          header published_at     salary education
    #> 1 Senior, Middle Data scientist   2017-12-08 договорная    высшее
    #> 2         Junior Data scientist   2017-12-08 договорная    высшее
    #>   experience_length      schedule     working_type
    #> 1           3-5 лет гибкий график полная занятость
    #> 2         без опыта гибкий график полная занятость
    #>                                                requirements
    #> 1 Высшее образование, стаж работы 3-5 лет, полная занятость
    #> 2           Высшее образование, без опыта, полная занятость
    #>                                                                   url
    #> 1 https://www.zp.ru/vacancy/Senior_Middle_Data_scientist?id=139080429
    #> 2         https://www.zp.ru/vacancy/Junior_Data_scientist?id=139080474
    #>      company                      address
    #> 1 СКБ Контур Екатеринбург Малопрудная 5
    #> 2 СКБ Контур Екатеринбург Малопрудная 5

To fetch all results when there are more than 100, use the offset parameter.
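As a rough sketch of the offset arithmetic, assuming offset is the number of records to skip (so the first page is offset 0) and that limit is capped at 100: the helper `page_offsets` below is hypothetical and only computes which offsets to request. It makes no network calls.

```r
# Hypothetical helper (not part of crul or the zp.ru API): offsets needed to
# page through `count` results, `limit` records at a time.
# Assumption: offset is zero-based, i.e. the number of records to skip.
page_offsets <- function(count, limit = 100L) {
  if (count <= 0L) return(integer(0))
  seq.int(0L, count - 1L, by = limit)
}

# One query list per page; each could be passed to
# cl$get(path = "v1/vacancies", query = ...)
queries <- lapply(page_offsets(250L), function(offset) {
  list(scope = "public", q = "machine+learning", limit = 100L, offset = offset)
})

length(queries)  # 3 requests: offsets 0, 100 and 200
```

The total number of results to pass as `count` would come from `ans$metadata$resultset$count` in the first response.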

The return fields are described in the API documentation.

If necessary, you can extract only the required fields using the fields parameter. For example:

 query = list(scope = "public", q = "machine+learning", limit = 100L, fields = "header,company.title")

Pagination example:

    #' @title Fetch vacancies from zp.ru
    #' @param cl HTTP client created with `crul::HttpClient`.
    #' @param query String containing the search query.
    #' @return data.frame with the query results
    fetch_vacancies <- function(cl, query) {
      limit <- 100L  # results per request (an integer from 1 to 100)
      q <- list(
        scope = "public",
        q = query,
        limit = limit
      )

      fetch_data <- function(query) {
        # API request
        resp <- cl$get(path = "v1/vacancies", query = query)
        # Check the response status
        resp$raise_for_status()
        # Parse the response
        jsonlite::fromJSON(resp$parse(encoding = "UTF-8"))
      }

      extract_data <- function(data) {
        res <- data$vacancies
        data.frame(
          header = res$header,
          published_at = res$publication$published_at,
          salary = res$salary,
          education = res$education$title,
          experience_length = res$experience_length$title,
          schedule = res$schedule$title,
          working_type = res$working_type$title,
          requirements = res$requirements,
          url = paste0("https://www.zp.ru", res$url),
          company = res$company$title,
          address = paste(res$address$city$title,
                          res$address$street,
                          res$address$building)
        )
      }

      ans <- fetch_data(q)
      res <- extract_data(ans)

      # Handle the case of more than 100 results
      count <- ans$metadata$resultset$count
      if (count > limit) {
        # Start with the data already fetched
        pages <- list(res)
        # Second page: skip the first 100 records (assumes a zero-based offset)
        offset <- limit
        # Repeat the request and parsing, advancing the offset by 100
        while (offset < count) {
          q$offset <- offset
          pages[[length(pages) + 1L]] <- extract_data(fetch_data(q))
          # Avoid spamming the API with requests
          Sys.sleep(0.4)
          # Progress message
          cat("\rFetch page ", offset / limit)
          offset <- offset + limit
        }
        # Combine all results in request order
        res <- do.call(rbind, pages)
      }

      # Store the processed query in the attributes of the result
      attr(res, "query") <- ans$metadata$query$searched_q
      return(res)
    }

    # HTTP client
    cl <- crul::HttpClient$new(url = "https://api.zp.ru")
    query <- "Аналитик"  # "Analyst"
    res <- fetch_vacancies(cl, query)
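If the goal is one column per vacancy attribute rather than a hand-picked set, one possible approach is `jsonlite::flatten`, which turns nested data frames into dotted column names. The sketch below uses a made-up JSON fragment instead of a live request; the field names follow the code above, but the values are invented for illustration and the real response may contain other fields.

```r
library(jsonlite)

# Invented sample shaped like the `vacancies` part of a response (assumption)
json <- '{"vacancies": [
  {"header": "Data scientist", "salary": "60000",
   "company": {"title": "B"},
   "address": {"city": {"title": "Y"}, "street": "T"}},
  {"header": "Analyst", "salary": "50000",
   "company": {"title": "A"},
   "address": {"city": {"title": "X"}, "street": "S"}}
]}'

ans <- fromJSON(json)

# Nested data frames become plain columns such as company.title
flat <- jsonlite::flatten(ans$vacancies)

# Sort rows by job title and choose the column order explicitly
sorted <- flat[order(flat$header),
               c("header", "salary", "company.title",
                 "address.city.title", "address.street")]
print(sorted)
```

With a real response, `names(flat)` lists every available column, which makes it easier to decide which ones to keep.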

Do I understand correctly that this way we parse specific skills? But how can we then break the resumes down into data columns by attribute?
The answer is an example of working with the zp.ru site API.
If you want a more specific answer, ask a more specific question with links to API documentation pages.
It is not clear from the question what specific information you want to get through the API or in what format you want to save it.
The question is how to modify it to export vacancies with all of the data for each vacancy in columns (that is, salary, city, country, requirements, etc.).
With Hh.ru this is clear, as the code above shows, but with Zp.ru it is somehow not quite obvious.
You don't think someone on stackoverflow.com will do your student assignment for you, do you? :)
