Hello!

I need to obtain the URLs of all Reddit posts that match a search query. For this, I use the reddit_urls() function from the R package "RedditExtractoR".

My problem: some links contain Spanish characters (for example, 'ó'), which the R function returns as an octal escape sequence starting with a backslash ('\').

For example, the link in the browser is: https://www.reddit.com/r/Barca/comments/4g4fmp/match_thread_fc_barcelona_vs_sporting_de_gijón/

The link that the reddit_urls() function returns is: "http://www.reddit.com/r/Barca/comments/4g4fmp/match_thread_fc_barcelona_vs_sporting_de_gij\363n/"

As a result, R is unable to work with the following address:

> reddit_content('http://www.reddit.com/r/Barca/comments/4g4fmp/match_thread_fc_barcelona_vs_sporting_de_gij\363n/')
Warning messages:
1: In grepl("^https?://(.*)", URL[i]) :
  input string 1 is invalid in this locale
2: In file(con, "r") :
  cannot open URL 'https://www.reddit.com/r/Barca/comments/4g4fmp/match_thread_fc_barcelona_vs_sporting_de_gij      n/.json?limit=500': HTTP status was '503 Service Unavailable'
3: In file(con, "r") :
  cannot open URL 'https://www.reddit.com/r/Barca/comments/4g4fmp/match_thread_fc_barcelona_vs_sporting_de_gij      n/.json?limit=500': HTTP status was '503 Service Unavailable'

I need to re-encode the part of the URL that contains the backslash escape ("\363") into a proper character, so that the link can be processed further in R.
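To make the symptom concrete, the escape can be inspected byte by byte. This is a minimal sketch using base R only; the variable name `slug` is illustrative and holds just the last part of the URL above. It suggests that "\363" is a single byte (0xF3) rather than a valid UTF-8 sequence.

```r
# Last part of the URL as returned by reddit_urls(); `slug` is an illustrative name
slug <- "gij\363n"

# Show the underlying bytes: the octal escape \363 is the single byte f3
print(charToRaw(slug))
# [1] 67 69 6a f3 6e

# A lone f3 byte is not a valid UTF-8 sequence
print(validUTF8(slug))
# [1] FALSE
```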

    2 answers

    Use the url_encode function from the urltools package. The characters in the URL do not appear to be UTF-8 encoded, so convert them to UTF-8 with iconv before encoding.

     url <- "http://www.reddit.com/r/Barca/comments/4g4fmp/match_thread_fc_barcelona_vs_sporting_de_gij\363n/"

     # The raw string contains the byte \363, which is not valid UTF-8
     cat(url)
     # http://www.reddit.com/r/Barca/comments/4g4fmp/match_thread_fc_barcelona_vs_sporting_de_gij n/

     # Encoding the raw string keeps the single byte as %f3
     cat(urltools::url_encode(url))
     # http://www.reddit.com/r%2fBarca%2fcomments%2f4g4fmp%2fmatch_thread_fc_barcelona_vs_sporting_de_gij%f3n%2f

     # Converting from cp1252 to UTF-8 restores the 'ó'
     cat(iconv(url, "cp1252", "UTF-8"))
     # http://www.reddit.com/r/Barca/comments/4g4fmp/match_thread_fc_barcelona_vs_sporting_de_gijón/

     # Converting first and then encoding gives the UTF-8 form %c3%b3
     cat(urltools::url_encode(iconv(url, "cp1252", "UTF-8")))
     # http://www.reddit.com/r%2fBarca%2fcomments%2f4g4fmp%2fmatch_thread_fc_barcelona_vs_sporting_de_gij%c3%b3n%2f
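If the slashes in the path should stay unescaped, the same idea can be sketched in base R alone. The helper below is a hypothetical illustration, not part of urltools or RedditExtractoR: the name `fix_url` and the default `from = "latin1"` are assumptions (Latin-1 and cp1252 agree on the byte 0xF3). It converts the string to UTF-8 and percent-encodes only the non-ASCII bytes, leaving every ASCII character untouched.

```r
# Hypothetical helper: convert a single URL from a legacy single-byte encoding
# to UTF-8, then percent-encode only the non-ASCII bytes.
# The source encoding "latin1" is an assumption about the input.
fix_url <- function(url, from = "latin1") {
  utf8 <- iconv(url, from = from, to = "UTF-8")
  if (is.na(utf8)) stop("could not convert URL from ", from)

  bytes <- charToRaw(utf8)
  ints <- as.integer(bytes)

  # Start with every byte percent-encoded, then restore the ASCII ones as-is
  out <- sprintf("%%%02X", ints)
  ascii <- ints < 128L
  out[ascii] <- rawToChar(bytes[ascii], multiple = TRUE)

  paste(out, collapse = "")
}

url <- "http://www.reddit.com/r/Barca/comments/4g4fmp/match_thread_fc_barcelona_vs_sporting_de_gij\363n/"
fixed <- fix_url(url)
cat(fixed, "\n")
# http://www.reddit.com/r/Barca/comments/4g4fmp/match_thread_fc_barcelona_vs_sporting_de_gij%C3%B3n/
```

This sketch handles one URL at a time; for a vector of URLs it could be wrapped in vapply(urls, fix_url, character(1)). It does not escape spaces or other reserved ASCII characters.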

      Thank you for your help! It turned out that the problem is common on Unix-like systems. I found a solution using the iconv() function here: https://stackoverflow.com/questions/14503143/generating-extended-ascii-in-r-on-a-linux-platform