I am parsing data from this site:

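    import requests
    from bs4 import BeautifulSoup

    r = requests.get('http://ip-api.com/')
    print(r)  # prints the response status, e.g. <Response [200]>
    soup = BeautifulSoup(r.content, 'html.parser')
    soup  # in a notebook, this displays the parsed content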

On its own, the soup is in a fairly convenient form, but it is not clear how to pull data out of it for later conversion to a pd.DataFrame:

    {
      "country": "Russia",
      "countryCode": "RU",
      "region": "MOW",
      "regionName": "Moscow",
      "city": "Moscow",
      "district": "",
      "zip": "125480",
      "lat": 55.7522,
      "lon": 37.6156,
      "timezone": "Europe/Moscow",
      "isp": "NCNET",
      "org": "",
      "as": "AS42610 PJSC Rostelecom",
      "mobile": false,
      "proxy": false,
      "query": "37.204.225.193"
    }

How do I get a pd.DataFrame from the soup?

  • Off-topic, but note that this site seems to have a limit of 150 requests per minute. I don't know exactly what information you need from the IP address, but wouldn't it be easier to use the free GeoIP databases from MaxMind? dev.maxmind.com/geoip/geoip2/geolite2 - nobody

2 answers

The API can return the data directly as JSON, so you can use pd.read_json() to read it into a DataFrame:

 location_df = pd.read_json('http://ip-api.com/json', lines=True) 
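To see what ends up in the DataFrame without touching the network, here is a minimal sketch that parses a response body of the same shape as the one in the question. The `sample_body` string is a shortened copy of that output, and `json.loads` plus a one-element list is only a stand-in for `read_json(..., lines=True)`, not what the line above does internally.

```python
import json

import pandas as pd

# Shortened copy of the JSON shown in the question (assumed response shape).
sample_body = (
    '{"country": "Russia", "countryCode": "RU", "city": "Moscow", '
    '"lat": 55.7522, "lon": 37.6156, "query": "37.204.225.193"}'
)

record = json.loads(sample_body)        # one JSON object -> one dict
location_df = pd.DataFrame([record])    # a list with one dict -> one row
print(location_df.shape)
```

Each key of the JSON object becomes a column, and the single object becomes a single row.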

For information on several addresses, they have the Batch API. Following the example from the documentation, you can do something like:

    import pandas as pd
    import requests

    # One dict per address; "fields" and "lang" are optional per-query settings
    ips = [
        {"query": "208.80.152.201", "fields": "city,country,countryCode,query", "lang": "ru"},
        {"query": "8.8.8.8"},
        {"query": "24.48.0.1"}
    ]
    r = requests.post('http://ip-api.com/batch', json=ips)
    locations_df = pd.read_json(r.content)
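If the list of addresses is long, it probably has to be split across several batch requests. The sketch below only builds the payloads and makes no request. `batch_payloads` and `MAX_BATCH_SIZE` are hypothetical names, and the limit of 100 queries per request is an assumption to check against the API documentation.

```python
from typing import Dict, Iterator, List

# Assumption: maximum number of queries per batch request; verify in the API docs.
MAX_BATCH_SIZE = 100


def batch_payloads(ips: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of {"query": ip} dicts, each at most `size` items long."""
    for start in range(0, len(ips), size):
        yield [{"query": ip} for ip in ips[start:start + size]]


# A small size is used here only to make the splitting visible.
payloads = list(batch_payloads(["208.80.152.201", "8.8.8.8", "24.48.0.1"], size=2))
print(payloads)
```

Each payload can then be passed as `json=` to `requests.post('http://ip-api.com/batch', ...)`, and the resulting frames combined with `pd.concat`.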
  • Well, then the next step is how to put an IP into this address. In other words, if I need data for different IP addresses, how do I fill the table now? - Stepan Sokol
  • @StepanSokol They also have a batch API. I updated the answer. - Andrey

Use pd.read_html():

    In [49]: url = 'https://ipinfo.io/AS42610'

    In [50]: df = pd.read_html(url)[0]

    In [51]: df
    Out[51]:
        Netblock         Description                                   Num IPs
    0   109.173.0.0/17   PJSC Rostelecom                                 32768
    1   178.140.0.0/16   PJSC Rostelecom                                 65536
    2   185.19.20.0/22   PJSC Rostelecom                                  1024
    3   188.255.0.0/17   PJSC Rostelecom                                 32768
    4   188.32.0.0/16    PJSC Rostelecom                                 65536
    5   37.110.0.0/17    NCNET Broadband customers                       32768
    6   37.110.128.0/19  NCNET Broadband customers                        8192
    7   37.204.0.0/16    PJSC Rostelecom                                 65536
    8   46.242.0.0/17    PJSC Rostelecom                                 32768
    9   5.228.0.0/16     PJSC Rostelecom                                 65536
    10  77.37.128.0/17   NKS broadband customers                         32768
    11  84.253.64.0/18   PJSC Rostelecom                                 16384
    12  85.30.192.0/18   PJSC Rostelecom                                 16384
    13  87.240.40.0/21   Central Telegraph Public Joint-stock Company     2048
    14  87.240.48.0/20   Central Telegraph Public Joint-stock Company     4096
    15  90.154.64.0/18   PJSC Rostelecom                                 16384
    16  95.84.128.0/18   NCNET Broadband customers                       16384
    17  95.84.192.0/18   PJSC Rostelecom                                 16384

UPDATE: if you use the Batch API, as in @Andrey's answer, you can use json_normalize():

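    import requests
    # pandas >= 1.0; in older versions: from pandas.io.json import json_normalize
    from pandas import json_normalize

    ips = [
        {"query": "208.80.152.201", "fields": "city,country,countryCode,query", "lang": "ru"},
        {"query": "8.8.8.8"},
        {"query": "24.48.0.1"}
    ]
    r = requests.post('http://ip-api.com/batch', json=ips)
    # r.json() is a list of dicts, one per queried address
    res = json_normalize(r.json(), errors='ignore')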

Result:

    In [65]: res
    Out[65]:
       as                             city           country        countryCode  ...  regionName  status   timezone             zip
    0  NaN                            San Francisco  США            US           ...  NaN         NaN      NaN                  NaN
    1  AS15169 Google LLC             Mountain View  United States  US           ...  California  success  America/Los_Angeles  94043
    2  AS5769 Videotron Telecom Ltee  Québec         Canada         CA           ...  Quebec      success  America/Toronto      G1X

    [3 rows x 14 columns]
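The NaN values in row 0 appear because the first query asked only for `city,country,countryCode,query`, so its record has fewer keys than the others. A minimal offline sketch of that behavior, using hypothetical hand-written records instead of a real response:

```python
import pandas as pd

# Hypothetical records shaped like a batch response: the first carries
# only the requested fields, the second carries additional ones.
records = [
    {"city": "San Francisco", "countryCode": "US", "query": "208.80.152.201"},
    {"city": "Mountain View", "countryCode": "US", "query": "8.8.8.8",
     "status": "success", "zip": "94043"},
]

# Columns are the union of all keys; missing keys are filled with NaN.
res = pd.json_normalize(records)
print(sorted(res.columns))
print(res["status"].isna().tolist())
```

Requesting the same `fields` for every query should avoid the gaps.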

One row, displayed vertically:

    In [66]: res.loc[2]
    Out[66]:
    as             AS5769 Videotron Telecom Ltee
    city           Québec
    country        Canada
    countryCode    CA
    isp            Le Groupe Videotron Ltee
    lat            46.7749
    lon            -71.3344
    org            Videotron Ltee
    query          24.48.0.1
    region         QC
    regionName     Quebec
    status         success
    timezone       America/Toronto
    zip            G1X
    Name: 2, dtype: object
  • I need to pass IP addresses to the parser so that it returns the data for those addresses - Stepan Sokol
  • @StepanSokol, can you give an example? Something like: we parse the following URL, and as output we want to get such-and-such a DataFrame ...? - MaxU 1:09 pm
  • Example: we parse ip-api.com/#37.204.225.194 (but so that the result is JSON, as in the answer above) and get a row with region, city, lat, lon, country, zip, and so on. Later, the IP after the hash sign will change many times, and the df must be continually updated with new information - Stepan Sokol
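One possible way to do what the last comment describes is to put each IP into the path of the JSON endpoint and collect the responses as rows. This is a sketch under assumptions: `lookup_url`, `locate`, and `fake_fetch` are hypothetical helpers, the URL pattern `http://ip-api.com/json/<ip>` should be checked against the API documentation, and the fetcher is injected so the example runs without network access.

```python
from typing import Callable, Dict, Iterable

import pandas as pd


def lookup_url(ip: str) -> str:
    # Assumption: the IP goes into the path of the JSON endpoint.
    return f"http://ip-api.com/json/{ip}"


def locate(ips: Iterable[str], fetch: Callable[[str], Dict]) -> pd.DataFrame:
    """Call `fetch` once per IP and build one DataFrame row per response."""
    rows = [fetch(lookup_url(ip)) for ip in ips]
    return pd.DataFrame(rows)


def fake_fetch(url: str) -> Dict:
    # Offline stand-in for a real request: echoes the IP taken from the URL.
    return {"query": url.rsplit("/", 1)[-1], "status": "success"}


df = locate(["37.204.225.194", "8.8.8.8"], fake_fetch)
print(df)
```

With real requests, `fetch` could be `lambda url: requests.get(url).json()`. Given the rate limit mentioned in the first comment, a pause between calls or the Batch API is probably the safer choice for many addresses.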