Task: get a link to the image from the code that looks like this

<script type="text/javascript">window._sharedData = {"activity_counts":null,"config": __много кода__ "dimensions":{"height":1350,"width":1080},"display_url":"https://scontent-arn2-1.cdninstagram.com/vp/d60f1dbd5f2a609e6a653569a8a13a11/5B807715/t51.2885-15/e35/31297890_370834860094260_8326112321717927936_n.jpg" 

The desired link is after the "display_url" tag.

Through BeautifulSoup failed. Can somehow through regular expressions be possible?

Reported as a duplicate by jfs python May 14 '18 at 7:21 .

A similar question was asked earlier and an answer has already been received. If the answers provided are not exhaustive, please ask a new question .

  • What does “failed” mean? It should be great - andreymal
  • ValueError: Expecting value: line 1 column 1 (char 0) - Roman Oaks
  • The error itself says nothing, show the code that gives this error - andreymal

1 answer 1

You need soup to get the necessary script tag from the page code. The content of this tag is a json object.

 from bs4 import BeautifulSoup import json import re src = '<script type="text/javascript">window._sharedData = {"activity_counts":null,"config":"","dimensions":{"height":1350,"width":1080},"display_url":"https://scontent-arn2-1.cdninstagram.com/vp/d60f1dbd5f2a609e6a653569a8a13a11/5B807715/t51.2885-15/e35/31297890_370834860094260_8326112321717927936_n.jpg"}</script>' soup = BeautifulSoup(src, 'lxml') script = soup.find('script') #здесь нужно указать доп.условия поиска именно вашего тега json_text = re.findall('^s*window\._sharedData\s*=\s*({.*?})\s*\s*$', script.string, flags=re.DOTALL | re.MULTILINE)[0] js = json.loads(json_text) print(js['display_url'])