I need to pull the HTML from the marketwatch.com website. I do it this way:
```python
# the first two lines aren't important: they just pick a random user agent and use it in the request
import requests
from random import choice

def get_page(url):  # function name is illustrative; in my code the snippet below sits inside a function
    useragents = open('/home/ubuntu/bot/useragents.txt').read().split('\n')
    useragent = {'User-Agent': choice(useragents)}
    response = requests.get(url, verify=True, headers=useragent)
    return response.text
```

It all worked before, but now they have apparently put up some kind of protection, and all it returns to me is this:
```html
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta http-equiv="cache-control" content="max-age=0" />
<meta http-equiv="cache-control" content="no-cache" />
<meta http-equiv="expires" content="0" />
<meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT" />
<meta http-equiv="pragma" content="no-cache" />
<meta http-equiv="refresh" content="10; url=/distil_r_captcha.html?requestId=3130ab7e-7dc1-4353-bfe0-15e09c163fc9&httpReferrer=%2Finvesting%2Ffuture%2Fdjia%2520futures" />
<script type="text/javascript">
    (function(window){
        try {
            if (typeof sessionStorage !== 'undefined'){
                sessionStorage.setItem('distil_referrer', document.referrer);
            }
        } catch (e){}
    })(window);
</script>
<script type="text/javascript" src="/lxwtsparqmgdowhx.js" defer></script>
<style type="text/css">#d__fFH{position:absolute;top:-5000px;left:-5000px}#d__fF{font-family:serif;font-size:200px;visibility:hidden}#fwqssyztxufxfzwwduebdqwxedwrzazqaaavux{display:none!important}</style>
</head>
<body>
<div id="distilIdentificationBlock"> </div>
</body>
</html>
```

How can I bypass this protection and get to the actual content?
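For reference, the page above is a Distil Networks (now Imperva) bot-detection challenge, which relies on JavaScript running in the client, so plain `requests` never gets past it. The usual suggestion for this kind of check is to load the page in a real browser so the challenge script can execute. A minimal sketch with Selenium, assuming Chrome and a matching chromedriver are installed; the URL and the wait time are illustrative only, and this is not guaranteed to pass the detection:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--window-size=1280,800')  # a realistic window size

driver = webdriver.Chrome(options=options)
try:
    driver.get('https://www.marketwatch.com/investing/future/djia%20futures')
    driver.implicitly_wait(10)      # give the challenge script time to run
    html = driver.page_source       # rendered HTML after JavaScript has executed
finally:
    driver.quit()
```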