JS: How do I get the text of the h1 tag from a page on another site?

Question

There is a group of sites whose addresses are known. Each of them has a title on its main page, written in an h1 tag. I need a script that goes through all of them and gathers the texts of these tags into an array. So far I have tried it like this (without crawling yet):

<script src="http://code.jquery.com/jquery-1.12.4.min.js"></script>
<script>
  $.get('http://siteadress.ru', function (txt) {
    // Parse the returned HTML (without running its scripts) and read the first h1
    var title = $('<div>').append($.parseHTML(txt)).find('h1').first().text();
    alert(title);
  });
</script>

But, naturally, it didn't work, since $.get only gets the text of the document and does not render the site. And the site runs on PHP (Drupal), so the text of index.php contains no h1.

What would the wise advise?

Come on, my friend, what kind of nonsense have you started: parsing sites with client-side JS.
Besides, CORS exists: the sites simply won't hand their pages over to you, unless they are yours, of course.
I need to collect from 50+ sites: 1) the location (extracted from the Yandex Maps call), 2) the organization name (h1) ... and build a consolidated map of the divisions from this.
Expose an API on the server, and have your client call it.
There is a basic web security restriction called the same-origin policy.
In short, another site will simply not give anything to your client running on a different site.
Open the console on this site, try sending a GET request to the site from your example, and you will see that something goes wrong.

Duck Learns to Take Cover · Accepted Answer · 2016-10-14T11:18:31

In general, this cannot be done with browser-side JS alone.

Why: the same-origin policy.

This is a basic principle of web security. It has many nuances, but in general it means that a script running on one site cannot read the response from another site just by sending it an HTTP request, unless that other site explicitly allows it.
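To make the rule concrete, here is a deliberately simplified model of the check a browser performs before letting a script read the response to a plain cross-origin GET (no cookies, no preflight). The function name `canReadResponse` and the plain object of lowercase header names are assumptions for illustration; real browsers apply more rules than this.

```javascript
// Simplified model of the browser's same-origin check for a plain GET
// (no credentials, no preflight). Real browsers enforce more rules.
function canReadResponse(pageUrl, requestUrl, responseHeaders) {
  // An origin is scheme + host + port, e.g. "http://siteadress.ru".
  const pageOrigin = new URL(pageUrl).origin;
  const targetOrigin = new URL(requestUrl).origin;

  // Same origin: the script may read the response.
  if (pageOrigin === targetOrigin) return true;

  // Cross origin: readable only if the target site opted in via CORS.
  const allow = responseHeaders['access-control-allow-origin'];
  return allow === '*' || allow === pageOrigin;
}
```

In the question's case the page and siteadress.ru are different origins and the Drupal site sends no such header, so the browser keeps the response away from the script.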

You can open the browser console on this site, manually send a request to the site from the question, and see for yourself that it fails.

What to do:
If all the sites are yours, you can allow cross-domain requests to them (CORS). But in that case you don't need these strange manipulations at all: why parse your own sites when you can get the information from more reliable sources?
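On your own sites, "allowing cross-domain requests" comes down to sending an `Access-Control-Allow-Origin` header. Below is a minimal Node.js sketch of that logic only; the origin `https://aggregator.example` and the helpers `corsHeaders` and `createServer` are hypothetical, and on Drupal sites the header would normally be set in the web server or CMS configuration instead.

```javascript
const http = require('http');

// Hypothetical origin of the page that is allowed to read these sites.
const ALLOWED_ORIGINS = ['https://aggregator.example'];

// Returns the CORS headers to attach, given the request's Origin header.
function corsHeaders(requestOrigin, allowed = ALLOWED_ORIGINS) {
  if (requestOrigin && allowed.includes(requestOrigin)) {
    // Echo the specific origin back; Vary tells caches the answer depends on Origin.
    return { 'Access-Control-Allow-Origin': requestOrigin, 'Vary': 'Origin' };
  }
  // No header: the browser will refuse to expose the response to the script.
  return {};
}

// Creates (but does not start) a server whose pages carry the CORS headers.
function createServer() {
  return http.createServer((req, res) => {
    res.writeHead(200, {
      'Content-Type': 'text/html; charset=utf-8',
      ...corsHeaders(req.headers.origin),
    });
    res.end('<h1>Example division</h1>');
  });
}
```

Calling `createServer().listen(8080)` would start it locally for experiments.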

In general, this is done on the server. Cross-domain restrictions are enforced mainly by the browser, so the server can fetch and parse the sites and expose an API, which the client simply calls to get the information it needs.
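A minimal Node.js sketch of such a server layer, assuming Node 18+ for the global `fetch`: it downloads each page, pulls out the first h1, and serves the results as JSON on `/titles`. The site list, the helper names, and the regex-based extraction are assumptions for illustration; for 50+ real sites an HTML parser library and some rate limiting would be more robust.

```javascript
const http = require('http');

// Hypothetical list of sites to collect from.
const SITES = ['http://siteadress.ru'];

// Rough sketch: pulls the text of the first <h1> out of an HTML string.
// A regex is fragile; HTML entities such as &amp; are left as-is.
function extractH1(html) {
  const match = /<h1\b[^>]*>([\s\S]*?)<\/h1>/i.exec(html);
  if (!match) return null;
  // Drop nested tags (<a>, <span>, ...) and collapse whitespace.
  return match[1].replace(/<[^>]*>/g, '').replace(/\s+/g, ' ').trim();
}

// Fetches every URL in parallel and returns [{ url, title }].
// fetchImpl defaults to the global fetch; it can be swapped for testing.
async function collectTitles(urls, fetchImpl = fetch) {
  return Promise.all(urls.map(async (url) => {
    try {
      const res = await fetchImpl(url);
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return { url, title: extractH1(await res.text()) };
    } catch (err) {
      // One broken site should not fail the whole batch.
      return { url, title: null, error: String(err) };
    }
  }));
}

// Creates (but does not start) the API server for the browser client.
function createApiServer(urls = SITES) {
  return http.createServer(async (req, res) => {
    if (req.url !== '/titles') {
      res.writeHead(404);
      return res.end();
    }
    const titles = await collectTitles(urls);
    res.writeHead(200, { 'Content-Type': 'application/json; charset=utf-8' });
    res.end(JSON.stringify(titles));
  });
}
```

Start it with `createApiServer().listen(3000)`; a page served from the same origin can then call `fetch('/titles').then((r) => r.json())` without any cross-domain issues. The server is not a browser, so the same-origin policy does not apply to its outgoing requests.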

You can also prepend cors-anywhere.herokuapp.com to siteadress.ru, i.e. request cors-anywhere.herokuapp.com/http://siteadress.ru. It's a good service :)
@TimurMusharapov, yeah, thanks, I didn't know about that service :) But in my opinion it would still be more logical to build my own server layer.
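With such a proxy the browser talks to the proxy's origin, and the proxy fetches the target page server-side and adds CORS headers to the response. A browser-side sketch, assuming the public instance at `https://cors-anywhere.herokuapp.com/` is reachable (its availability and limits are not guaranteed); `proxied` and `fetchTitleViaProxy` are hypothetical helpers:

```javascript
// Public CORS proxy mentioned in the comment; availability is an assumption.
const PROXY = 'https://cors-anywhere.herokuapp.com/';

// Builds the proxied URL, e.g. PROXY + 'http://siteadress.ru'.
function proxied(url) {
  return PROXY + url;
}

// Browser-only: DOMParser is not available in Node.
async function fetchTitleViaProxy(url) {
  const res = await fetch(proxied(url));
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  const doc = new DOMParser().parseFromString(await res.text(), 'text/html');
  const h1 = doc.querySelector('h1');
  return h1 ? h1.textContent.trim() : null;
}
```

Every page then passes through a third-party server, which is a dependency worth weighing against running your own server layer.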
