There is a group of sites whose addresses are known. Each of them has a title on its main page, inside an h1 tag. I need a script that will go through all of them and collect the text from these tags into an array. So far I have tried it like this (without crawling all the sites yet):

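```html
<script src="http://code.jquery.com/jquery-1.12.4.min.js"></script>
<script>
  // Download the page and take the text of its <h1> from the returned HTML
  $.get('http://siteadress.ru', function (html) {
    // parseHTML drops <script> tags; the <div> wrapper lets .find() see a top-level <h1>
    alert($('<div>').append($.parseHTML(html)).find('h1').first().text());
  });
</script>
```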

But naturally it did not work, since $.get only fetches the text of the document and does not render the site. And the site is built in PHP, on Drupal, so there is no h1 in the text of index.php.

What would you wise folks advise?

  • My friend, you've started something odd: parsing sites with client-side JS. And then there's CORS: they simply won't give you the data, unless the sites are yours, of course. - Duck Learns to Take Cover
  • Then tell me how to do it. I need to collect from 50+ sites: 1) the location (pulled out of the Yandex Maps call), 2) the organization name (h1) ... and build a consolidated map of the branches from this. - Anthony Pirozhenko
  • Parse the sites on the server. Expose a small API from the server that your client will call. - Duck Learns to Take Cover
  • There is a basic web security restriction called the same-origin policy. A large part of it is enforced at the browser level. In short, another site simply won't hand anything over to the client of your site. There are many nuances, but that is the gist. Open the console on this site and try sending a GET request to the site from your example, and you will see that something goes wrong. - Duck Learns to Take Cover
  • I'll probably post this as an answer. - Duck Learns to Take Cover

1 answer

In general, this cannot be done with browser-side JS alone.

Why: the same-origin policy.

This is a basic principle of web security. It has many nuances, but in general it means that a script running on one site cannot read the response to an HTTP request it sends to another site: the browser blocks it unless that other site explicitly allows it.

You can open the browser console on this site, manually send a request to the site from the question, and see for yourself that something goes wrong.
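A minimal sketch of that console experiment, assuming a modern browser with `fetch`. `probeCrossOrigin` is a hypothetical helper name and `http://siteadress.ru` is the placeholder address from the question. When the target site sends no CORS headers, the browser rejects the promise with a `TypeError`, so the script never gets to see the HTML.

```javascript
// Hypothetical helper: try to read another site's HTML from page JS.
// fetchImpl defaults to the built-in fetch; it is a parameter only so
// the function can be exercised without a network.
function probeCrossOrigin(url, fetchImpl = globalThis.fetch) {
  return fetchImpl(url)
    .then((response) => response.text())
    .then(() => 'readable') // same-origin, or the other site allowed it via CORS
    .catch((err) => {
      // Browsers report a CORS refusal as a TypeError ("Failed to fetch"),
      // the same error type as a plain network failure, so script code
      // cannot tell the two apart.
      if (err instanceof TypeError) return 'blocked';
      throw err;
    });
}

// In the browser console, on some other site:
// probeCrossOrigin('http://siteadress.ru').then(console.log);
```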

What to do:
If all the sites are yours, you can allow cross-domain requests to them, but in that case you don't need these odd manipulations: why parse your own sites when you can get the information from more reliable sources?
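Allowing cross-domain requests to your own sites comes down to having them answer with an `Access-Control-Allow-Origin` header. The sites in the question run on Drupal, so there this would be configured on the PHP side; purely as an illustration, here is the same idea as a small Node.js server. `ALLOWED_ORIGINS`, `corsHeadersFor` and `https://map.example` are hypothetical names.

```javascript
const http = require('http');

// Hypothetical allowlist: pages on these origins may read our responses.
const ALLOWED_ORIGINS = new Set(['https://map.example']);

// Return the CORS headers to send for a request coming from `origin`.
// Echoing back one allowed origin (instead of '*') keeps other sites out;
// Vary: Origin tells caches the response depends on the Origin header.
function corsHeadersFor(origin) {
  if (!ALLOWED_ORIGINS.has(origin)) return {};
  return { 'Access-Control-Allow-Origin': origin, Vary: 'Origin' };
}

// Start only when run as `node cors.js serve`, so loading the file has no side effects.
if (process.argv[2] === 'serve') {
  http
    .createServer((req, res) => {
      res.writeHead(200, {
        'Content-Type': 'text/html; charset=utf-8',
        ...corsHeadersFor(req.headers.origin),
      });
      res.end('<h1>Organization name</h1>');
    })
    .listen(8080);
}
```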

Generally, this is done on the server. Cross-domain restrictions are enforced mainly at the browser level, so the server can collect and parse the sites and expose an API that the client simply calls to receive the necessary information.
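A minimal Node.js sketch of this approach, assuming Node 18+ for the built-in `fetch`. `SITES`, `extractH1`, `collectTitles`, the `/api/titles` route and port 3000 are all hypothetical. The `<h1>` is pulled out with a regular expression, which is only good enough for simple pages; a proper HTML parser would be the safer choice.

```javascript
const http = require('http');

// Hypothetical list of the sites to visit (the question has 50+).
const SITES = ['http://siteadress.ru'];

// Pull the text of the first <h1> out of raw HTML with a regex.
// Crude by design: nested tags are stripped, entities are not decoded.
function extractH1(html) {
  const match = /<h1\b[^>]*>([\s\S]*?)<\/h1>/i.exec(html);
  if (!match) return null;
  return match[1].replace(/<[^>]*>/g, '').replace(/\s+/g, ' ').trim();
}

// Fetch every site (server-side, so the same-origin policy does not apply)
// and collect { url, title } pairs. A failing site yields title: null
// instead of aborting the whole batch.
async function collectTitles(urls, fetchImpl = globalThis.fetch) {
  return Promise.all(
    urls.map(async (url) => {
      try {
        const response = await fetchImpl(url);
        if (!response.ok) return { url, title: null };
        return { url, title: extractH1(await response.text()) };
      } catch {
        return { url, title: null };
      }
    })
  );
}

// A tiny API for the client: GET /api/titles returns the array as JSON.
// Started only via `node titles.js serve`.
if (process.argv[2] === 'serve') {
  http
    .createServer(async (req, res) => {
      if (req.url !== '/api/titles') {
        res.writeHead(404);
        res.end();
        return;
      }
      const titles = await collectTitles(SITES);
      res.writeHead(200, { 'Content-Type': 'application/json; charset=utf-8' });
      res.end(JSON.stringify(titles));
    })
    .listen(3000);
}
```

The map page would then request `/api/titles` from this server instead of contacting the 50+ sites itself. `Promise.all` sends all requests at once, which is fine for a few dozen sites; for many more, limiting concurrency or caching the results would be worth considering.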