I have the following C# code:

    private void button1_Click(object sender, EventArgs e)
    {
        string urik = "http://www.cy-pr.com/analysis/";
        foreach (string s in richTextBox1.Text.Split('\n'))
        {
            string retVal = new WebClient().DownloadString(urik + s);
            richTextBox2.Text = "Substituting sites... ";
        }
    }

richTextBox1 contains a list of sites that are substituted into the URL one at a time, and a web page is loaded after each substitution. For example, substituting "ya.ru" gives http://www.cy-pr.com/analysis/ya.ru/, and from this page I need to extract the Yandex TIC (CY) and the Google PageRank (PR). The ya.ru site has a TIC of 12000, which appears in the HTML as: "inline;">12000</span>, and its Google PR is 6, which looks like this in the HTML: id="pr">6</span>.

How can I extract the TIC and PR for each site, sum the TIC values of all sites, separately sum the PR values of all sites, and display the results?

Help me please.

  • I can suggest regular expressions; how to use them in C# is easy to find online. CY: \"inline;\">(\d+)</span> PR: id=\"pr\">(\d+)</span> In both expressions, the result is returned in the first group. - ReinRaus
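The regular expressions suggested in the comment above could be applied roughly as follows. This is only a sketch: the class name RankParser and its methods are hypothetical, the added \s* (to tolerate whitespace around the number) is an assumption about the markup, and a missing value is counted as 0.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts TIC and PR from page HTML with the
// patterns from the comment and sums them over several pages.
public static class RankParser
{
    // The number is captured in the first group of each pattern.
    // \s* is an assumption: it allows optional whitespace around the digits.
    private static readonly Regex TicPattern =
        new Regex(@"""inline;"">\s*(\d+)\s*</span>");
    private static readonly Regex PrPattern =
        new Regex(@"id=""pr"">\s*(\d+)\s*</span>");

    // Returns the captured number, or 0 when the pattern does not match.
    private static int Extract(Regex pattern, string html)
    {
        Match match = pattern.Match(html);
        return match.Success ? int.Parse(match.Groups[1].Value) : 0;
    }

    public static int ExtractTic(string html)
    {
        return Extract(TicPattern, html);
    }

    public static int ExtractPr(string html)
    {
        return Extract(PrPattern, html);
    }

    // Sums TIC and PR separately over the HTML of all downloaded pages.
    public static void SumAll(IEnumerable<string> pages, out int totalTic, out int totalPr)
    {
        totalTic = 0;
        totalPr = 0;
        foreach (string html in pages)
        {
            totalTic += ExtractTic(html);
            totalPr += ExtractPr(html);
        }
    }
}
```

Inside the foreach loop from the question, retVal would be the string passed to ExtractTic and ExtractPr, and the two running totals could be written to richTextBox2 once the loop finishes.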

1 answer

This task is easier to solve with the Html Agility Pack, since it already provides much of what is needed to parse web pages. For example, to get a specific span on a page by its id, do this:

    var document = new HtmlDocument();
    document.Load("foo.html");
    var node = document.DocumentNode.SelectSingleNode("//span[@id='something']");
    if (node != null)
    {
        var innerText = node.InnerText; // Get the text inside the span
    }
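The snippet above loads the page from a file. For the case in the question, where the HTML is already in a string returned by DownloadString, a sketch along the following lines might work. The class PrReader and its methods are hypothetical names, a missing or non-numeric value is treated as 0 by assumption, and only PR is shown because the question gives its id ("pr"); the TIC span would need a selector that matches the real markup.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper built on the Html Agility Pack (NuGet package HtmlAgilityPack).
public static class PrReader
{
    // Returns the PR value from one page's HTML,
    // or 0 if the span is missing or does not hold a number.
    public static int ReadPr(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html); // parse from a string instead of a file

        HtmlNode node = document.DocumentNode.SelectSingleNode("//span[@id='pr']");
        if (node == null)
        {
            return 0;
        }

        int value;
        return int.TryParse(node.InnerText.Trim(), out value) ? value : 0;
    }

    // Adds up the PR values of all downloaded pages.
    public static int SumPr(IEnumerable<string> pages)
    {
        int total = 0;
        foreach (string html in pages)
        {
            total += ReadPr(html);
        }
        return total;
    }
}
```

In the loop from the question, each retVal could be collected into a list and passed to SumPr, or ReadPr could be called per site and added to a running total.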