Hello! I have already read many similar topics, but most of them asked about one specific thing. I very much agree with these words. To be honest, nothing at all is clear to me. As I understand it, information from HTML is parsed either through LINQ or through CSS selectors. I am not familiar with the former at all, and I know CSS only superficially. Still, the CSS option feels more intuitive to me, so I would like to receive answers in the form of CSS selectors.

A question right away: can all of the information be parsed either way? Or are there cases where only one of the methods works? Or cases where it is impossible altogether?

Now to the task itself. I want to parse contact data from the intercom site. Take this page, for example. To start, I parse the whole page:

 var parser = new HtmlParser();
 var doc = parser.Parse("link");

How, for example, do I parse the name? Looking at the source, I see that the name is inside the block div class="df_panel". This seems to be a block with a unique name, so the search can be narrowed down:

 var div = doc.QuerySelector("div.df_panel"); 

This is where the questions begin. I figured out on my own that if a div block has a class name, the selector is written as shown. If it is, for example, div id="test", then the query is written differently (it took me a long time to work this out from a pile of examples on various forums):

 var div = doc.QuerySelector("div[id='test']");

So where is any of this documented? I understand that some kind of regular expressions are used here. Maybe they are similar to those in other parsers; for example, it is written here that AngleSharp is very similar to Fizzler. But what if this is a one-off task for me and I have never dealt with any other parsers? How am I supposed to know what to write?

OK, I got distracted. I now have the closest div to narrow the search range. (Another digression, by the way: what if there were no such div at all? Is it possible to obtain specific data if there are no unique identifiers with which to gradually narrow the search area for the desired value?) In the end I see that the name is written in the heading tag <h6 itemprop="name">DESIRED NAME</h6>. How do I get this value? Would it be possible to pull out the name if it were written without a heading tag at all?

I will stop at these questions for now. I would be grateful for any explanation. Ideally I would like answers to the more general questions (for example, about what I assume are regular expressions, with a reference or good examples); then maybe I can figure out the rest myself.

  • “LINQ ... I am not familiar at all” - that is a real pity. LINQ is one of those things worth learning C# for. Be sure to learn it. (Along the way you will also brush up on functional languages.) - VladD
  • @VladD, please satisfy my curiosity and name the other two - VenZell
  • @VenZell: Oh, there is a lot. The biggest is async/await, which turns the most difficult thing in programming, asynchrony, into child's play. (Other languages rushed to adopt it.) That alone is already a good reason to write in C#. Then there is first-class support for high-level concepts: properties and events. That is less important, though; LINQ matters more. - VladD
  • If you have been given an exhaustive answer, mark it as accepted (the check mark next to the chosen answer). - Nicolas Chabanovsky
  • I wrote an article on this for 1C: infostart.ru/public/466196 - Serginio

2 answers

The most important thing

First of all, you need to learn CSS selectors.
For a better understanding, learn at least the basics of HTML as well.
You can do this, for example, on HTML Academy. It is free.

I would also add that there is no magic: all AngleSharp queries are standard CSS selectors, not something unusual. (c) ReinRaus

I will answer your question:

But what if this is a one-off task for me and I have never dealt with any other parsers? How am I supposed to know what to write?

To understand what to write, I repeat, you need to learn CSS selectors.
You can do this, for example, here: "Do you know the selectors?".

I will cite here a brief extract from the article mentioned above:

The main types of selectors

There are only a few main types of selectors:

* - any elements.
div - elements with this tag.
#id - the element with the given id.
.class - elements with this class.
[name="value"] - attribute selectors (see below).
:visited - “pseudo-classes”, various other conditions on the element (see below).

Selectors can be combined by writing them one after another, without a space:

.c1.c2 - elements that have both classes c1 and c2 at the same time
a#id.c1.c2:visited - the a element with the given id, classes c1 and c2, and the pseudo-class visited
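
To make the list above concrete, here is a minimal sketch of how these basic selectors behave in AngleSharp. It assumes AngleSharp 0.10 or later, where the parser lives in AngleSharp.Html.Parser and the method is called ParseDocument (in 0.9.x it was Parse). The markup and the BasicSelectorsDemo class are made up for illustration.

```csharp
using System;
using System.Linq;
using AngleSharp.Html.Parser;

public static class BasicSelectorsDemo
{
    // Hypothetical sample markup, invented for this illustration.
    public const string Html = @"
<div id='main' class='c1 c2'>first</div>
<div class='c1'>second</div>
<p class='c2'>third</p>";

    // Returns the text of every element matching the selector, in document order.
    public static string[] Select(string selector)
    {
        var doc = new HtmlParser().ParseDocument(Html);
        return doc.QuerySelectorAll(selector).Select(e => e.TextContent).ToArray();
    }

    public static void Main()
    {
        var selectors = new[] { "div", "#main", ".c1", ".c1.c2", "div#main.c1.c2" };
        foreach (var selector in selectors)
            Console.WriteLine($"{selector} -> {string.Join(", ", Select(selector))}");
    }
}
```

With this markup, "div" and ".c1" should both select the first two elements, while "#main", ".c1.c2" and "div#main.c1.c2" should select only the first one.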

Relations

CSS3 provides four kinds of relationships between elements.

The two best-known ones you probably already know:

div p - p elements that are descendants of div.
div > p - only direct descendants (children).

There are two rarer ones:

div ~ p - right neighbors: all p elements at the same nesting level that come after the div.
div + p - the first right neighbor: the p at the same nesting level that comes immediately after the div (if any).
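
The four relationships can be compared side by side with a small sketch. It assumes AngleSharp 0.10 or later (ParseDocument); the markup and the CombinatorsDemo class are hypothetical and exist only for this example.

```csharp
using System;
using System.Linq;
using AngleSharp.Html.Parser;

public static class CombinatorsDemo
{
    // Hypothetical sample markup: one div with a direct and a nested p,
    // followed by sibling elements at the same nesting level.
    public const string Html = @"
<div id='box'>
  <p>child</p>
  <section><p>grandchild</p></section>
</div>
<p>next</p>
<span>gap</span>
<p>later</p>";

    // Returns the matched texts joined with commas, in document order.
    public static string Select(string selector)
    {
        var doc = new HtmlParser().ParseDocument(Html);
        return string.Join(",", doc.QuerySelectorAll(selector).Select(e => e.TextContent));
    }

    public static void Main()
    {
        Console.WriteLine(Select("div p"));   // all descendants of the div
        Console.WriteLine(Select("div > p")); // direct children only
        Console.WriteLine(Select("div ~ p")); // all following siblings
        Console.WriteLine(Select("div + p")); // the immediately following sibling
    }
}
```

Here "div p" should give child,grandchild, "div > p" only child, "div ~ p" next,later, and "div + p" only next.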

Attribute selectors

On the whole attribute:

  • [attr] - the attribute is set,
  • [attr="val"] - the attribute is equal to val.

At the beginning of the attribute:

  • [attr^="val"] - the attribute starts with val, for example value.
  • [attr|="val"] - the attribute is equal to val or begins with val-, for example val-1.

On the content:

  • [attr*="val"] - the attribute contains the substring val, for example myvalue.

  • [attr~="val"] - the attribute contains val as one of its space-separated values.
    For example: [attr~="delete"] matches edit delete but does not match undelete or no-delete.

At the end of the attribute:

  • [attr$="val"] - the attribute ends with val, for example myval.
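
The attribute operators are easy to mix up, so here is a rough sketch that runs each of them against the same set of elements. It assumes AngleSharp 0.10 or later; the data-v attribute, the markup, and the AttributeSelectorsDemo class are invented for the example.

```csharp
using System;
using System.Linq;
using AngleSharp.Html.Parser;

public static class AttributeSelectorsDemo
{
    // Hypothetical markup: each element's text is a one-letter label,
    // so the output shows which attribute values matched.
    public const string Html = @"
<i data-v='val'>a</i>
<i data-v='value'>b</i>
<i data-v='val-1'>c</i>
<i data-v='myval'>d</i>
<i data-v='edit val'>e</i>
<i>f</i>";

    // Returns the labels of the matched elements, joined with commas.
    public static string Select(string selector)
    {
        var doc = new HtmlParser().ParseDocument(Html);
        return string.Join(",", doc.QuerySelectorAll(selector).Select(e => e.TextContent));
    }

    public static void Main()
    {
        var selectors = new[]
        {
            "[data-v]", "[data-v='val']", "[data-v^='val']", "[data-v|='val']",
            "[data-v*='val']", "[data-v~='val']", "[data-v$='val']"
        };
        foreach (var selector in selectors)
            Console.WriteLine($"{selector} -> {Select(selector)}");
    }
}
```

Note that [data-v~='val'] should match only a and e (val as a whole space-separated word), while [data-v$='val'] should also pick up d, because myval ends with val.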

Where to practice?

CSS Diner - here you need to select the element corresponding to the specified CSS rule.
HTML Academy - here you can learn the basics of layout.
htmlbook - a reference for CSS selectors and HTML tags.

Answers to other questions

So where is any of this documented? I understand that some kind of regular expressions are used here.

These are not regular expressions, but CSS selectors. I wrote about this above.

Is it possible to obtain specific data if there are no unique identifiers with which to gradually narrow the search area for the desired value?

Yes, by combining conditions on the elements' position and content. For example: the first child of its parent ( * > *:first-child ), a p element that is the second child of its parent ( p:nth-child(2) ), non-empty a elements ( a:not(:empty) ), and so on.
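
As a small illustration of selecting without any ids or classes, the sketch below uses only structural pseudo-classes. It assumes AngleSharp 0.10 or later; the markup and the PseudoClassDemo class are hypothetical.

```csharp
using System;
using System.Linq;
using AngleSharp.Html.Parser;

public static class PseudoClassDemo
{
    // Hypothetical markup with no ids or classes at all.
    public const string Html = @"
<ul>
  <li>one</li>
  <li>two</li>
  <li>three</li>
</ul>
<div><h2>title</h2><p>para</p><a href='#'></a><a href='#'>link</a></div>";

    // Returns the matched texts joined with commas, in document order.
    public static string Select(string selector)
    {
        var doc = new HtmlParser().ParseDocument(Html);
        return string.Join(",", doc.QuerySelectorAll(selector).Select(e => e.TextContent));
    }

    public static void Main()
    {
        Console.WriteLine(Select("li:first-child"));  // first child of the ul
        Console.WriteLine(Select("li:nth-child(2)")); // second child of the ul
        Console.WriteLine(Select("p:nth-child(2)"));  // p that is the second child of the div
        Console.WriteLine(Select("a:not(:empty)"));   // only the a that has content
    }
}
```

The p:nth-child(2) case matches here only because the p happens to be the second child of its div; if the goal were "the second p among its siblings", p:nth-of-type(2) would be the selector to reach for.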

In the end I see that the name is written in the heading tag <h6 itemprop="name">DESIRED NAME</h6>. How do I get this value? Would it be possible to pull out the name if it were written without a heading tag at all?

If I understood correctly and you mean a value that is not wrapped in any tag of its own, then the answer is still yes: you can search by text.
For this particular case, the solution in the form of a selector would be: h6[itemprop="name"]
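
Standard CSS selectors cannot match on text, so one possible way to "search by text" is to combine a selector with LINQ over the text nodes. This is only a sketch under assumptions: AngleSharp 0.10 or later, made-up markup with a card class, and a hypothetical "Phone:" prefix as the text to look for.

```csharp
using System;
using System.Linq;
using AngleSharp.Dom;
using AngleSharp.Html.Parser;

public static class TextSearchDemo
{
    // Hypothetical markup: the phone number is bare text inside the div,
    // not wrapped in any tag of its own.
    public const string Html = @"
<div class='card'>Phone: 555-0100 <b>Office</b></div>
<div class='card'><h6 itemprop='name'>Sample Name</h6></div>";

    // Finds the first bare text node inside a div.card that starts with the prefix.
    // Returns null when nothing matches.
    public static string FindBareText(string prefix)
    {
        var doc = new HtmlParser().ParseDocument(Html);
        return doc.QuerySelectorAll("div.card")
                  .SelectMany(card => card.ChildNodes) // direct child nodes, text included
                  .OfType<IText>()                     // keep only text nodes
                  .Select(text => text.TextContent.Trim())
                  .FirstOrDefault(text => text.StartsWith(prefix, StringComparison.Ordinal));
    }

    // The selector from the answer, for the value that does have its own tag.
    public static string FindName()
    {
        var doc = new HtmlParser().ParseDocument(Html);
        return doc.QuerySelector("h6[itemprop='name']")?.TextContent;
    }

    public static void Main()
    {
        Console.WriteLine(FindBareText("Phone:"));
        Console.WriteLine(FindName());
    }
}
```

This also partly answers the LINQ question: the selector narrows the search down to elements, and LINQ does the filtering that selectors cannot express.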

I tried more than once. It does not parse with .df_panel

The code in your comment uses the QuerySelector method, which, as I understand it, returns the first element matching the given selector. The first of the six .df_panel elements on the page does not contain an h6 element, so you find nothing. I emphasize once again: there are six .df_panel elements on the page.
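
The difference is easy to reproduce on a reduced page. The sketch below assumes AngleSharp 0.10 or later and uses hypothetical markup with three .df_panel blocks instead of six; only the second one contains the h6.

```csharp
using System;
using AngleSharp.Html.Parser;

public static class FirstMatchDemo
{
    // Hypothetical, simplified stand-in for the real page.
    public const string Html = @"
<div class='df_panel'><span>menu</span></div>
<div class='df_panel'><h6 itemprop='name'>Sample Name</h6></div>
<div class='df_panel'><span>footer</span></div>";

    // Mirrors the problem: take the FIRST panel, then look for the h6 inside it.
    // Returns null because the first panel has no h6.
    public static string NameFromFirstPanel()
    {
        var doc = new HtmlParser().ParseDocument(Html);
        var panel = doc.QuerySelector("div.df_panel");
        return panel?.QuerySelector("h6[itemprop='name']")?.TextContent;
    }

    // One selector with a descendant relationship: any panel that contains the h6.
    public static string NameFromAnyPanel()
    {
        var doc = new HtmlParser().ParseDocument(Html);
        return doc.QuerySelector("div.df_panel h6[itemprop='name']")?.TextContent;
    }

    // QuerySelectorAll returns every match, not just the first one.
    public static int PanelCount()
    {
        var doc = new HtmlParser().ParseDocument(Html);
        return doc.QuerySelectorAll("div.df_panel").Length;
    }

    public static void Main()
    {
        Console.WriteLine(NameFromFirstPanel() ?? "(nothing found)");
        Console.WriteLine(NameFromAnyPanel());
        Console.WriteLine(PanelCount());
    }
}
```

Checking the count with QuerySelectorAll first is a cheap way to find out whether a class that looks unique in the source really is unique.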

Code for selecting the element you want:

 var seller = doc.QuerySelector("[itemprop='seller']");
 var name = seller.QuerySelector("[itemprop='name']").Text();
  • Comments are not intended for extended discussion; conversation moved to chat. - Nicolas Chabanovsky

But what if this is a one-off task for me and I have never dealt with any other parsers? How am I supposed to know what to write?

The answer above already explained that you need to know CSS selectors. However, since this question shows up in search engines (there are not that many questions on ru-SO), I will add another useful link on the topic.

The official AngleSharp repositories on GitHub include a demo app.