I need to extract all relative paths from HTML markup. I put together the following regular expression:

@"(?:src|href)=""([^#](?!http[s]*[:])[^/]{2}(([a-z0-9-.]*/)*)([a-z0-9-.]*?[a-z0-9-]*!?.[az]{2,4})(?!#)\w*\W*)""" 

In general, it works as expected when the same pattern is used in JavaScript, for example. Bare anchors like #yakor are ignored correctly, but in C# there is a problem with links that contain an anchor, like index.html#yakor: they are simply not ignored.

I built the expression in this online tester, but that tool is for JavaScript.

  • HTML parsing is not a good idea. - VladD
  • @VladD Why? - BwehaaFox
  • Read: Stackoverflow.com/q/420354/10105 - VladD
  • Try escaping the # like this: \# or like this: \x23 - nick_n_a
  • @nick_n_a, alas, that did not help. - BwehaaFox

1 answer

If the only problem is analyzing the link itself, as stated in the comments, then it is better not to reach for regular expressions again and to use the proper Uri class instead.

Example:

    var uri1 = new Uri("http://www.google.com/index.html#yakor", UriKind.RelativeOrAbsolute);
    var uri2 = new Uri("/index.html#yakor", UriKind.RelativeOrAbsolute);
    Console.WriteLine(uri1.IsAbsoluteUri); // true
    Console.WriteLine(uri2.IsAbsoluteUri); // false

And for parsing the HTML itself, it is better to follow the advice from here.

  • Overall, Uri made it easier to cut off absolute paths, but paths like //site.ru/img.jpg still have to be filtered out manually. - BwehaaFox
  • @BwehaaFox: Is this a valid address? - VladD
  • Well, as far as I could tell, yes. As I understand it, this is a kind of shorthand alternative to http:// for src and href. - BwehaaFox
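
The manual filtering mentioned in the comments above could look roughly like the sketch below. It is only an illustration under assumptions: the class name LinkFilter and the method IsRelativePath are hypothetical, and the decision to treat bare anchors and protocol-relative links (//site.ru/img.jpg) as "not relative" is a guess at what the asker wants. A single leading slash is checked by hand before calling Uri, because some .NET runtimes on Unix-like systems may interpret such a string as an absolute file path.

```csharp
using System;

public static class LinkFilter
{
    // Hypothetical helper: returns true when the value of a src/href
    // attribute should be treated as a relative path.
    public static bool IsRelativePath(string link)
    {
        if (string.IsNullOrWhiteSpace(link))
            return false;

        // Bare anchors such as "#yakor" point into the current page.
        if (link.StartsWith("#", StringComparison.Ordinal))
            return false;

        // Protocol-relative links such as "//site.ru/img.jpg" name another
        // host, so they are excluded by hand.
        if (link.StartsWith("//", StringComparison.Ordinal))
            return false;

        // Root-relative links such as "/index.html#yakor" are accepted
        // explicitly, without relying on how Uri classifies them.
        if (link.StartsWith("/", StringComparison.Ordinal))
            return true;

        // Everything else is left to the Uri class.
        if (!Uri.TryCreate(link, UriKind.RelativeOrAbsolute, out var uri))
            return false;

        return !uri.IsAbsoluteUri;
    }
}
```

With this sketch, index.html#yakor and /index.html#yakor are reported as relative, while http://www.google.com/index.html#yakor, //site.ru/img.jpg and #yakor are not. If the anchor part is unwanted, it can be cut off separately, for example by taking the text before the first # character.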