Good day! I am writing a regular expression to automatically download files from a site, and I have run into a problem: I cannot match an element that contains an <a href> HTML tag:

 <div class="views-field views-field-title"> <span class="field-content"><a href="/ru/press/news/40586-maksim-bystrov-prinyal-uchastie-vo-vserossiyskom-soveshchanii-po-itogam-podgotovki"><strong>Максим Быстров принял участие во Всероссийском совещании по итогам подготовки к ОЗП 2017-2018 годов</strong></a></span> 

My regular expression is: <div class="views-field views-field-title">[^&/]+<span class="field-content"><a href="([^&/]+)"><strong>([^&/]+)</strong></a></span>

    1 answer

    There are several problems in your expression. First, you do not escape special characters (namely, the slash). Note also that the class [^&/]+ excludes the slash, so it can never match an href that contains slashes, as this one does. Second, you match whitespace in a very strange way; there is a dedicated syntax for this: \s. Third, to make a regular expression more readable, its groups should be named. Here is an example of a more or less suitable expression:

     <div class="views-field views-field-title">\s*<span class="field-content">\s*<a href="(?<ref>[^"]*)">\s*<strong>(?<text>[^<]*)<\/strong>\s*<\/a>\s*<\/span> 

    Here you can see how it works.
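As a rough illustration, the expression can be tried out in Python against the fragment from the question. This is only a sketch under a few assumptions: Python's `re` module spells named groups as `(?P<name>...)` rather than `(?<name>...)`, and it has no regex delimiter, so the slashes are left unescaped here. The variable names `html`, `pattern` and `matches` are made up for the example.

```python
import re

# The fragment from the question, split across lines only for readability.
html = (
    '<div class="views-field views-field-title"> <span class="field-content">'
    '<a href="/ru/press/news/40586-maksim-bystrov-prinyal-uchastie-vo-'
    'vserossiyskom-soveshchanii-po-itogam-podgotovki">'
    '<strong>Максим Быстров принял участие во Всероссийском совещании '
    'по итогам подготовки к ОЗП 2017-2018 годов</strong></a></span>'
)

# Same structure as the expression above, with Python-style named groups.
pattern = re.compile(
    r'<div class="views-field views-field-title">\s*'
    r'<span class="field-content">\s*'
    r'<a href="(?P<ref>[^"]*)">\s*'
    r'<strong>(?P<text>[^<]*)</strong>\s*</a>\s*</span>'
)

# Collect (link, title) pairs for every match in the page text.
matches = [(m.group("ref"), m.group("text")) for m in pattern.finditer(html)]

for ref, text in matches:
    print(ref)
    print(text)
```

The `[^"]*` class stops at the closing quote of the attribute and `[^<]*` stops at the next tag, so neither group depends on which characters happen to appear in the link or the title.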

    In general, in such cases it is better to use a ready-made parser and look up the data you need with XPath.
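A minimal sketch of the parser approach, assuming Python and its standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath and requires well-formed input. For that reason a closing `</div>` has been added to the fragment here; a real page would more likely need a dedicated HTML parser (a third-party library such as lxml is a common choice, but that is an assumption, not something the site requires). The names `fragment`, `root` and `links` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, with a closing </div> added so that
# it is well-formed XML (an assumption made for this sketch).
fragment = (
    '<div class="views-field views-field-title"> <span class="field-content">'
    '<a href="/ru/press/news/40586-maksim-bystrov-prinyal-uchastie-vo-'
    'vserossiyskom-soveshchanii-po-itogam-podgotovki">'
    '<strong>Максим Быстров принял участие во Всероссийском совещании '
    'по итогам подготовки к ОЗП 2017-2018 годов</strong></a></span></div>'
)

root = ET.fromstring(fragment)

# XPath-style query: every <a> directly inside <span class="field-content">.
# itertext() gathers the text of nested elements such as <strong>.
links = [
    (a.get("href"), "".join(a.itertext()).strip())
    for a in root.findall(".//span[@class='field-content']/a")
]

for href, title in links:
    print(href)
    print(title)
```

Unlike the regular expression, the query does not care about whitespace between tags or the order of attributes, which is the main reason a parser tends to be less fragile for this kind of task.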