Hello! I'd like to ask for your help.

There is a text:

By origin, natural honey can be floral or [[Fallen honey | padevy]]. Flower and [[honeydew honey]] Flower honey is produced by bees in the process of collecting and processing [[Nectar (sugary juice) | nectar]], secreted by the [[nectary]] of plants, both floral and extrafloral. [[Honeydew honey]] is produced by bees collecting [[Drops (beekeeping) | Drops]] (the sweet secretions of [[aphids]] and some other insects) and honeydew from the leaves or stems of plants. Honeydew honey is toxic to bees, so it is not left in the hives for the wintering period. Types of flower honey

I want to match everything in it that is enclosed in double square brackets, for example: [[honeydew honey]] and [[nectary]]

With a separate regular expression, I want to match entries like these: [[aphids | aphids]] and [[Pad (beekeeping) | pad]] (i.e. with a vertical bar), but in such a way that I can rework them in preg_replace, capturing the part before the bar and the part after the bar separately.

For the first case I tried this: \[\[[а-я]+\]\] But it does not match entries that contain spaces. I tried adding a space: \[\[[а-я, ]+\]\] or like this: \[\[[а-я]+\s\]\] - neither works at all. I think that once I understand the principle behind the first regular expression, the second will not be difficult. Still, I would be grateful for help.

    3 answers

    See an example

     # If these patterns are going to be used in preg_replace,
     # the expressions between the @ delimiters need to be wrapped in parentheses
     $patterns = array(
         'only_one_variant'      => '@\[\[[^\]\|]++\]\]@',
         'two_and_more_variants' => '@\[\[[^\]\|]++(?:\|[^\]\|]++)+\]\]@',
     );

     # Let's first break down the pattern @\[\[[^\]\|]++\]\]@
     # @ marks the start and the end of the pattern
     # \[\[ matches a literal [[
     # \]\] matches a literal ]]
     # [^\]\|]++ - match the longest possible run of characters other than ] and |
     #             (++ is possessive: it does not backtrack)

     # Now the pattern @\[\[[^\]\|]++(?:\|[^\]\|]++)+\]\]@
     # The first part is covered above. The part (?:\|[^\]\|]++)+ in more detail:
     # ?: means the contents of the parentheses are not captured. It is just grouping
     # \| matches a literal |
     # [^\]\|]++ - match the longest possible run of characters other than ] and |
     # + - match one or more repetitions of the group in parentheses

     # $string is expected to hold the text from the question
     foreach ($patterns as $pattern) {
         preg_match_all($pattern, $string, $matches);
         print_r($matches);
     }
     unset($matches);

     Array
     (
         [0] => Array
             (
                 [0] => [[падевый мёд]]
                 [1] => [[нектарник]]
                 [2] => [[Падевый мёд]]
             )

     )
     Array
     (
         [0] => Array
             (
                 [0] => [[Падевый мёд|падевый|падевый]]
                 [1] => [[Нектар (сахаристый сок)|нектара]]
                 [2] => [[Падь (пчеловодство)|падь]]
                 [3] => [[тля|тли]]
             )

     )
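Since the goal is preg_replace with the parts before and after the bar taken separately, the patterns above can be given capturing groups. This is a minimal sketch, not a definitive solution: the English sample string and the `<a href>` output format are assumptions made for illustration, and the `u` modifier is added on the assumption that the input is UTF-8.

```php
<?php
// Hypothetical sample input; the real text in the question is Russian.
$string = 'Bees collect [[Nectar (sugary juice)|nectar]] from the [[nectary]].';

// Links with a bar: group 1 is the part before the bar, group 2 the part after it.
$withBar = '@\[\[([^\]\|]++)\|([^\]\|]++)\]\]@u';
// Links without a bar: group 1 is the whole inner text.
$plain = '@\[\[([^\]\|]++)\]\]@u';

// The replacement format below is an assumption, chosen only to show $1 and $2.
$result = preg_replace($withBar, '<a href="$1">$2</a>', $string);
$result = preg_replace($plain, '<a href="$1">$1</a>', $result);

echo $result, PHP_EOL;
```

Running the bar pattern first and the plain pattern second keeps the two cases separate, as the question asks; neither pattern can match the other's form because `|` is excluded from the character class.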
       /\[\[[а-яёА-Я \(\)\|]+\]\]/U

       [0]=> string(23) "[[Падевый мёд|падевый]]"
       [1]=> string(15) "[[падевый мёд]]"
       [2]=> string(35) "[[Нектар (сахаристый сок)|нектара]]"
       [3]=> string(13) "[[нектарник]]"
       [4]=> string(15) "[[Падевый мёд]]"
       [5]=> string(28) "[[Падь (пчеловодство)|падь]]"
       [6]=> string(11) "[[тля|тли]]"
      • @lampa, by the terms of the task, matches with one variant inside the brackets and matches with several variants inside the brackets have to be handled by separate regular expressions - VenZell
      • @VenZell yes, but you can also handle both at once with a single branch: if (strpos($variable, "|") !== false) { /* expression */ } else { /* word */ } - lampa
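The single-branch idea from the comment above could be sketched with preg_replace_callback: one pattern matches every link, and the callback checks for a bar with strpos. The sample string and the `<a href>` output format are assumptions for illustration, not part of the original thread.

```php
<?php
// Hypothetical sample input for illustration.
$string = 'See [[aphid|aphids]] and [[nectary]].';

// One pattern for every link: capture everything between [[ and ]].
$pattern = '@\[\[([^\]]++)\]\]@u';

$result = preg_replace_callback($pattern, function (array $m): string {
    $inner = $m[1];
    if (strpos($inner, '|') !== false) {
        // "target|label": split at the first bar only.
        [$target, $label] = explode('|', $inner, 2);
    } else {
        // A plain link: the same text serves as target and label.
        $target = $label = $inner;
    }
    // Output format is an assumption, chosen only to show both parts.
    return '<a href="' . $target . '">' . $label . '</a>';
}, $string);

echo $result, PHP_EOL;
```

The comparison with `!== false` matters because strpos returns 0 for a match at the start of the string, and 0 is falsy in PHP.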

      [а-я]+\s - this expression matches only words that are followed by a space. If there is no space, nothing matches.

      First, the space may or may not be present; second, there may be one or more words. Hence: ([а-я]+\s?)+

      Also, your text contains capitalized words, which your pattern does not match either; take that into account.
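Putting this advice together (optional space, one or more words, both letter cases) might look like the sketch below. The sample string is a hypothetical one assembled from the link names shown earlier in the thread. Adding `ё`/`Ё` explicitly and using the `u` modifier are assumptions about the input: these two letters fall outside the `а-я` and `А-Я` ranges, and `u` makes PCRE treat the pattern and subject as UTF-8.

```php
<?php
// Hypothetical sample input built from link names shown in the thread.
$string = 'Цветочный и [[падевый мёд]], а также [[Падевый мёд]] и [[нектарник]].';

// One or more words, each optionally followed by whitespace.
// The group is non-capturing because only the whole match is needed here.
$pattern = '/\[\[(?:[а-яёА-ЯЁ]+\s?)+\]\]/u';

preg_match_all($pattern, $string, $matches);
print_r($matches[0]);
```

This form still skips links containing parentheses or a bar, such as `[[Падь (пчеловодство)|падь]]`; those need the extra characters added to the class or a negated class like `[^\]\|]`.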