I am writing a regular expression for preg_split() so that a string is split on whitespace and on all possible punctuation marks. So far it looks like this:

$pattern = "/[\s,;.\!\-\?\:\(\)]+/"; 
  1. How do I add the em dash, all possible quotation marks, and so on to the expression (maybe I have forgotten some other punctuation marks)?

  2. With this expression, preg_split() adds an empty element to the end of the array. Clearly it can be removed afterwards, but I do not like that solution. How should the expression be changed so that there is no empty element?

Thanks!

  • trim + /\W+/u should help you. - Dmitriy Simushev
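
The idea from this comment can be sketched as follows. This is only an illustration: the helper name splitWords is hypothetical, and the input is assumed to be valid UTF-8, which the /u modifier requires. Note that trim() strips only whitespace by default, so a string that ends in punctuation still produces a trailing empty element.

```php
<?php
// Hypothetical helper illustrating the "trim + /\W+/u" idea from the comment.
// Assumes $text is valid UTF-8 (required by the /u modifier).
function splitWords(string $text): array
{
    // trim() removes leading and trailing whitespace, so whitespace-padded
    // input no longer starts or ends on a separator.
    $parts = preg_split('/\W+/u', trim($text));

    // preg_split() returns false on failure (for example invalid UTF-8).
    return $parts === false ? [] : $parts;
}

print_r(splitWords("  one, two; three  "));

// Trailing punctuation is not removed by trim(), so the last element is "".
print_r(splitWords("one two?"));
```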

1 answer

    1. You need to decide exactly what result you want. If you want to split only on punctuation marks, you have to list every desired character explicitly in the character class, including the em dash, all possible quotation marks, and so on; there is no universal character class that describes this.
    2. If you want to split on every character that is neither whitespace nor a letter, there is the character class [:punct:]. In a regular expression it looks like this:

       /[\s[:punct:]]+/ 
  1. To exclude empty elements, it suffices to pass the PREG_SPLIT_NO_EMPTY flag to the preg_split function.

     preg_split( "/[\s[:punct:]]+/", $text, -1, PREG_SPLIT_NO_EMPTY ) 

    http://ideone.com/iZjYU0
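
Both options from the answer can be tried side by side. The sketch below is not the IDEone example: the sample strings are made up, and the explicit character list in $explicit is only an assumption about which marks might be wanted (em dash, en dash, guillemets, curly quotes, ellipsis). They are written as \x{...} code points, so that pattern uses the /u modifier and expects UTF-8 input.

```php
<?php
// POSIX class from the answer: whitespace plus [:punct:].
$posix = '/[\s[:punct:]]+/';

// Explicit list (an assumed selection of marks); the trailing "-" is a
// literal hyphen, and /u lets PCRE read \x{...} as Unicode code points.
$explicit = '/[\s,;.!?:()"\x{2014}\x{2013}\x{00AB}\x{00BB}\x{201C}\x{201D}\x{201E}\x{2026}-]+/u';

$ascii = "Hello, world! How are you?";

// Without the flag, the trailing "?" leaves an empty last element.
$withEmpty = preg_split($posix, $ascii);

// With PREG_SPLIT_NO_EMPTY, empty pieces are dropped.
$noEmpty = preg_split($posix, $ascii, -1, PREG_SPLIT_NO_EMPTY);

// "\u{...}" string escapes need PHP 7 or later.
$typographic = "one \u{2014} \u{00AB}two\u{00BB}, \u{201C}three\u{201D}\u{2026}";
$unicodeWords = preg_split($explicit, $typographic, -1, PREG_SPLIT_NO_EMPTY);

print_r($withEmpty);
print_r($noEmpty);
print_r($unicodeWords);
```

Listing the marks by code point keeps the pattern independent of the source file's encoding, at the cost of having to extend the list by hand.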

  • /[\s[:punct:]]+/ does not work, alas; it returns the whole string as it is. - humster_spb
  • Sorry, I accidentally lost one colon when copy-pasting. See the example on IDEone. If that does not work either, give an example of the string. - ReinRaus
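
A possible explanation for the symptom reported in these comments, sketched under the assumption that the lost colon was the one closing the POSIX class name. Without it, PCRE no longer sees [:punct:] as a named class, so the pattern becomes an ordinary character class followed by a literal "]" and usually matches nothing in normal text. The sample string is made up.

```php
<?php
$text = "Hello, world! How are you?";

// Assumed broken variant: the colon before the closing "]" is missing.
// PCRE then reads "[\s[:punct]" as a class of whitespace and the literal
// characters [ : p u n c t, followed by one or more literal "]".
$broken = '/[\s[:punct]]+/';
$fixed  = '/[\s[:punct:]]+/';

$brokenResult = preg_split($broken, $text, -1, PREG_SPLIT_NO_EMPTY);
$fixedResult  = preg_split($fixed, $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($brokenResult); // the whole string as a single element
print_r($fixedResult);  // the individual words
```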