Suppose there are two strings:

    s_1 = "SELECT * FROM t1;"
    s_2 = "SELECT * FROM t1 WHERE t1.id == 1;"

And the regular expression:

    regex = re.compile("(?:SELECT (?P<columns>.*))(?:FROM (?P<source>))(?:WHERE (?P<where_clause>.*))")

How can I fix it so that calling regex.match(s_1) returns the groups "columns", "source" and "where_clause" (the last one being None)?

1 answer

As I understand it, the first two parts are required. Use the lazy quantifier in the second group, and make the third optional:

    import re

    # Lazy "source" group, optional WHERE group, mandatory trailing semicolon
    p = r'(?i)SELECT (?P<columns>.*) FROM (?P<source>.*?)(?: WHERE (?P<where_clause>.*))?;$'
    strs = ['select * from t1;', 'select * from t1 where c1 == 0;']
    for s in strs:
        m = re.match(p, s)
        if m:
            print(m.group("columns"))
            print(m.group("source"))
            print(m.group("where_clause"))  # None when there is no WHERE part

See the Python demo, which prints:

    *
    t1
    None
    *
    t1
    c1 == 0

I also assume that the semicolon is required at the end of the line.
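To make that assumption concrete, here is a minimal sketch. `STRICT` is the pattern from this answer; `RELAXED` is a hypothetical variant (not part of the original pattern) that makes the semicolon optional. Note that in the relaxed version the `where_clause` group has to become lazy too, otherwise it would swallow the trailing semicolon.

```python
import re

# Pattern from the answer: the trailing semicolon is mandatory.
STRICT = r'(?i)SELECT (?P<columns>.*) FROM (?P<source>.*?)(?: WHERE (?P<where_clause>.*))?;$'

# Hypothetical variant: optional semicolon, lazy where_clause so that
# the ';' is left for the ';?' at the end instead of being captured.
RELAXED = r'(?i)SELECT (?P<columns>.*) FROM (?P<source>.*?)(?: WHERE (?P<where_clause>.*?))?;?$'

no_semicolon = 'select * from t1 where c1 == 0'

strict_match = re.match(STRICT, no_semicolon)    # no match without ';'
relaxed_match = re.match(RELAXED, no_semicolon)  # matches
relaxed_with_semicolon = re.match(RELAXED, no_semicolon + ';')

print(strict_match)
print(relaxed_match.groupdict())
print(relaxed_with_semicolon.groupdict())
```

If your input always ends with a semicolon, the stricter pattern is the safer choice, since it rejects truncated statements.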

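  • (?i) - enables case-insensitive matching
  • SELECT (?P<columns>.*) - SELECT, a space, then 0+ of any characters other than a line break (greedy), captured as "columns"
  • FROM (?P<source>.*?) - a space, FROM, a space, then 0+ of any characters other than a line break, as few as possible (lazy), captured as "source"
  • (?: WHERE (?P<where_clause>.*))? - an optional non-capturing group: a space, WHERE, a space, then 0+ of any characters other than a line break, captured as "where_clause"
  • ; - a literal semicolon
  • $ - end of the string (re.match anchors the match only at the start of the string, not at the end; with re.fullmatch the $ is not needed).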
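The effect of the lazy quantifier and of re.fullmatch can be checked with a short sketch. `LAZY` is the pattern above without the trailing `$`; `GREEDY` and the helper `parse` are hypothetical names introduced only for this comparison.

```python
import re

# Pattern from the answer, minus the trailing $ (re.fullmatch anchors both ends).
LAZY = r'(?i)SELECT (?P<columns>.*) FROM (?P<source>.*?)(?: WHERE (?P<where_clause>.*))?;'

# Hypothetical variant with a greedy "source" group, for comparison only.
GREEDY = r'(?i)SELECT (?P<columns>.*) FROM (?P<source>.*)(?: WHERE (?P<where_clause>.*))?;'


def parse(pattern, query):
    """Return all named groups as a dict, or None if the query does not match."""
    m = re.fullmatch(pattern, query)
    return m.groupdict() if m else None


query = 'select * from t1 where c1 == 0;'
lazy_result = parse(LAZY, query)      # WHERE part lands in where_clause
greedy_result = parse(GREEDY, query)  # WHERE part is swallowed by source

print(lazy_result)
print(greedy_result)
```

With the greedy variant the optional WHERE group is simply skipped, so "source" ends up as "t1 where c1 == 0" and "where_clause" is None. That is why the second group needs `.*?`.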