As I understand it, the first two parts are required. Use the lazy quantifier in the second group, and make the third optional:
    import re

    # Lazy .*? in "source" lets the optional WHERE group take the rest of the query
    p = r'(?i)SELECT (?P<columns>.*) FROM (?P<source>.*?)(?: WHERE (?P<where_clause>.*))?;$'
    strs = ['select * from t1;', 'select * from t1 where c1 == 0;']
    for s in strs:
        m = re.match(p, s)
        if m:
            print(m.group("columns"))
            print(m.group("source"))
            print(m.group("where_clause"))  # None when there is no WHERE part
See the Python demo, which prints:
    *
    t1
    None
    *
    t1
    c1 == 0
I also assume that the semicolon is required at the end of the line.
(?i) - enables case-insensitive matching
SELECT (?P<columns>.*) - SELECT, a space, then 0+ characters other than line breaks (group "columns")
 FROM (?P<source>.*?) - a space, FROM, a space, then 0+ characters other than line breaks, as few as possible (group "source")
(?: WHERE (?P<where_clause>.*))? - an optional non-capturing group: a space, WHERE, a space, then 0+ characters other than line breaks (group "where_clause")
; - a literal ;
$ - end of the string (re.match anchors the match only at the start of the string, not at the end; with re.fullmatch the $ is not needed).
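To illustrate the last point and the role of the lazy quantifier, here is a minimal sketch. It assumes the same pattern with the trailing $ dropped and re.fullmatch used instead of re.match. The names parse_select and GREEDY are made up for this example and are not part of the original answer.

```python
import re

# Same pattern as above, without the trailing $:
# re.fullmatch already requires the whole string to match.
PATTERN = r'(?i)SELECT (?P<columns>.*) FROM (?P<source>.*?)(?: WHERE (?P<where_clause>.*))?;'


def parse_select(query):
    """Hypothetical helper: return the named groups as a dict, or None if there is no match."""
    m = re.fullmatch(PATTERN, query)
    return m.groupdict() if m else None


print(parse_select('select * from t1;'))
print(parse_select('select * from t1 where c1 == 0;'))
print(parse_select('select * from t1'))  # no semicolon, so no match

# For comparison: with a greedy "source" group the WHERE part is swallowed
# by "source" and the optional group never takes part in the match.
GREEDY = PATTERN.replace('.*?', '.*')
print(re.fullmatch(GREEDY, 'select * from t1 where c1 == 0;').groupdict())
```

With the greedy variant, "source" ends up holding everything up to the semicolon, which is why the second group needs the lazy .*? for the optional third group to capture anything.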
|$? - Qwertiy ♦
(?:WHERE (?P<where_clause>.*))? then this group will simply be missing if there was no WHERE - Mike