How can I get an empty (None) group when part of the pattern has no match with re.match?

Question

Suppose there are two strings:

    s_1 = "SELECT * FROM t1;"
    s_2 = "SELECT * FROM t1 WHERE t1.id == 1;"

And the regular expression:

    regex = re.compile("(?:SELECT (?P<columns>.*))(?:FROM (?P<source>))(?:WHERE (?P<where_clause>.*))")

How can I fix it so that calling "regex.match(s_1)" returns the groups "columns", "source" and "where_clause" (the last one with the value "None")?

After that, I also need an expression that handles s_3 = "SELECT * FROM t1 WHERE t1.id == 1 ORDER BY id;"

Have you tried existing SQL parsing tools such as sqlparse, or building on one of the many parser-construction libraries such as pyparsing, grako, or PLY?

Wiktor Stribiżew · Answer 1 · 2017-03-10T23:30:37

As I understand it, the first two parts are required. Use a lazy quantifier in the second group, and make the third group optional:

    import re

    p = '(?i)SELECT (?P<columns>.*) FROM (?P<source>.*?)(?: WHERE (?P<where_clause>.*))?;$'
    strs = ['select * from t1;', 'select * from t1 where c1 == 0;']
    for s in strs:
        m = re.match(p, s)
        if m:
            print(m.group("columns"))
            print(m.group("source"))
            print(m.group("where_clause"))

See the Python demo, which prints:

    *
    t1
    None
    *
    t1
    c1 == 0

I also assume that the semicolon is required at the end of the string.

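(?i) - enables case-insensitive matching for the whole pattern
SELECT (?P<columns>.*) - SELECT, a space, then zero or more characters other than line breaks, as many as possible (greedy), captured as "columns"
FROM (?P<source>.*?) - a space, FROM, a space, then zero or more characters other than line breaks, as few as possible (lazy), captured as "source"
(?: WHERE (?P<where_clause>.*))? - an optional non-capturing group: a space, WHERE, a space, then zero or more characters other than line breaks, captured as "where_clause"; if this group does not take part in the match, "where_clause" is None
; - a literal semicolon
$ - end of the string (re.match anchors the match only at the start of the string, not at the end; with re.fullmatch the $ is not needed).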
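The same idea can probably be stretched to cover the s_3 string from the question. The sketch below is only an assumption about how that might look, not part of the original answer: it adds a second optional non-capturing group for ORDER BY, makes the "where_clause" quantifier lazy so it does not swallow the ORDER BY part, and uses fullmatch so that the trailing $ can be dropped. The names PATTERN, parse_select and order_by are made up for illustration.

```python
import re

# Hypothetical extension of the answer's pattern (not from the original answer).
# - where_clause is lazy (.*?) so that it stops before an optional ORDER BY part
# - order_by is a new, illustrative group name
# - no trailing $ because fullmatch() already requires the whole string to match
PATTERN = re.compile(
    r"SELECT (?P<columns>.*) FROM (?P<source>.*?)"
    r"(?: WHERE (?P<where_clause>.*?))?"
    r"(?: ORDER BY (?P<order_by>.*))?;",
    re.IGNORECASE,
)


def parse_select(statement):
    """Return a dict of all named groups, or None if the statement does not match.

    Groups that did not take part in the match are reported as None.
    """
    m = PATTERN.fullmatch(statement)
    return m.groupdict() if m else None


if __name__ == "__main__":
    for s in (
        "SELECT * FROM t1;",
        "SELECT * FROM t1 WHERE t1.id == 1;",
        "SELECT * FROM t1 WHERE t1.id == 1 ORDER BY id;",
    ):
        print(parse_select(s))
```

Here groupdict() returns every named group at once, so the optional parts show up as None without calling group() for each name. A regular expression like this only handles simple, single-line statements; for anything closer to real SQL, a dedicated parser is likely the safer choice.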
