What is the best way to access regex match groups in Python without explicitly creating a match object? Or is there a more elegant way to write the example below?
Here’s a Perl code snippet for reference:
if ($statement =~ /I love (\w+)/) {
    print "He loves $1\n";
}
elsif ($statement =~ /Ich liebe (\w+)/) {
    print "Er liebt $1\n";
}
elsif ($statement =~ /Je t\'aime (\w+)/) {
    print "Il aime $1\n";
}
Translated into Python, this becomes:
import re

# Raw strings (r"...") keep \w from being treated as a string escape.
m = re.search(r"I love (\w+)", statement)
if m:
    print("He loves", m.group(1))
else:
    m = re.search(r"Ich liebe (\w+)", statement)
    if m:
        print("Er liebt", m.group(1))
    else:
        m = re.search(r"Je t'aime (\w+)", statement)
        if m:
            print("Il aime", m.group(1))
However, the nested if-else-cascade and the repeated creation of match objects seem awkward. What is a cleaner or more efficient way to handle this in Python?
I’ve worked with regex quite a bit, and I know how annoying it gets when you keep repeating match object creation. A simple class can make this much cleaner!
import re

class REMatcher:
    def __init__(self, matchstring):
        self.matchstring = matchstring

    def match(self, regexp):
        self.rematch = re.match(regexp, self.matchstring)
        return bool(self.rematch)

    def group(self, i):
        return self.rematch.group(i)

statements = ["I love Mary", "Ich liebe Margot", "Je t'aime Marie", "Te amo Maria"]
for statement in statements:
    m = REMatcher(statement)
    if m.match(r"I love (\w+)"):
        print("He loves", m.group(1))
    elif m.match(r"Ich liebe (\w+)"):
        print("Er liebt", m.group(1))
    elif m.match(r"Je t'aime (\w+)"):
        print("Il aime", m.group(1))
    else:
        print("???")
This makes working with regex match groups in Python much easier: the REMatcher class stores the most recent match internally, so each test fits in a single if/elif condition and the class can be reused wherever you need it!
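One thing to watch: Perl's =~ finds a match anywhere in the string, which corresponds to re.search, while re.match only matches at the start of the string. If you want the Perl behavior, a minimal sketch of a search-based variant could look like this (the class name RESearcher and its search method are hypothetical names picked for illustration, not part of the re module):

```python
import re


class RESearcher:
    """Hypothetical variant of REMatcher that uses re.search instead of re.match."""

    def __init__(self, matchstring):
        self.matchstring = matchstring
        self.rematch = None  # no match attempted yet

    def search(self, regexp):
        # re.search scans the whole string, like Perl's =~
        self.rematch = re.search(regexp, self.matchstring)
        return bool(self.rematch)

    def group(self, i):
        return self.rematch.group(i)


m = RESearcher("She said: I love Mary")
if m.search(r"I love (\w+)"):
    print("He loves", m.group(1))  # the match starts in the middle of the string

# re.match returns None here because the string does not start with "I love"
print(re.match(r"I love (\w+)", "She said: I love Mary"))
```

For the sample statements in this thread both functions give the same result, since every pattern matches at the start of the string. The difference only shows up when the phrase appears later in the text.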
Okay, I see the class-based approach makes things cleaner, but what if we could do this even more concisely? Enter the walrus operator (:=), available since Python 3.8, which lets us assign and check the match in one go!
import re

statements = ["I love Mary", "Ich liebe Margot", "Je t'aime Marie", "Te amo Maria"]
for statement in statements:
    if m := re.match(r"I love (\w+)", statement):
        print("He loves", m.group(1))
    elif m := re.match(r"Ich liebe (\w+)", statement):
        print("Er liebt", m.group(1))
    elif m := re.match(r"Je t'aime (\w+)", statement):
        print("Il aime", m.group(1))
    else:
        print("???")
Why is this better?
- No need for a separate class.
- The match object (m) is created and tested in a single expression, and later patterns are tried only when the earlier ones fail.
- Match groups are read directly from m with m.group().
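If you are stuck on a Python version older than 3.8, where := is not available, early returns inside a function flatten the cascade in a similar way. This is only a sketch under that assumption; the function name describe is a hypothetical helper, and it uses re.search to mirror the original Perl:

```python
import re


def describe(statement):
    """Hypothetical helper: return the translated phrase, or None if nothing matches."""
    m = re.search(r"I love (\w+)", statement)
    if m:
        return "He loves " + m.group(1)

    m = re.search(r"Ich liebe (\w+)", statement)
    if m:
        return "Er liebt " + m.group(1)

    m = re.search(r"Je t'aime (\w+)", statement)
    if m:
        return "Il aime " + m.group(1)

    return None  # no pattern matched


for statement in ["I love Mary", "Ich liebe Margot", "Te amo Maria"]:
    print(describe(statement) or "???")
```

Each return leaves the function as soon as a pattern matches, so no else branches or nesting are needed.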
I love the assignment expression trick! But what if we have a ton of patterns to match? Instead of manually checking each one, let’s use a dictionary to keep things DRY (Don’t Repeat Yourself)!
import re

patterns = {
    r"I love (\w+)": "He loves",
    r"Ich liebe (\w+)": "Er liebt",
    r"Je t'aime (\w+)": "Il aime"
}
statements = ["I love Mary", "Ich liebe Margot", "Je t'aime Marie", "Te amo Maria"]
for statement in statements:
    for pattern, phrase in patterns.items():
        if m := re.match(pattern, statement):
            print(phrase, m.group(1))
            break
    else:
        # This else belongs to the inner for loop: it runs only if no break occurred.
        print("???")
Why is this great?
- Easily extendable for more patterns.
- Removes repetitive if-elif blocks.
- Still uses an assignment expression to keep the matching code short.
- Keeps match group extraction simple.
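If the table grows or the lookup runs often, one possible refinement is to compile the patterns up front and wrap the loop in a function. The sketch below assumes that setup; PATTERNS and translate are hypothetical names, and since the re module already caches recently compiled patterns, the gain is probably more about readability than speed:

```python
import re

# Compile each pattern once; a list of tuples makes the matching order explicit.
PATTERNS = [
    (re.compile(r"I love (\w+)"), "He loves"),
    (re.compile(r"Ich liebe (\w+)"), "Er liebt"),
    (re.compile(r"Je t'aime (\w+)"), "Il aime"),
]


def translate(statement, default="???"):
    """Hypothetical helper: return the phrase for the first matching pattern."""
    for regex, phrase in PATTERNS:
        m = regex.search(statement)
        if m:
            return f"{phrase} {m.group(1)}"
    return default  # nothing matched


for statement in ["I love Mary", "Ich liebe Margot", "Je t'aime Marie", "Te amo Maria"]:
    print(translate(statement))
```

Returning a string instead of printing also makes the function easy to test and reuse.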