The idea that regular expressions don’t support inverse matching isn’t entirely accurate. While it’s not their primary purpose, you can simulate this behavior using negative lookarounds. For example, to match a line that doesn’t contain the word “hede”, you can use the following regex pattern:
^((?!hede).)*$
This pattern will match any string or line (without a line break) that does not contain the substring “hede”. If you need to match line breaks as well, you can use the DOT-ALL modifier (the trailing s in the pattern):
/^((?!hede).)*$/s
Alternatively, you can use it inline:
/(?s)^((?!hede).)*$/
If the DOT-ALL modifier is unavailable, you can achieve the same behavior with the character class [\s\S]:
/^((?!hede)[\s\S])*$/
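As a rough illustration, here is how the variants above could be tried out in Python. The choice of Python, the helper name lacks_hede, and the sample strings are assumptions made for this sketch; Python's re module has no /.../s literal syntax, so the DOT-ALL modifier is passed as re.DOTALL or written inline as (?s).

```python
import re

# Basic pattern: at every position, "hede" must not start there.
NO_HEDE = re.compile(r"^((?!hede).)*$")
# DOT-ALL as a flag, so the dot also matches line breaks.
NO_HEDE_DOTALL = re.compile(r"^((?!hede).)*$", re.DOTALL)
# DOT-ALL as an inline modifier.
NO_HEDE_INLINE = re.compile(r"(?s)^((?!hede).)*$")
# Fallback without DOT-ALL: [\s\S] matches any character, including line breaks.
NO_HEDE_CLASS = re.compile(r"^((?!hede)[\s\S])*$")


def lacks_hede(text, pattern=NO_HEDE):
    """Return True if the pattern matches, i.e. text does not contain "hede"."""
    return pattern.match(text) is not None


# Hypothetical sample input, one entry per line.
samples = ["hoho", "hihi", "haha", "hede", "ABhedeCD"]
kept = [s for s in samples if lacks_hede(s)]
print(kept)  # ['hoho', 'hihi', 'haha']

# The basic pattern stops at an inner line break; the other three do not.
multi = "foo\nbar"
print(lacks_hede(multi))                  # False
print(lacks_hede(multi, NO_HEDE_DOTALL))  # True
print(lacks_hede(multi, NO_HEDE_INLINE))  # True
print(lacks_hede(multi, NO_HEDE_CLASS))   # True
```

If all you need is a substring test in your own code, "hede" not in text is simpler; the regex form is mainly useful where only a pattern can be supplied.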
Explanation:
A string consists of a list of n characters. Before and after each character, there’s an empty string. So, a list of n characters will have n+1 empty strings. For example, in the string “ABhedeCD”:
    ┌──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┐
S = │e1│ A │e2│ B │e3│ h │e4│ e │e5│ d │e6│ e │e7│ C │e8│ D │e9│
    └──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┘
Here, the e’s represent empty strings. The regex (?!hede). looks ahead to check that the substring “hede” does not start at the current position, and if it doesn’t, the . (dot) matches any character except a line break. Lookarounds are also called zero-width assertions because they don’t consume any characters; they only assert or validate something.
In this example, each empty string is first checked to ensure “hede” is not directly ahead before a character is consumed by the . (dot). This check-and-consume step is wrapped in a group and repeated zero or more times to cover the entire input, and the start and end of the input are anchored to ensure the entire input is processed: ^((?!hede).)*$
If the input is “ABhedeCD”, the regex will fail because at e3, the (?!hede) check fails (indicating “hede” is ahead). The group cannot advance past B, and the $ anchor cannot match there, so there is no overall match.
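To make the walk-through concrete, the following Python sketch tests the lookahead on its own at every empty string of “ABhedeCD”. The function name failing_positions is hypothetical, and the 1-based numbering is chosen only to line up with e1 to e9 in the diagram.

```python
import re

# The zero-width check by itself, without the consuming dot.
LOOKAHEAD = re.compile(r"(?!hede)")


def failing_positions(text):
    """Return the 1-based indices of the empty strings (e1, e2, ...)
    at which the (?!hede) check fails."""
    return [
        i + 1
        for i in range(len(text) + 1)        # n characters -> n+1 empty strings
        if LOOKAHEAD.match(text, i) is None  # try the lookahead at offset i
    ]


print(failing_positions("ABhedeCD"))  # [3]  -> the check fails only at e3
print(failing_positions("ABCD"))      # []   -> the check passes everywhere
```

A single failing position is enough to make the anchored pattern ^((?!hede).)*$ fail, because the repeated group can never step across it.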