Does `.replace()` support regex? My code `z.write(article.replace('</html>.+', '</html>'))` doesn't remove everything after the `</html>` tag

nehagupta.1798 · June 20, 2024, 8:30am

I am trying to grab everything after the `</html>` tag and delete it, but my code doesn't seem to be doing anything. Does `.replace()` not support regex?

```python
z.write(article.replace('</html>.+', '</html>'))
```

akanshasrivastava.1121 · June 24, 2024, 5:18pm

Hello Neha Gupta,

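`str.replace()` in Python does not support regular expressions. It only replaces exact, literal substrings, so `article.replace('</html>.+', '</html>')` searches for the literal text `</html>.+`, doesn't find it, and returns the string unchanged. Instead, you can use the `re.sub()` function from the `re` module. Here are three ways to resolve the issue: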

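Using `re.sub()` to remove everything after `</html>`:

```python
import re

article = "<html>This is an HTML document.</html>More content here."
# Match "</html>" plus everything after it and replace the whole match with "</html>".
# re.DOTALL lets "." match newlines too, so multi-line trailing content is removed.
modified_article = re.sub(r'</html>.+', '</html>', article, flags=re.DOTALL)

print(modified_article)
```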
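A small sketch of why the original call does nothing, and why `re.DOTALL` matters. The sample string is made up for illustration and has a newline right after `</html>`. `str.replace()` treats its first argument as literal text. `re.sub()` treats it as a pattern, but without `re.DOTALL` the `.` does not match `\n`, so `.+` cannot start right after the tag.

```python
import re

# Hypothetical sample with multi-line content after the closing tag.
article = "<html>Body text.</html>\nFooter line 1\nFooter line 2"

# str.replace() looks for the literal characters "</html>.+", which never
# occur in the text, so the string comes back unchanged.
literal = article.replace('</html>.+', '</html>')

# Without re.DOTALL, "." does not match "\n"; the character right after
# "</html>" is a newline, so ".+" fails and nothing is replaced.
no_dotall = re.sub(r'</html>.+', '</html>', article)

# With re.DOTALL, ".+" runs across the newlines to the end of the string.
with_dotall = re.sub(r'</html>.+', '</html>', article, flags=re.DOTALL)

print(repr(literal))
print(repr(no_dotall))
print(repr(with_dotall))
```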

macy-davis · June 26, 2024, 8:47am

Hello Nehagupta,

Using string slicing to remove everything after </html>:

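```python
article = "<html>This is an HTML document.</html>More content here."
tag = '</html>'
index = article.find(tag)
# find() returns -1 if the tag is missing; keep the full string in that case.
modified_article = article[:index + len(tag)] if index != -1 else article

print(modified_article)
```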
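A variant of the same slicing idea, not from the answer above, uses `str.partition()`. It splits at the first occurrence of the separator and returns `(before, separator, after)`, or `(whole_string, '', '')` when the separator is absent, so the missing-tag case needs no special handling. The helper name `cut_after_html` is hypothetical.

```python
def cut_after_html(text: str, tag: str = '</html>') -> str:
    """Return text up to and including the first `tag`; unchanged if `tag` is absent."""
    # partition() -> (before, tag, after), or (text, '', '') if tag is not found
    before, sep, _after = text.partition(tag)
    return before + sep


print(cut_after_html("<html>This is an HTML document.</html>More content here."))
print(cut_after_html("no closing tag here"))
```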

macy-davis · June 28, 2024, 7:37am

Hey Neha Gupta,

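Using `re.split()` to split the string at `</html>` and then rejoin the parts up to and including the tag:

```python
import re

article = "<html>This is an HTML document.</html> More content here."
# The capturing group keeps "</html>" in the result list:
# ['<html>This is an HTML document.', '</html>', ' More content here.']
parts = re.split(r'(</html>)', article, maxsplit=1)
modified_article = ''.join(parts[:2])

print(modified_article)
```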
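Tying this back to the original question, any of the three approaches in this thread can replace the `str.replace()` call inside `z.write(...)`. The sketch below assumes `z` is a text-mode file object and uses `io.StringIO` as a stand-in so it runs without touching the disk. The sample `article` is made up. It also checks that the three approaches agree when the trailing content spans several lines.

```python
import io
import re

TAG = '</html>'

# Hypothetical sample: trailing content spans several lines.
article = "<html>\n<p>Hello</p>\n</html>\nMore content\nhere."


def via_sub(text: str) -> str:
    # DOTALL so ".+" also consumes the newlines after the tag.
    return re.sub(r'</html>.+', TAG, text, flags=re.DOTALL)


def via_slice(text: str) -> str:
    index = text.find(TAG)
    return text[:index + len(TAG)] if index != -1 else text


def via_split(text: str) -> str:
    # The capturing group keeps the tag as parts[1].
    parts = re.split(r'(</html>)', text, maxsplit=1)
    return ''.join(parts[:2])


results = {f(article) for f in (via_sub, via_slice, via_split)}
assert len(results) == 1  # all three approaches agree

# io.StringIO stands in for the asker's file object `z`.
z = io.StringIO()
z.write(via_sub(article))
print(z.getvalue())
```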