I am trying to grab everything after the '</html>' tag and delete it, but my code doesn't seem to be doing anything. Does .replace() not support regex?
z.write(article.replace('</html>.+', ''))
Hello Neha Gupta,
Python's str.replace() does not support regular expressions. It treats its first argument as a literal substring, so your call searches for those exact characters and finds nothing to replace. Use re.sub() from the re module instead. Here are three ways to resolve the issue:
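Using re.sub() to remove everything after </html>:
import re
article = "<html>This is an HTML document.</html>More content here."
# Match the tag and everything after it, then put the tag back; re.DOTALL lets . match newlines too
modified_article = re.sub(r'</html>.*', '</html>', article, flags=re.DOTALL)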
print(modified_article)
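To see why the original call did nothing, here is a minimal check on a made-up sample string (not taken from the question). It suggests that str.replace() looks for the literal characters it is given, while re.sub() interprets them as a pattern, and that without re.DOTALL the dot stops at the first newline.

```python
import re

# Made-up sample with trailing content spread over two lines
article = "<html>Body</html>trailing line 1\ntrailing line 2"

# str.replace() searches for the literal text "</html>.+", which never occurs
literal = article.replace("</html>.+", "")
print(literal == article)

# Without re.DOTALL, "." does not match "\n", so the second line survives
single_line = re.sub(r"</html>.*", "</html>", article)
print(repr(single_line))

# With re.DOTALL, everything after the tag is removed
all_lines = re.sub(r"</html>.*", "</html>", article, flags=re.DOTALL)
print(repr(all_lines))
```

If the article can span several lines, the re.DOTALL variant is probably the one you want.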
Hello Nehagupta,
Using string slicing to remove everything after </html>:
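article = "<html>This is an HTML document.</html>More content here."
index = article.find('</html>')
# find() returns -1 when the tag is missing; in that case leave the text unchanged
modified_article = article[:index + len('</html>')] if index != -1 else article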
print(modified_article)
Hey Neha Gupta,
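Using re.split() to split the string at </html> and then rejoin the parts up to and including the tag:
import re
article = "<html>This is an HTML document.</html>More content here."
# The capturing group keeps the tag itself in the result: [before, '</html>', after]
parts = re.split(r'(</html>)', article)
modified_article = ''.join(parts[:2])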
print(modified_article)
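All three approaches should give the same result on the sample string. As a rough sketch, the slicing version can be wrapped in a helper and compared with the other two. The function name keep_through_tag is hypothetical and not part of the original post.

```python
import re


def keep_through_tag(text, tag="</html>"):
    """Return text up to and including the first occurrence of tag.

    Hypothetical helper for illustration; returns text unchanged if tag is absent.
    """
    index = text.find(tag)
    if index == -1:
        return text
    return text[:index + len(tag)]


article = "<html>This is an HTML document.</html>More content here."

by_slicing = keep_through_tag(article)
by_sub = re.sub(r"</html>.*", "</html>", article, flags=re.DOTALL)
by_split = "".join(re.split(r"(</html>)", article, maxsplit=1)[:2])

print(by_slicing)
print(by_slicing == by_sub == by_split)
```

With a helper like this, the line from the question would become z.write(keep_through_tag(article)), assuming z and article are defined as in your script.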