[Solved] How to remove all stuff by using a regular expression [closed]

Accepted Answer

Whenever you think “I need to match a pattern”, regular expressions are a good starting point; see the documentation for Python's re module. This case is a little trickier because the input file contains Unicode text, so it has to be decoded as it is read.

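import re
import codecs

# Capture the first word inside the square brackets on a "solution:" line.
# The pattern is a text (not bytes) string because the lines are decoded text.
pattern = re.compile(r"solution:.+\[(?P<word>\w+).*\]", flags=re.U)

words = []
# codecs.open decodes the file as UTF-8, so every line is a unicode string.
with codecs.open("test.unicode.txt", "r", "utf-8") as f:
    for line in f:
        match = pattern.match(line)
        if match:
            words.append(match.group("word"))

print(words)

Output (Python 2; Python 3 prints ['katab'] without the u prefix):

[u'katab']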
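
For Python 3 only, the same approach can be sketched without codecs, since the built-in open() accepts an encoding argument and str patterns are Unicode-aware by default. This is a minimal sketch, not the original poster's code: the helper name extract_words and the SAMPLE lines are made up for illustration, because the real contents of test.unicode.txt are not shown here.

```python
import io
import re

# Same pattern as in the answer: capture the first word inside the
# square brackets on a line that starts with "solution:".
PATTERN = re.compile(r"solution:.+\[(?P<word>\w+).*\]")

# Hypothetical sample input; the real file format is an assumption.
SAMPLE = (
    "lookup: ktb\n"
    "solution: (kataba) [katab-u_1]\n"
    "solution: (ecrire) [écrire-v_1]\n"
)


def extract_words(lines):
    """Return the captured word from every line the pattern matches."""
    words = []
    for line in lines:
        match = PATTERN.match(line)
        if match:
            words.append(match.group("word"))
    return words


if __name__ == "__main__":
    # io.StringIO stands in for a file opened with
    # open("test.unicode.txt", encoding="utf-8").
    print(extract_words(io.StringIO(SAMPLE)))
```

To run it against a real file, pass the file object instead of the StringIO: with open("test.unicode.txt", encoding="utf-8") as f: words = extract_words(f).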