Whenever you think “I need to match a pattern”, regular expressions are a good starting point. See the documentation for the re module. It is a little trickier here because the input file is Unicode, so it has to be decoded as it is read, before matching.
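import re
import codecs

# codecs.open decodes each line from UTF-8 into a unicode string
with codecs.open("test.unicode.txt", "r", "utf-8") as f:
    words = []
    for line in f:
        # Use a text pattern (not bytes), since the lines are decoded text.
        # The named group captures the first word inside the brackets.
        match = re.match(r"solution:.+\[(?P<word>\w+).*\]", line, flags=re.U)
        if match:
            words.append(match.group("word"))

print(words)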
Output:
[u'katab']
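If you want to try the pattern without the input file, here is a minimal sketch that runs it over in-memory strings (Python 3). The sample lines are made up for illustration, since the real file contents are not shown here, and extract_words is just a hypothetical helper name.

```python
import re

# Same pattern as above, compiled once. re.U is already the default
# for text patterns in Python 3, so it is only kept for clarity.
PATTERN = re.compile(r"solution:.+\[(?P<word>\w+).*\]", flags=re.U)

def extract_words(lines):
    """Return the first bracketed word from every line starting with 'solution:'."""
    words = []
    for line in lines:
        match = PATTERN.match(line)
        if match:
            words.append(match.group("word"))
    return words

# Hypothetical sample input, NOT the real file contents.
sample_lines = [
    "lookup: كتب",
    "solution: (كتب) [katab-u_1]",
    "solution: no brackets on this line",
]

print(extract_words(sample_lines))
```

Only the second sample line has both the "solution:" prefix and a bracketed word, so it is the only one that contributes a match.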