A very minimal example:
stops = {'remove', 'these', 'words'}
strings = ['please do not remove these words', 'removal is not cool', 'please please these are the bees\' knees', 'there are no stopwords here']
strings_cleaned = [' '.join(word for word in s.split() if word not in stops) for s in strings]
Or you could do:
strings_cleaned = []
for s in strings:
    word_list = []
    for word in s.split():
        if word not in stops:
            word_list.append(word)
    s_string = ' '.join(word_list)
    strings_cleaned.append(s_string)
This is a lot uglier (I think) than the one-liner before it, but perhaps more intuitive.
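If you need this in more than one place, the same logic can be wrapped in a small function. This is only a sketch: the name remove_stopwords is made up for illustration, and it assumes plain whitespace splitting is good enough for your data (note that 'stopwords' is kept, because only exact word matches are removed).

```python
def remove_stopwords(strings, stops):
    """Return a new list of strings with every word found in `stops` dropped.

    `remove_stopwords` is a hypothetical helper name, not a library function.
    Words are split on whitespace and compared exactly (case-sensitive).
    """
    return [
        ' '.join(word for word in s.split() if word not in stops)
        for s in strings
    ]


stops = {'remove', 'these', 'words'}
strings = [
    'please do not remove these words',
    'removal is not cool',
    "please please these are the bees' knees",
    'there are no stopwords here',
]

strings_cleaned = remove_stopwords(strings, stops)
for s in strings_cleaned:
    print(s)
# please do not
# removal is not cool
# please please are the bees' knees
# there are no stopwords here
```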
Make sure you're converting your container of stopwords to a set (a hash-based container, which makes lookups O(1) on average) instead of a list, whose lookups are O(n).
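For instance, if your stopwords start out in a list, you can convert them once before filtering. A minimal sketch, where stop_list and its contents are made up for illustration:

```python
# Hypothetical starting point: stopwords stored in a list, with a duplicate.
stop_list = ['remove', 'these', 'words', 'these']

# Convert once, up front. Duplicates collapse, and `in` checks use hashing
# instead of scanning the whole list.
stops = set(stop_list)

sentence = 'please do not remove these words'
kept = [word for word in sentence.split() if word not in stops]
print(kept)  # ['please', 'do', 'not']
```

The conversion itself costs one pass over the list, so do it outside the loop rather than on every lookup.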
Edit: This is just a general, very straightforward example of how to remove stopwords. Your use case might be a little different, but since you haven’t provided a sample of your data, we can’t help any further.
solved Splitting to words in a list of strings