[Solved] Splitting to words in a list of strings


A very minimal example:

stops = {'remove', 'these', 'words'}

strings = [
    'please do not remove these words',
    'removal is not cool',
    "please please these are the bees' knees",
    'there are no stopwords here',
]

strings_cleaned = [' '.join(word for word in s.split() if word not in stops) for s in strings]

Or you could do:

strings_cleaned = []
for s in strings:
    word_list = []
    for word in s.split():
        if word not in stops:
            word_list.append(word)
    s_string = ' '.join(word_list)
    strings_cleaned.append(s_string)

This is a lot uglier (I think) than the one-liner before it, but perhaps more intuitive.

Make sure you convert your container of stopwords to a set. A set is hash-based, so membership tests are O(1) on average, whereas a list has to be scanned on every lookup, which is O(n).
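A minimal sketch of that conversion, assuming your stopwords start out as a list (the name `stop_list` is made up for illustration and is not from the question):

```python
# Hypothetical input: stopwords loaded as a list, possibly with duplicates.
stop_list = ['remove', 'these', 'words', 'these']

# Convert once, up front; duplicates collapse and lookups become O(1) on average.
stops = set(stop_list)

strings = ['please do not remove these words', 'removal is not cool']
strings_cleaned = [' '.join(w for w in s.split() if w not in stops) for s in strings]
print(strings_cleaned)  # ['please do not', 'removal is not cool']
```

Do the conversion once, outside the loop. Rebuilding the set for every string would throw away most of the benefit.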

Edit: This is just a general, very straightforward example of how to remove stopwords. Your use case might be a little different, but since you haven’t provided a sample of your data, we can’t help any further.
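As one guess at how a real use case might differ: if your text has capital letters or punctuation attached to words, a plain `word not in stops` test will miss them. The sketch below is only an illustration under that assumption; the helper name `remove_stopwords` is hypothetical, and it normalizes each word only for the comparison while keeping the original text of the words it retains.

```python
import string

stops = {'remove', 'these', 'words'}

def remove_stopwords(text, stops):
    """Drop stopwords, ignoring case and surrounding punctuation when comparing."""
    kept = []
    for word in text.split():
        # Normalize a copy of the word for the lookup only.
        key = word.strip(string.punctuation).lower()
        if key not in stops:
            kept.append(word)
    return ' '.join(kept)

print(remove_stopwords('Remove these, please: words matter.', stops))
# please: matter.
```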
