[Solved] Inverted Index in Python not returning desired results


Based on what you’re saying, I think you’re trying to get some data like this:

strlist = ["hello world", "foo bar", "red cat"]
data_wanted = {
    "foo" : 1,
    "hello" : 0,
    "cat" : 2,
    "world" : 0,
    "red" : 2,
    "bar" : 1
}

So what you should be doing is adding each word as a key to a dictionary, with its value being the index of the string in strlist that contains it.

def locateWords(strlist):
    d = {}
    for i, substr in enumerate(strlist):   # gives you the index and the item itself
        for word in substr.split():        # split on whitespace to get the words
            d[word] = i
    return d
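
As a quick check, here is a minimal sketch that runs this approach on the sample list from the top of the answer. The second input list, with a repeated word, is made up for illustration; it shows that a later occurrence overwrites an earlier one, so each word keeps only its last index.

```python
def locateWords(strlist):
    d = {}
    for i, substr in enumerate(strlist):
        for word in substr.split():
            d[word] = i  # a later occurrence overwrites an earlier one
    return d

print(locateWords(["hello world", "foo bar", "red cat"]))
# {'hello': 0, 'world': 0, 'foo': 1, 'bar': 1, 'red': 2, 'cat': 2}

# Hypothetical input with a repeated word: "red" ends up mapped to 1 only.
print(locateWords(["red cat", "red dog"]))
# {'red': 1, 'cat': 0, 'dog': 1}
```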

If the word occurs in more than one string in strlist, you should change the code to the following:

def locateWords(strlist):
    d = {}
    for i, substr in enumerate(strlist):
        for word in substr.split():
            if word not in d:
                d[word] = [i]        # first time we see this word
            else:
                d[word].append(i)    # word seen before: record another location
    return d

This changes the values to lists, which contain the indices of the substrings in strlist which contain that word.
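
If you prefer, the same idea can be sketched with collections.defaultdict from the standard library, which removes the need for the "if word not in d" check. This is an alternative, not something from your original code; the function name locate_words_defaultdict and the sample input are made up for illustration.

```python
from collections import defaultdict

def locate_words_defaultdict(strlist):
    # Missing keys start out as an empty list automatically.
    d = defaultdict(list)
    for i, substr in enumerate(strlist):
        for word in substr.split():
            d[word].append(i)
    return dict(d)  # convert back to a plain dict for the caller

index = locate_words_defaultdict(["red cat", "red dog", "cat nap"])
print(index)
# {'red': [0, 1], 'cat': [0, 2], 'dog': [1], 'nap': [2]}
```

Note that, as with the version above, a word that appears twice in the same string will have that string's index listed twice.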

Some of your code’s problems explained

  1. {} is not a set, it’s a dictionary.
  2. break forces a loop to terminate immediately – you didn’t want to end the loop early because you still had data to process.
  3. d.update(index) will give you a TypeError: 'int' object is not iterable. This method expects either another dictionary or an iterable of key/value pairs, and merges those pairs into the dictionary. Normally you would use a list of tuples for this: [("foo",1), ("hello",0)].
  4. You don’t normally want to use d.__setitem__ (which you typed wrong anyway). You’d just use d[key] = value.
  5. You can iterate using a “for each” style loop instead, like my code above shows. Looping over the range means you are looping over the indices. (Not exactly a problem, but it could lead to extra bugs if you’re not careful to use the indices properly).
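
A few of the points above can be checked directly in the interpreter. This is only a small sketch; the variable names and sample values are made up for illustration.

```python
# Point 1: empty braces create a dict; use set() for an empty set.
empty_braces = {}
print(type(empty_braces))  # <class 'dict'>
print(type(set()))         # <class 'set'>

# Point 3: update() accepts an iterable of key/value pairs (or another dict).
d = {}
d.update([("foo", 1), ("hello", 0)])

# Point 4: plain item assignment instead of calling d.__setitem__ yourself.
d["cat"] = 2
print(d)  # {'foo': 1, 'hello': 0, 'cat': 2}

# Point 3 again: passing a bare int raises TypeError.
try:
    d.update(3)
except TypeError as exc:
    print("TypeError:", exc)
```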

It looks like you are coming from another programming language in which braces indicate sets and there is a keyword which ends control blocks (like if, fi). It’s easy to confuse syntax when you’re first starting – but if you run into trouble running the code, look at the exceptions you get and search them on the web!

P.S. I’m not sure why you wanted a set – if there are duplicates, you probably want to know all of their locations, not just the first or the last one or anything in between. Just my $0.02.
