[Solved] Is there a faster solution to solve this kind of job? [closed]


I’m not sure I understand your “job description”, but I think you want this:

def find_matches(tuple_of_dicts, key_to_find):
    return [d for d in tuple_of_dicts if key_to_find in d]

So:

>>> tuple_of_dicts = ({18: None}, {10: None}, {16: None, 18: None, 5: None, 6: None, 7: None, 10: None}, {16: None}, {7: None}, {10: None}, {18: None}, {16: None, 10: None, 18: None, 7: None}, {10: None, 18: None, 7: None}, {10: None}, {7: None}, {7: None}, {10: None, 7: None}, {7: None})
>>> find_matches(tuple_of_dicts, 18)
[{18: None},
 {5: None, 6: None, 7: None, 10: None, 16: None, 18: None},
 {18: None},
 {7: None, 10: None, 16: None, 18: None},
 {7: None, 10: None, 18: None}]

That works in linear time. If your tuple has N dicts, with an average of M members each, you walk the tuple, doing a constant-time dict lookup for each iteration, for a total of O(N).
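As a quick self-contained sanity check (re-defining `find_matches` from above with a smaller tuple of my own), the per-dict work is just the constant-time membership test `key_to_find in d`, and a key that appears in none of the dicts simply yields an empty list:

```python
def find_matches(tuple_of_dicts, key_to_find):
    # One O(1) average-case membership test per dict -> O(N) overall
    return [d for d in tuple_of_dicts if key_to_find in d]

tuple_of_dicts = ({18: None}, {10: None}, {7: None, 18: None})

print(find_matches(tuple_of_dicts, 18))  # the two dicts containing 18
print(find_matches(tuple_of_dicts, 99))  # [] -- no dict contains 99
```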


But you can do even better than linear time, if you’re going to be doing a lot of such searches.

The trick is (as it sounds like you might have suspected) to build an index dictionary, mapping each key to the indices of the dictionaries it’s in, or just to the dictionaries themselves. For example:

>>> dict_of_dicts = {}
>>> for d in tuple_of_dicts:
...     for key in d:
...         dict_of_dicts.setdefault(key, []).append(d)
>>> def find_matches(dict_of_dicts, key_to_find):
...     return dict_of_dicts.get(key_to_find, [])  # [] for keys that never appear, like the linear version

This requires O(N*M) time doing the setup work, and builds a dict of O(N*M) space,* but it’s then a simple O(1) dict lookup for each search. So, as long as you’re doing more than M searches, and you can afford the extra space, it’s a huge gain.


* To be precise: if there are L distinct keys among the N*M total key instances, the setup does N*M `setdefault` lookups in the dict, at most L list creations, and N*M appends onto lists whose average length ends up N*M/L. Since list append is amortized constant time, that’s O(N*M + L + N*M) = O(N*M) setup time. Meanwhile, the dict itself holds L keys, O(L) space, and its lists hold N*M dict references in total, so the whole index is O(L + N*M) = O(N*M) space. Finally, the search just hashes the key, looks it up in the dict, and returns a reference to the (average N*M/L-length) list, all of which are constant-time operations, so each search is O(1).
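For the other option mentioned above — mapping each key to the *indices* of the dicts that contain it rather than to the dicts themselves — a minimal sketch might look like this (`build_index` is a name I’m making up here, and `collections.defaultdict` just saves the `setdefault` call):

```python
from collections import defaultdict

def build_index(dicts):
    """Map each key to the indices of the dicts that contain it."""
    index = defaultdict(list)
    for i, d in enumerate(dicts):
        for key in d:
            index[key].append(i)
    return index

tuple_of_dicts = ({18: None}, {10: None}, {7: None, 18: None})
index = build_index(tuple_of_dicts)

# O(1) lookup; .get keeps absent keys from raising KeyError
positions = index.get(18, [])
matches = [tuple_of_dicts[i] for i in positions]
```

The indices variant costs the same to build but is handy when you also need to know *where* each match sits in the original tuple.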

