[Solved] Saving large streaming data in Python


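A little-known fact about Python's native pickle format is that you can happily concatenate pickles into a single file. Each pickle.load() call reads exactly one pickled object and leaves the file position at the start of the next one.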

That is, simply open a file in append mode and pickle.dump() each dictionary into that file. If you want to be extra fancy, you could write to timestamped files, for example one file per hour:

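import pickle
from datetime import datetime


def ingest_data(data_dict):
    # One file per hour, named after the current date and hour
    filename = "%s.pickles" % datetime.now().strftime('%Y-%m-%d_%H')
    with open(filename, 'ab') as outf:
        pickle.dump(data_dict, outf, pickle.HIGHEST_PROTOCOL)


def read_data(filename):
    with open(filename, 'rb') as inf:
        while True:
            try:
                yield pickle.load(inf)
            except EOFError:
                # No more pickles left in the file
                return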
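
To check that appending and reading back behaves as described, here is a minimal round-trip sketch. The helper names append_record and iter_records, the demo function, and the temporary file name are placeholders for illustration, not part of the original answer.

```python
import os
import pickle
import tempfile


def append_record(path, record):
    # Hypothetical helper: append one pickled object to the end of the file
    with open(path, 'ab') as outf:
        pickle.dump(record, outf, pickle.HIGHEST_PROTOCOL)


def iter_records(path):
    # Hypothetical helper: yield objects one at a time until the file ends
    with open(path, 'rb') as inf:
        while True:
            try:
                yield pickle.load(inf)
            except EOFError:
                return


def demo():
    # Write three records in separate calls, then read them all back
    path = os.path.join(tempfile.mkdtemp(), 'demo.pickles')
    records = [{'id': i, 'value': i * i} for i in range(3)]
    for record in records:
        append_record(path, record)
    return list(iter_records(path))


if __name__ == '__main__':
    print(demo())
```

Because the reader is a generator, only one record is held in memory at a time, which is the point when the stream is large.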
