IIUC, you need to change the key before the reduce, and then map your values back into the desired format.
You should be able to do the following:
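# reduceByKey expects a two-argument function, so add the pair explicitly
# instead of passing the built-in sum (which takes an iterable).
new_rdd = rdd.map(lambda row: ((row[0], row[1][0]), row[1][1]))\
    .reduceByKey(lambda a, b: a + b)\
    .map(lambda row: (row[0][0], (row[0][1], row[1])))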
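
To see what each step of that chain does, here is a minimal plain-Python sketch that mirrors the same re-key, reduce, map-back sequence without Spark. The sample rows and the helper name `rekey_reduce_restore` are made up for illustration, and it assumes your rows have the shape (key, (subkey, value)), which is my reading of the question.

```python
from collections import defaultdict

# Hypothetical sample rows, assumed shape: (key, (subkey, value)).
rows = [
    ("a", ("x", 1)),
    ("a", ("x", 2)),
    ("a", ("y", 5)),
    ("b", ("x", 3)),
]

def rekey_reduce_restore(rows):
    # Step 1: move the subkey into the key -> ((key, subkey), value)
    rekeyed = [((k, sub), v) for k, (sub, v) in rows]
    # Step 2: add up values per composite key, which is what
    # reduceByKey does when given a function that returns a + b
    totals = defaultdict(int)
    for composite_key, v in rekeyed:
        totals[composite_key] += v
    # Step 3: map back to the original shape -> (key, (subkey, total))
    return [(k, (sub, total)) for (k, sub), total in totals.items()]

result = rekey_reduce_restore(rows)
print(sorted(result))
```

Only values that share both the key and the subkey get added together, because the subkey is part of the key while the reduce runs.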