[Solved] python : reduce by key with if condition statement?

IIUC, you need to change the key before the reduce, and then map your values back into the desired format. Note that reduceByKey takes a two-argument function, so Python's built-in sum will not work there. You should be able to do the following:

    new_rdd = rdd.map(lambda row: ((row[0], row[1][0]), row[1][1])) \
                 .reduceByKey(lambda a, b: a + b) \
                 .map(lambda row: (row[0][0], (row[0][1], row[1])))
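For context, here is a minimal runnable sketch of the same re-key/reduce/restore pattern; the SparkContext setup and sample rows are illustrative assumptions, not part of the original question:

    from operator import add
    from pyspark import SparkContext

    sc = SparkContext("local", "rekey-example")  # hypothetical local context for this sketch

    # Sample rows shaped (key, (subkey, value)), matching the layout the answer assumes
    rdd = sc.parallelize([("a", ("x", 1)), ("a", ("x", 2)), ("a", ("y", 5))])

    new_rdd = (rdd.map(lambda row: ((row[0], row[1][0]), row[1][1]))   # promote subkey into the key
                  .reduceByKey(add)                                    # sum values per (key, subkey)
                  .map(lambda row: (row[0][0], (row[0][1], row[1]))))  # restore the original shape

    print(new_rdd.collect())  # e.g. [('a', ('x', 3)), ('a', ('y', 5))]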

[Solved] Working with Dates in Spark

So by just creating a quick RDD in the format of the CSV file you describe:

    val list = sc.parallelize(List(
      ("1", "Timothy", "04/02/2015", "100", "TV"),
      ("1", "Timothy", "04/03/2015", "10", "Book"),
      ("1", "Timothy", "04/03/2015", "20", "Book"),
      ("1", "Timothy", "04/05/2015", "10", "Book"),
      ("2", "Ursula", "04/02/2015", "100", "TV")))

And then running:

    import java.time.LocalDate
    import java.time.format.DateTimeFormatter

    val startDate = LocalDate.of(2015, 1, 4)
    val endDate = LocalDate.of(2015, 4, 5)
    val result = list
      .filter { case (_, _, date, _, _) =>
        val localDate = LocalDate.parse(date, DateTimeFormatter.ofPattern("MM/dd/yyyy"))
        localDate.isAfter(startDate) && localDate.isBefore(endDate) }
      .map { case (id, _, _, … Read more
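The same date-range filter translates directly to PySpark; a hedged sketch, assuming the same five-column rows and using Python's datetime.strptime in place of DateTimeFormatter (the context setup is illustrative):

    from datetime import datetime
    from pyspark import SparkContext

    sc = SparkContext("local", "date-filter-example")  # hypothetical local context

    rows = sc.parallelize([
        ("1", "Timothy", "04/02/2015", "100", "TV"),
        ("1", "Timothy", "04/03/2015", "10", "Book"),
        ("2", "Ursula", "04/02/2015", "100", "TV"),
    ])

    start = datetime(2015, 1, 4)
    end = datetime(2015, 4, 5)

    # Keep rows whose date column (index 2) falls strictly between start and end
    in_range = rows.filter(
        lambda row: start < datetime.strptime(row[2], "%m/%d/%Y") < end)

    print(in_range.collect())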

[Solved] How to get the specified output without combineByKey and aggregateByKey in spark RDD

Here is a standard approach. Point to note: you need to be working with an RDD; I think that is the bottleneck. Here you go:

    val keysWithValuesList = Array("foo=A", "foo=A", "foo=A", "foo=A", "foo=B", "bar=C", "bar=D", "bar=D")
    val sample = keysWithValuesList.map(_.split("=")).map(p => (p(0), p(1)))
    val sample2 = sc.parallelize(sample.map(x => (x._1, 1)))
    val sample3 = sample2.reduceByKey(_ + _)
    sample3.collect()
    val sample4 = sc.parallelize(sample.map(x => (x._1, … Read more
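In PySpark the same output can be had with only map, reduceByKey, and groupByKey; a sketch under the assumption that the goal is per-key counts and per-key value sets, mirroring the sample data above (the context setup is illustrative):

    from operator import add
    from pyspark import SparkContext

    sc = SparkContext("local", "no-combine-example")  # hypothetical local context

    pairs = sc.parallelize(["foo=A", "foo=A", "foo=A", "foo=A", "foo=B",
                            "bar=C", "bar=D", "bar=D"]) \
              .map(lambda s: tuple(s.split("=")))

    # Record count per key, using only map + reduceByKey
    counts = pairs.map(lambda kv: (kv[0], 1)).reduceByKey(add)
    print(counts.collect())  # e.g. [('foo', 5), ('bar', 3)]

    # Distinct values per key via groupByKey, no combineByKey/aggregateByKey needed
    values = pairs.groupByKey().mapValues(lambda vs: sorted(set(vs)))
    print(values.collect())  # e.g. [('foo', ['A', 'B']), ('bar', ['C', 'D'])]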