[Solved] python : reduce by key with if condition statement?

IIUC, you need to change the key before the reduce, and then map your values back into the desired format. You should be able to do the following:

new_rdd = rdd.map(lambda row: ((row[0], row[1][0]), row[1][1])) \
             .reduceByKey(lambda a, b: a + b) \
             .map(lambda row: (row[0][0], (row[0][1], row[1])))
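For concreteness, here is a minimal runnable PySpark sketch of that pattern; the sample data and the local SparkContext are illustrative assumptions, not from the original question:

from pyspark import SparkContext

sc = SparkContext("local[*]", "reduce-by-composite-key")

# Hypothetical rows shaped (key, (subkey, value)), matching
# row[0], row[1][0], and row[1][1] in the answer above.
rdd = sc.parallelize([
    ("a", ("x", 1)),
    ("a", ("x", 2)),
    ("a", ("y", 5)),
    ("b", ("x", 3)),
])

# Promote the subkey into the key, sum per composite key,
# then restore the original (key, (subkey, value)) shape.
new_rdd = rdd.map(lambda row: ((row[0], row[1][0]), row[1][1])) \
             .reduceByKey(lambda a, b: a + b) \
             .map(lambda row: (row[0][0], (row[0][1], row[1])))

print(sorted(new_rdd.collect()))
# [('a', ('x', 3)), ('a', ('y', 5)), ('b', ('x', 3))]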

[Solved] Working with Dates in Spark

So by just creating a quick RDD in the format of the CSV file you describe:

val list = sc.parallelize(List(
  ("1", "Timothy", "04/02/2015", "100", "TV"),
  ("1", "Timothy", "04/03/2015", "10", "Book"),
  ("1", "Timothy", "04/03/2015", "20", "Book"),
  ("1", "Timothy", "04/05/2015", "10", "Book"),
  ("2", "Ursula", "04/02/2015", "100", "TV")
))

and then running:

import java.time.LocalDate
import java.time.format.DateTimeFormatter

val startDate = LocalDate.of(2015, 1, 4)
val endDate = LocalDate.of(2015, 4, 5)

val result = list
  .filter { case (_, _, date, _, _) =>
    val localDate = LocalDate.parse(date, DateTimeFormatter.ofPattern("MM/dd/yyyy"))
    localDate.isAfter(startDate) && localDate.isBefore(endDate)
  }
  .map { case (id, _, … Read more
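The answer above is cut off, so as a hedged sketch, here is the same date-range filter expressed in PySpark; the per-customer sum at the end is only an assumption about where the truncated code was heading, and the date bounds mirror the Scala values:

from datetime import datetime
from pyspark import SparkContext

sc = SparkContext("local[*]", "dates-in-spark")

rows = sc.parallelize([
    ("1", "Timothy", "04/02/2015", "100", "TV"),
    ("1", "Timothy", "04/03/2015", "10", "Book"),
    ("1", "Timothy", "04/03/2015", "20", "Book"),
    ("1", "Timothy", "04/05/2015", "10", "Book"),
    ("2", "Ursula", "04/02/2015", "100", "TV"),
])

start = datetime(2015, 1, 4)   # mirrors LocalDate.of(2015, 1, 4)
end = datetime(2015, 4, 5)     # mirrors LocalDate.of(2015, 4, 5)

# Keep rows strictly between the bounds (isAfter/isBefore semantics),
# then total the amount per (id, name) -- the assumed aggregation.
result = (rows
          .filter(lambda r: start < datetime.strptime(r[2], "%m/%d/%Y") < end)
          .map(lambda r: ((r[0], r[1]), int(r[3])))
          .reduceByKey(lambda a, b: a + b))

print(sorted(result.collect()))
# [(('1', 'Timothy'), 130), (('2', 'Ursula'), 100)]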

[Solved] How to get the specified output without combineByKey and aggregateByKey in spark RDD

Here is a standard approach. Point to note: you need to be working with an RDD; I think that is the bottleneck. Here you go:

val keysWithValuesList = Array("foo=A", "foo=A", "foo=A", "foo=A", "foo=B", "bar=C", "bar=D", "bar=D")
val sample = keysWithValuesList.map(_.split("=")).map(p => (p(0), p(1)))

val sample2 = sc.parallelize(sample.map(x => (x._1, 1)))
val sample3 = sample2.reduceByKey(_ + _)
sample3.collect()

val sample4 = sc.parallelize(sample.map(x => … Read more
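Since that code is also truncated, here is a hedged PySpark sketch of the same idea: producing per-key counts and per-key value lists using only map, reduceByKey, and groupByKey, i.e. without combineByKey or aggregateByKey (the exact output format the question wants is an assumption):

from pyspark import SparkContext

sc = SparkContext("local[*]", "no-combine-or-aggregate")

pairs = ["foo=A", "foo=A", "foo=A", "foo=A", "foo=B", "bar=C", "bar=D", "bar=D"]
sample = sc.parallelize(pairs).map(lambda s: tuple(s.split("=")))

# Per-key counts via plain reduceByKey.
counts = sample.map(lambda kv: (kv[0], 1)).reduceByKey(lambda a, b: a + b)
print(sorted(counts.collect()))
# [('bar', 3), ('foo', 5)]

# Per-key value lists via groupByKey; no combineByKey/aggregateByKey needed.
grouped = sample.groupByKey().mapValues(list)
print(sorted((k, sorted(v)) for k, v in grouped.collect()))
# [('bar', ['C', 'D', 'D']), ('foo', ['A', 'A', 'A', 'A', 'B'])]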