[Solved] python : reduce by key with if condition statement?

IIUC, you need to change the key before the reduce, and then map your values back into the desired format. Note that reduceByKey takes a two-argument function, so Python's built-in sum will not work there. You should be able to do the following:

    new_rdd = rdd.map(lambda row: ((row[0], row[1][0]), row[1][1])) \
                 .reduceByKey(lambda a, b: a + b) \
                 .map(lambda row: (row[0][0], (row[0][1], row[1])))
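For context, here is a minimal runnable sketch of the same re-key/reduce/restore pattern; the SparkContext setup and sample rows are illustrative assumptions, not part of the original question:

    from operator import add
    from pyspark import SparkContext

    sc = SparkContext("local", "rekey-example")  # hypothetical local context for this sketch

    # Sample rows shaped (key, (subkey, value)), matching the layout the answer assumes
    rdd = sc.parallelize([("a", ("x", 1)), ("a", ("x", 2)), ("a", ("y", 5))])

    new_rdd = (rdd.map(lambda row: ((row[0], row[1][0]), row[1][1]))   # promote subkey into the key
                  .reduceByKey(add)                                    # sum values per (key, subkey)
                  .map(lambda row: (row[0][0], (row[0][1], row[1]))))  # restore the original shape

    print(new_rdd.collect())  # e.g. [('a', ('x', 3)), ('a', ('y', 5))]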

[Solved] Working with Dates in Spark

So by just creating a quick RDD in the format of the CSV file you describe:

    val list = sc.parallelize(List(
      ("1", "Timothy", "04/02/2015", "100", "TV"),
      ("1", "Timothy", "04/03/2015", "10", "Book"),
      ("1", "Timothy", "04/03/2015", "20", "Book"),
      ("1", "Timothy", "04/05/2015", "10", "Book"),
      ("2", "Ursula", "04/02/2015", "100", "TV")))

And then running:

    import java.time.LocalDate
    import java.time.format.DateTimeFormatter

    val startDate = LocalDate.of(2015, 1, 4)
    val endDate = LocalDate.of(2015, 4, 5)
    val result = list
      .filter { case (_, _, date, _, _) =>
        val localDate = LocalDate.parse(date, DateTimeFormatter.ofPattern("MM/dd/yyyy"))
        localDate.isAfter(startDate) && localDate.isBefore(endDate) }
      .map { case (id, _, _, … Read more
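The same date-range filter translates directly to PySpark; a hedged sketch, assuming the same five-column rows and using Python's datetime.strptime in place of DateTimeFormatter (the context setup is illustrative):

    from datetime import datetime
    from pyspark import SparkContext

    sc = SparkContext("local", "date-filter-example")  # hypothetical local context

    rows = sc.parallelize([
        ("1", "Timothy", "04/02/2015", "100", "TV"),
        ("1", "Timothy", "04/03/2015", "10", "Book"),
        ("2", "Ursula", "04/02/2015", "100", "TV"),
    ])

    start = datetime(2015, 1, 4)
    end = datetime(2015, 4, 5)

    # Keep rows whose date column (index 2) falls strictly between start and end
    in_range = rows.filter(
        lambda row: start < datetime.strptime(row[2], "%m/%d/%Y") < end)

    print(in_range.collect())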

[Solved] How to get the specified output without combineByKey and aggregateByKey in spark RDD

Here is a standard approach. Point to note: you need to be working with an RDD; I think that is the bottleneck. Here you go:

    val keysWithValuesList = Array("foo=A", "foo=A", "foo=A", "foo=A", "foo=B", "bar=C", "bar=D", "bar=D")
    val sample = keysWithValuesList.map(_.split("=")).map(p => (p(0), p(1)))
    val sample2 = sc.parallelize(sample.map(x => (x._1, 1)))
    val sample3 = sample2.reduceByKey(_ + _)
    sample3.collect()
    val sample4 = sc.parallelize(sample.map(x => (x._1, … Read more
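In PySpark the same output can be had with only map, reduceByKey, and groupByKey; a sketch under the assumption that the goal is per-key counts and per-key value sets, mirroring the sample data above (the context setup is illustrative):

    from operator import add
    from pyspark import SparkContext

    sc = SparkContext("local", "no-combine-example")  # hypothetical local context

    pairs = sc.parallelize(["foo=A", "foo=A", "foo=A", "foo=A", "foo=B",
                            "bar=C", "bar=D", "bar=D"]) \
              .map(lambda s: tuple(s.split("=")))

    # Record count per key, using only map + reduceByKey
    counts = pairs.map(lambda kv: (kv[0], 1)).reduceByKey(add)
    print(counts.collect())  # e.g. [('foo', 5), ('bar', 3)]

    # Distinct values per key via groupByKey, no combineByKey/aggregateByKey needed
    values = pairs.groupByKey().mapValues(lambda vs: sorted(set(vs)))
    print(values.collect())  # e.g. [('foo', ['A', 'B']), ('bar', ['C', 'D'])]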