## [Solved] How to get answer from this Spark Scala program for Input : s= ‘aaabbbccaabb’ Output : 3a3b2c2a2b

You can `foldLeft` over the input string, with a state of `List[(Char, Int)]`. Note that if you use `Map[Char, Int]`, all occurrences of each character would be added up, whether they're beside each other or not. `s.foldLeft(List.empty[(Char, Int)]) { case (Nil, newChar) => (newChar, 1) :: Nil case (list@(headChar, headCount) :: tail, newChar) => if …`
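For readers who want to try the idea outside Spark, here is a minimal Python sketch of the same kind of fold, with `functools.reduce` standing in for `foldLeft`. The names `run_length_encode` and `step` are illustrative and not from the original answer, and the sketch keeps runs in input order by updating the last element of the list.

```python
from functools import reduce


def run_length_encode(s: str) -> str:
    """Encode consecutive runs, e.g. 'aaab' -> '3a1b'."""

    def step(runs, ch):
        # Extend the current run if the character repeats, otherwise start a new one.
        if runs and runs[-1][0] == ch:
            return runs[:-1] + [(ch, runs[-1][1] + 1)]
        return runs + [(ch, 1)]

    # The accumulator is a list of (char, count) pairs, one per run.
    runs = reduce(step, s, [])
    return "".join(f"{count}{ch}" for ch, count in runs)


print(run_length_encode("aaabbbccaabb"))  # 3a3b2c2a2b
```

Because the state is a list of runs rather than a map keyed by character, the second run of `a` stays separate from the first.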

## [Solved] python : reduce by key with if condition statement?

IIUC, you need to change the key before the reduce, and then map your values back into the desired format. You should be able to do the following: `new_rdd = rdd.map(lambda row: ((row[0], row[1][0]), row[1][1])).reduceByKey(lambda a, b: a + b).map(lambda row: (row[0][0], (row[0][1], row[1])))`
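To see what the re-keying does, here is a plain-Python sketch of the same three steps on made-up data, with a dictionary standing in for `reduceByKey`. The sample rows and the name `rekey_and_sum` are assumptions for illustration; the row shape `(key, (label, value))` is inferred from the indexing used in the answer.

```python
from collections import defaultdict


def rekey_and_sum(rows):
    """rows: iterable of (key, (label, value)) tuples."""
    totals = defaultdict(int)
    for key, (label, value) in rows:
        # Steps 1 and 2: use (key, label) as a composite key and add up the values.
        totals[(key, label)] += value
    # Step 3: move the label back next to the summed value.
    return [(key, (label, total)) for (key, label), total in totals.items()]


# Invented sample data: two ("a", "x") rows should be merged, the others kept apart.
sample = [("a", ("x", 1)), ("a", ("x", 2)), ("a", ("y", 5)), ("b", ("x", 4))]
result = rekey_and_sum(sample)
print(result)
```

Only rows that share both the key and the label are summed together, which is the effect of moving the label into the key before reducing.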

## [Solved] Groupby fill missing values in dataframe based on average of previous values available and next value available

Perhaps this is helpful. Load the test data: `df2.show(false)` `df2.printSchema()` `/** * +-----+-----+ * |class|score| * +-----+-----+ * |A |null | * |A |46 | * |A |null | * |A |null | * |A |35 | * |A |null | * |A |null | * |A |null | * |A |46 | * …`

## [Solved] How to perform self join with same row of previous group(month) to bring in additional columns with different expressions in Pyspark

## [Solved] How to get a list view of columns and % of nans/nulls in Pyspark?

Replace `null_df.show()` with: `for i, j in null_df.first().asDict().items(): print(i, j)`
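The loop works because `first()` returns a single `Row` and `asDict()` turns it into a column-to-value mapping. The plain-Python sketch below mimics that with a hand-written dictionary; the column names and percentages are invented for illustration, and `format_null_report` is a hypothetical helper, not part of PySpark.

```python
def format_null_report(row_dict):
    """Return one 'column: value%' line per column."""
    return [f"{column}: {pct:.1f}%" for column, pct in row_dict.items()]


# Stand-in for null_df.first().asDict(); the values are made up.
row_dict = {"age": 12.5, "city": 0.0, "income": 40.0}
for line in format_null_report(row_dict):
    print(line)
```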

## [Solved] How to create a dataframe from two others dataframe?

If these two columns have the same data type, you can just union them: `a = predictons_lr.select('prediction')` `b = predictions_nb.select('prediction')` `new_df = a.union(b)`

## [Solved] number of lines with number of words less than 5

I solved it. The problem was that I was trying to split a list. This is the new line: `rdd = rdd.filter(lambda line: len(line[0].split(" ")) < 5).collect()`
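A plain-Python sketch of the same filter, assuming each record is a list whose first element is the line of text (which is what `line[0]` in the fix suggests). The sample records and the name `has_fewer_than_five_words` are invented for illustration.

```python
def has_fewer_than_five_words(record):
    # record[0] is the text; splitting on a single space mirrors split(" ") in the fix.
    return len(record[0].split(" ")) < 5


# Invented sample records, each wrapping one line of text in a list.
records = [["one two three"], ["a b c d e f"], ["exactly four words here"]]
short_lines = [r for r in records if has_fewer_than_five_words(r)]
print(len(short_lines))
```

In Spark, `collect()` returns the matching records as a Python list; if only the number of lines is needed, calling `count()` on the filtered RDD avoids bringing the records back to the driver.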