[Solved] How to find the difference between 1st row and nth row of a dataframe based on a condition using Spark Windowing

Shown here is a PySpark solution. You can use conditional aggregation with max(when(…)) to get the difference in rank between each row and the first ‘PD’ row. After computing the difference, use a when(…) to null out the rows where it is negative, as they all occur after the first ‘PD’ row.

# necessary imports
w1 = Window.partitionBy(df.id).orderBy(df.svc_dt)
df …