[Solved] How to extract sub-elements from the column of DataFrame in Spark 2? [duplicate]


There are basically two steps here. First, explode the array (using the explode function) to get one row per array element. Then expand each element into its own columns.

The question does not include the schema, so the internal structure of each array element is not clear. I will assume it is something like a struct with two fields.

This means you would do something like this:

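import org.apache.spark.sql.functions.explode
// one row per element of the "products" array
val df1 = df.withColumn("array_elem", explode(df("products")))
// expand the struct fields of each element into top-level columns
val df2 = df1.select("product_PK", "array_elem.*")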

Now all you have to do is rename the columns to the names you need, for example with withColumnRenamed or toDF.
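For completeness, here is a minimal end-to-end sketch of the explode, expand and rename steps. Since the question does not show the schema, the `Product` and `Order` case classes, their field names (`id`, `price`), the sample rows and the target column names are all hypothetical stand-ins. Only `product_PK` and `products` come from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical element type: the real struct fields are not shown in the question.
case class Product(id: Int, price: Double)
case class Order(product_PK: String, products: Seq[Product])

object ExplodeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the original DataFrame.
    val df = Seq(
      Order("a", Seq(Product(1, 9.5), Product(2, 3.0))),
      Order("b", Seq(Product(3, 7.25)))
    ).toDF()

    // One row per array element, then expand the struct into top-level columns.
    val flat = df
      .withColumn("array_elem", explode(col("products")))
      .select("product_PK", "array_elem.*")

    // Rename the struct-derived columns; toDF assigns the new names positionally.
    val renamed = flat.toDF("product_PK", "product_id", "product_price")
    // One-at-a-time alternative: flat.withColumnRenamed("id", "product_id")

    renamed.printSchema()
    renamed.show()
    spark.stop()
  }
}
```

toDF is convenient when you want to rename every column at once, while withColumnRenamed is safer when you only need to touch one or two and do not want to depend on column order.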
