I would suggest you use the sqlContext API with the schema you have defined:
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("delimiter", "\\t")
  .schema(schema)
  .load("path to your text file")
where the schema is
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("ID", IntegerType, true),
  StructField("col1", DoubleType, true),
  StructField("col2", IntegerType, true),
  StructField("col3", DoubleType, true),
  StructField("col4", DoubleType, true),
  StructField("col5", DoubleType, true),
  StructField("col6", DoubleType, true),
  StructField("col7", DoubleType, true)
))
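To confirm the file was parsed with the types you expect, you can print the schema and a few rows before aggregating:

df.printSchema()
df.show(5)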
After that, all you need is to apply the avg function on the grouped DataFrame:
import org.apache.spark.sql.functions._
val res1 = df.groupBy("ID").agg(avg("col1"),avg("col2"),avg("col3"),avg("col4"),avg("col5"),avg("col6"),avg("col7"))
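If you prefer not to spell out each avg call, you can build the same aggregation from a list of column names. This is just a sketch; the column list is assumed to match the schema above.

import org.apache.spark.sql.functions.{avg, col}

// every column except the grouping key "ID"
val valueCols = Seq("col1", "col2", "col3", "col4", "col5", "col6", "col7")

// build one avg expression per column and pass them all to agg
val avgExprs = valueCols.map(c => avg(col(c)))
val res1 = df.groupBy("ID").agg(avgExprs.head, avgExprs.tail: _*)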
Finally, you can save directly to CSV from the DataFrame; you don't need to convert to an RDD:
res1.coalesce(1).write.csv("/stuaverage/spoutput12")
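Note that DataFrameWriter.csv is only available from Spark 2.0 onwards. If you are on Spark 1.x with the spark-csv package (which the read side above suggests), the equivalent write would look roughly like this; the header option is just an example, and the path matches the one above:

res1.coalesce(1)
  .write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("/stuaverage/spoutput12")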