[Solved] Finding average value in Spark Scala gives blank result


I would suggest using the sqlContext API and reading the file with the schema you have defined:

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("delimiter", "\\t")
  .schema(schema)
  .load("path to your text file") 
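A common cause of blank averages is that values fail to cast to the declared types (for example, because the delimiter doesn't match the file) and come back as null, so `avg` over them is null too. As a quick sanity check after loading (a diagnostic sketch, not part of the required solution):

```scala
// Verify the file parsed into the expected types.
df.printSchema()
df.show(5)

// A non-zero count here suggests parse/cast failures for that column.
df.filter(df("col1").isNull).count()
```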

where the schema is defined as:

import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("ID", IntegerType, true),
  StructField("col1", DoubleType, true),
  StructField("col2", IntegerType, true),
  StructField("col3", DoubleType, true),
  StructField("col4", DoubleType, true),
  StructField("col5", DoubleType, true),
  StructField("col6", DoubleType, true),
  StructField("col7", DoubleType, true)
))

After that, all you need to do is apply the avg function to the grouped DataFrame:

import org.apache.spark.sql.functions._
val res1 = df.groupBy("ID").agg(
  avg("col1"), avg("col2"), avg("col3"), avg("col4"),
  avg("col5"), avg("col6"), avg("col7")
)
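If you would rather have readable output column names than the generated `avg(colN)` ones, each aggregate can be aliased. This is an optional variation; the name `resNamed` is just an illustrative choice:

```scala
// Same aggregation, with explicit names for the output columns.
val resNamed = df.groupBy("ID").agg(
  avg("col1").alias("avg_col1"),
  avg("col2").alias("avg_col2"),
  avg("col3").alias("avg_col3"),
  avg("col4").alias("avg_col4"),
  avg("col5").alias("avg_col5"),
  avg("col6").alias("avg_col6"),
  avg("col7").alias("avg_col7")
)
```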

Finally, you can save the result to CSV directly from the DataFrame; there is no need to convert it to an RDD:

res1.coalesce(1).write.csv("/stuaverage/spoutput12")
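Note that `write.csv` emits no header row by default; if you want column names in the output file, you can add the header option (a sketch, reusing the same output path):

```scala
// Write a single CSV part file with a header row.
res1.coalesce(1)
  .write
  .option("header", "true")
  .csv("/stuaverage/spoutput12")
```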
