[Solved] Given the file path, find the file extension using Scala

You could achieve this as follows: import java.nio.file.Paths val path = "/home/gmc/exists.csv" val fileName = Paths.get(path).getFileName // Convert the path string to a Path object and get the "base name" from that path. val extension = fileName.toString.split("\\.").last // Split the "base name" on a . and take the last element, which is the extension. … Read more
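For reference, a minimal self-contained sketch of the approach in the excerpt; the object name and the guard for dot-less names are additions of this sketch, and the path is just an example:

```scala
import java.nio.file.Paths

object ExtensionDemo extends App {
  val path = "/home/gmc/exists.csv"

  // Convert the path string to a Path and keep only the base name ("exists.csv").
  val fileName = Paths.get(path).getFileName.toString

  // Split on "." and take the last piece; fall back to "" when the name
  // contains no dot at all, otherwise split would return the whole name.
  val extension =
    if (fileName.contains(".")) fileName.split("\\.").last else ""

  println(extension) // prints "csv"
}
```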

[Solved] Get the highest price with the smaller ID when two IDs have the same highest price in Scala

Try this. scala> val df = Seq((4, 30),(2,50),(3,10),(5,30),(1,50),(6,25)).toDF("id","price") df: org.apache.spark.sql.DataFrame = [id: int, price: int] scala> df.show +---+-----+ | id|price| +---+-----+ | 4| 30| | 2| 50| | 3| 10| | 5| 30| | 1| 50| | 6| 25| +---+-----+ scala> df.sort(desc("price"), asc("id")).show +---+-----+ | id|price| +---+-----+ | 1| 50| | 2| 50| | 4| … Read more
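The same idea as a standalone sketch rather than a spark-shell session; the SparkSession setup, object name, and the final limit(1) step (to keep only the winning row) are assumptions added here:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{asc, desc}

object HighestPriceDemo extends App {
  val spark = SparkSession.builder().master("local[*]").appName("highest-price").getOrCreate()
  import spark.implicits._

  val df = Seq((4, 30), (2, 50), (3, 10), (5, 30), (1, 50), (6, 25)).toDF("id", "price")

  // Sort by price descending, breaking ties with the smaller id first,
  // then keep only the top row: id 1 with price 50.
  df.sort(desc("price"), asc("id")).limit(1).show()
  // +---+-----+
  // | id|price|
  // +---+-----+
  // |  1|   50|
  // +---+-----+
}
```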

[Solved] Finding the average value in Spark Scala gives a blank result

I would suggest you use the sqlContext API with the schema you have defined: val df = sqlContext.read .format("com.databricks.spark.csv") .option("delimiter", "\\t") .schema(schema) .load("path to your text file") where the schema is val schema = StructType(Seq( StructField("ID", IntegerType, true), StructField("col1", DoubleType, true), StructField("col2", IntegerType, true), StructField("col3", DoubleType, true), StructField("col4", DoubleType, true), StructField("col5", DoubleType, true), StructField("col6", DoubleType, … Read more
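A sketch of the same approach on Spark 2.x and later, where the built-in csv source replaces the com.databricks.spark.csv package; the file path is a placeholder, the schema is shortened to three example columns, and the closing avg aggregation is an addition to show why explicit numeric types matter:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg
import org.apache.spark.sql.types._

object AverageDemo extends App {
  val spark = SparkSession.builder().master("local[*]").appName("avg-demo").getOrCreate()

  // Declare column types up front so numeric columns are not read as strings.
  val schema = StructType(Seq(
    StructField("ID",   IntegerType, true),
    StructField("col1", DoubleType,  true),
    StructField("col2", IntegerType, true)
  ))

  val df = spark.read
    .option("delimiter", "\t")   // tab-separated input
    .schema(schema)              // use the explicit schema instead of inferring
    .csv("path/to/your/text/file")

  // With proper numeric types, the average comes back as a number
  // instead of a blank/null result.
  df.agg(avg("col1")).show()
}
```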

[Solved] Spark 2.3: subtract dataframes but preserve duplicate values (Scala)

Turns out it’s easier to do df1.except(df2) and then join the results with df1 to get all the duplicates. Full code: def exceptAllCustom(df1: DataFrame, df2: DataFrame): DataFrame = { val except = df1.except(df2) val columns = df1.columns val colExpr: Column = df1(columns.head) <=> except(columns.head) val joinExpression = columns.tail.foldLeft(colExpr) { (colExpr, p) => colExpr && df1(p) … Read more
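A hedged sketch reconstructing the idea described above, since the full code is truncated here: take df1.except(df2), then join it back to df1 with a null-safe equality over every column so duplicate rows in df1 survive. The choice of a left-semi join is an assumption of this sketch, not necessarily the answer's exact code:

```scala
import org.apache.spark.sql.{Column, DataFrame}

def exceptAllCustom(df1: DataFrame, df2: DataFrame): DataFrame = {
  val except  = df1.except(df2)   // distinct rows of df1 that do not appear in df2
  val columns = df1.columns

  // Fold a null-safe (<=>) equality over all columns into one join condition.
  val joinExpression: Column =
    columns.tail.foldLeft(df1(columns.head) <=> except(columns.head)) {
      (expr, c) => expr && (df1(c) <=> except(c))
    }

  // A left-semi join keeps only df1's columns and preserves its duplicate rows.
  df1.join(except, joinExpression, "leftsemi")
}
```

On Spark 2.4 and later, the built-in df1.exceptAll(df2) does this directly, so the workaround is only needed on 2.3.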