[Solved] Spark in Business Intelligence

Introduction: Business intelligence (BI) is a critical tool for organizations to gain insights into their data and make informed decisions. Spark is an open-source distributed computing platform that has become increasingly popular for its ability to process large amounts of data quickly and efficiently. Spark is a powerful tool for business intelligence, as it can … Read more

[Solved] How to get the answer from this Spark Scala program for Input: s = 'aaabbbccaabb', Output: 3a3b2c2a2b

You can foldLeft over the input string, with a state of List[(Char, Int)]. Note that if you use Map[Char, Int], all occurrences of each character would be added up, whether they are beside each other or not. s.foldLeft(List.empty[(Char, Int)]) { case (Nil, newChar) => (newChar, 1) :: Nil case (list@(headChar, headCount) :: tail, newChar) => if … Read more
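The excerpt above cuts off at the if branch; here is a minimal, self-contained sketch of the same foldLeft approach (the RunLengthEncode object and encode method names are illustrative, not from the original answer):

object RunLengthEncode {
  // Fold over the string, keeping a reversed list of (character, run length) pairs.
  def encode(s: String): String =
    s.foldLeft(List.empty[(Char, Int)]) {
      case (Nil, newChar) => (newChar, 1) :: Nil
      case (list @ (headChar, headCount) :: tail, newChar) =>
        if (headChar == newChar) (headChar, headCount + 1) :: tail // extend the current run
        else (newChar, 1) :: list                                  // start a new run
    }.reverse
      .map { case (char, count) => s"$count$char" }
      .mkString

  def main(args: Array[String]): Unit =
    println(encode("aaabbbccaabb")) // prints 3a3b2c2a2b
}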

[Solved] How can I make a for loop with if-else and get the correct return type

Your use of for does not behave as you expect. You’re using this for-comprehension: for (data <- category(i)) { if (data.startsWith(x)) true else false } This expression “desugars” into (i.e. is shorthand for): category(i).foreach(data => { if (data.startsWith(x)) true else false }) foreach returns Unit, and therefore the type of this expression (and the type … Read more
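To actually get a Boolean back, use a combinator that returns one instead of a for-comprehension. A hedged sketch, with category, i, and x standing in for the names from the question (their types here are assumptions):

// Assumed stand-ins for the question's names and types.
val category: IndexedSeq[Seq[String]] = IndexedSeq(Seq("apple", "apricot"), Seq("banana"))
val i = 0
val x = "ap"

// true if at least one element of category(i) starts with x
val anyMatch: Boolean = category(i).exists(data => data.startsWith(x))

// true only if every element of category(i) starts with x
val allMatch: Boolean = category(i).forall(data => data.startsWith(x))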

[Solved] Given the file path, find the file extension using Scala?

You could achieve this as follows: import java.nio.file.Paths val path = "/home/gmc/exists.csv" val fileName = Paths.get(path).getFileName // Convert the path string to a Path object and get the "base name" from that path. val extension = fileName.toString.split("\\.").last // Split the "base name" on a . and take the last element, which is the extension. … Read more
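If you also need to handle file names with no extension at all (where split would just return the whole name), a small self-contained variant returning an Option is one way to do it; the fileExtension helper below is illustrative, not part of the original answer:

import java.nio.file.Paths

def fileExtension(path: String): Option[String] = {
  val fileName = Paths.get(path).getFileName.toString
  val dotIndex = fileName.lastIndexOf('.')
  // Require a dot that is neither the first nor the last character of the base name.
  if (dotIndex > 0 && dotIndex < fileName.length - 1) Some(fileName.substring(dotIndex + 1))
  else None
}

println(fileExtension("/home/gmc/exists.csv")) // Some(csv)
println(fileExtension("/home/gmc/README"))     // None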

[Solved] Input path does not exist

The error is pretty self-explanatory, so it is probably something simple that you are missing. Can you modify your script and run it as shown below? Please modify the "fileName" value to where you think the file is. import java.nio.file.{Paths, Files} import sys.process._ /************ Modify this line with your data's file name **************/ val … Read more
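A minimal sketch of the kind of check the answer is suggesting, assuming you want to verify the path from the driver before handing it to Spark (the file name below is a placeholder):

import java.nio.file.{Files, Paths}

/************ Modify this line with your data's file name **************/
val fileName = "/path/to/your/data.csv"

if (Files.exists(Paths.get(fileName)))
  println(s"Found local file: $fileName")
else
  println(s"No local file at: $fileName. Check the working directory and the scheme (file:// vs hdfs://).")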

[Solved] Scala Filter Only Digits

I think this might solve your problem: t.countByValue().filter(tupleOfCount => Try(tupleOfCount._1.toInt).toOption.isEmpty).print() Use of isInstanceOf should be a last resort, as @sergey said, so this code should solve the issue; otherwise, pattern matching would be a good option too.
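For illustration, here is what the two alternatives look like on a plain collection rather than the streaming countByValue in the question (the sample data is made up):

import scala.util.Try

val words = Seq("123", "abc", "42", "4a2")

// Try-based: keep entries that are NOT purely numeric (toInt fails).
val nonNumericViaTry = words.filter(w => Try(w.toInt).toOption.isEmpty)

// Character-based alternative: same result without relying on exceptions.
val nonNumericViaIsDigit = words.filterNot(w => w.nonEmpty && w.forall(_.isDigit))

println(nonNumericViaTry)     // List(abc, 4a2)
println(nonNumericViaIsDigit) // List(abc, 4a2)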

[Solved] How can I access a method which returns an Option object?

The most idiomatic way to do this in Scala is to use pattern matching to unwrap the value. entities match { case Some(queryEntities: QueryEntities) => queryEntities.entities.foreach { case e => println(e.columnFamily) println(e.fromDate.getOrElse("defaultFromDateHere")) println(e.toDate.getOrElse("defaultToDateHere")) } case None => println("No value") } … Read more
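A self-contained version of the same pattern; the Entity and QueryEntities case classes below are stand-ins for the types in the question, added only to make the sketch runnable:

case class Entity(columnFamily: String, fromDate: Option[String], toDate: Option[String])
case class QueryEntities(entities: Seq[Entity])

val entities: Option[QueryEntities] =
  Some(QueryEntities(Seq(Entity("cf1", Some("2020-01-01"), None))))

entities match {
  case Some(queryEntities) =>
    queryEntities.entities.foreach { e =>
      println(e.columnFamily)
      println(e.fromDate.getOrElse("defaultFromDateHere"))
      println(e.toDate.getOrElse("defaultToDateHere"))
    }
  case None =>
    println("No value")
}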

[Solved] Get the highest price with the smaller ID when two IDs have the same highest price in Scala

Try this. scala> val df = Seq((4, 30),(2,50),(3,10),(5,30),(1,50),(6,25)).toDF("id","price") df: org.apache.spark.sql.DataFrame = [id: int, price: int] scala> df.show +---+-----+ | id|price| +---+-----+ | 4| 30| | 2| 50| | 3| 10| | 5| 30| | 1| 50| | 6| 25| +---+-----+ scala> df.sort(desc("price"), asc("id")).show +---+-----+ | id|price| +---+-----+ | 1| 50| | 2| 50| | 4| … Read more
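The excerpt is cut off, but the final step is just to take the first row of the sorted DataFrame. A hedged sketch, assuming a running spark-shell (or a SparkSession named spark):

import org.apache.spark.sql.functions.{asc, desc}
import spark.implicits._

val df = Seq((4, 30), (2, 50), (3, 10), (5, 30), (1, 50), (6, 25)).toDF("id", "price")

// Sort by price descending, then id ascending, and keep only the first row:
// of the two rows with price 50, the one with the smaller id (1) wins.
df.sort(desc("price"), asc("id")).limit(1).show()
// +---+-----+
// | id|price|
// +---+-----+
// |  1|   50|
// +---+-----+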

[Solved] Spark/Scala: How to read data from a .dat file, transform it, and finally store it in HDFS

Please find the solution: val rdd = sc.textFile("/path/Test.dat") val rddmap = rdd.map(i => i.split(" ")).map(i => (i(1),i(2))).sortByKey().map(i => i._1 + "%$" + i._2) rddmap.repartition(1).saveAsTextFile("/path/TestOut1.dat") Output: Jasper%$Pinto Jhon%$Ward Shally%$Stun
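For readability, the same pipeline with each step commented; the assumption that records in Test.dat are space-separated and that only the 2nd and 3rd fields are kept is inferred from the indices i(1) and i(2):

val rdd = sc.textFile("/path/Test.dat")                    // read raw lines from HDFS
val rddmap = rdd
  .map(line => line.split(" "))                            // split each line on spaces
  .map(fields => (fields(1), fields(2)))                   // keep the 2nd and 3rd fields as a key-value pair
  .sortByKey()                                             // sort by the 2nd field
  .map { case (k, v) => k + "%$" + v }                     // join the pair with the %$ separator
rddmap.repartition(1).saveAsTextFile("/path/TestOut1.dat") // write a single output file to HDFS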

[Solved] MapReduce to Spark

This is a very broad question, but the short of it is: Create an RDD of the input data. Call map with your mapper code. Output key-value pairs. Call reduceByKey with your reducer code. Write the resulting RDD to disk. Spark is more flexible than MapReduce: there is a great variety of methods that you … Read more
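A minimal sketch of those steps using word count, the classic MapReduce example; the paths are placeholders and sc is assumed to be a SparkContext:

val lines = sc.textFile("hdfs:///path/to/input")    // 1. RDD of the input data
val pairs = lines
  .flatMap(line => line.split("\\s+"))              // 2. "map" phase: emit one record per word...
  .map(word => (word, 1))                           //    ...as (key, value) pairs
val counts = pairs.reduceByKey(_ + _)               // 3. "reduce" phase: sum the values per key
counts.saveAsTextFile("hdfs:///path/to/output")     // 4. write the resulting RDD to disk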

[Solved] Working with Dates in Spark

So, by just creating a quick RDD in the format of the CSV file you describe: val list = sc.parallelize(List(("1","Timothy","04/02/2015","100","TV"), ("1","Timothy","04/03/2015","10","Book"), ("1","Timothy","04/03/2015","20","Book"), ("1","Timothy","04/05/2015","10","Book"),("2","Ursula","04/02/2015","100","TV"))) And then running: import java.time.LocalDate import java.time.format.DateTimeFormatter val startDate = LocalDate.of(2015,1,4) val endDate = LocalDate.of(2015,4,5) val result = list .filter{case(_,_,date,_,_) => { val localDate = LocalDate.parse(date, DateTimeFormatter.ofPattern("MM/dd/yyyy")) localDate.isAfter(startDate) && localDate.isBefore(endDate)}} .map{case(id, _, _, … Read more
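The excerpt stops inside the map step; below is a self-contained sketch of where it appears to be heading, namely summing the amounts per customer within the date range (the per-customer aggregation is an assumption, and sc is assumed to be a SparkContext):

import java.time.LocalDate
import java.time.format.DateTimeFormatter

val list = sc.parallelize(List(
  ("1", "Timothy", "04/02/2015", "100", "TV"),
  ("1", "Timothy", "04/03/2015", "10", "Book"),
  ("1", "Timothy", "04/03/2015", "20", "Book"),
  ("1", "Timothy", "04/05/2015", "10", "Book"),
  ("2", "Ursula", "04/02/2015", "100", "TV")))

val startDate = LocalDate.of(2015, 1, 4)
val endDate   = LocalDate.of(2015, 4, 5)

val result = list
  .filter { case (_, _, date, _, _) =>
    // The formatter is created inside the closure, as in the original, so nothing non-serializable is captured.
    val localDate = LocalDate.parse(date, DateTimeFormatter.ofPattern("MM/dd/yyyy"))
    localDate.isAfter(startDate) && localDate.isBefore(endDate)
  }
  .map { case (id, name, _, amount, _) => ((id, name), amount.toInt) }
  .reduceByKey(_ + _)

result.collect().foreach(println)
// ((1,Timothy),130)  <- 04/05 is excluded because isBefore(endDate) is strict
// ((2,Ursula),100)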