[Solved] Scala / Spark: map in word reduce


Here is an informal explanation:

The map() method, when applied to a collection of one type of thing (e.g. a collection of lines in a file) and given a function (e.g. one that extracts the second and third item from a string), returns a new collection of the results of applying that function to each item in the original collection (e.g. a collection of tuples containing the second and third items from each line).
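As a minimal sketch of that idea with a made-up list of words (the names here are only for illustration):

val lengths = List("one", "two", "three").map(word => word.length)

lengths: List[Int] = List(3, 3, 5)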

The syntax

line => (line.blah())

is a shorthand for defining a function. The input parameter is declared with the name ‘line’, and the output is the result of evaluating the expression on the right of the arrow. In your expression, the result is a tuple containing the second item on the line as a string and the third item as an integer.
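For comparison, the shorthand is equivalent to writing the function out and passing it to map() by name. Here is a sketch, where the small tab-separated list stands in for the file from your question:

val file = List("a\tb\t3", "x\ty\t7")   // stand-in for your tab-separated file

def extract(line: String): (String, Int) =
  (line.split("\t")(1), line.split("\t")(2).toInt)

file.map(extract)   // List((b,3), (y,7)), same as file.map(line => (line.split("\t")(1), line.split("\t")(2).toInt))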

Here is a variation you can paste into the Scala interactive interpreter (the REPL); it fakes the file and splits each line on spaces instead of tabs:

val file = List("111 222 333", "444 555 666")

file: List[String] = List(111 222 333, 444 555 666)

val tokenized = file.map(line => (line.split(" ")(1), line.split(" ")(2).toInt))

tokenized: List[(String, Int)] = List((222,333), (555,666))

So here you can see that the result has the type List[(String, Int)].
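In Spark itself the call looks the same, only applied to an RDD instead of a List. A sketch, where the file path "data.tsv" is hypothetical and sc is the SparkContext provided by the Spark shell:

val file = sc.textFile("data.tsv")   // RDD[String], one element per line of the file

val tokenized = file.map(line => (line.split("\t")(1), line.split("\t")(2).toInt))   // RDD[(String, Int)]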
