What I think is happening here is that Spark serializes functions to send them over the wire, and because your function (the one you're passing to `map`) calls the accessor `param_user_minimal_rating_count` of object `odbscan`, the entire object `odbscan` needs to be serialized and sent along with it. Deserializing and then using that deserialized object causes the code in its body to be executed again, which produces an infinite loop of serializing -> sending -> deserializing -> executing -> serializing -> ...
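Roughly, the problematic shape looks like this (a hypothetical reconstruction, since your full code isn't shown; everything except `odbscan` and `param_user_minimal_rating_count` is invented for illustration):

```scala
import org.apache.spark.sql.SparkSession

object odbscan {
  val param_user_minimal_rating_count = 2

  // Business logic running directly in the object body:
  val spark = SparkSession.builder().appName("odbscan").getOrCreate()

  val result = spark.sparkContext
    .parallelize(Seq(1, 2, 3, 4, 5))
    // This lambda reads a member of `odbscan`, so the closure drags the
    // whole object along to the executors ...
    .filter(_ >= param_user_minimal_rating_count)
    .collect()
  // ... and materializing `odbscan` over there runs this body again,
  // which resubmits the job, and so on.
}
```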
Probably the easiest thing to do here is to change that `val` to `final val param_user_minimal_rating_count = 2` so the compiler will inline the value. But note that this is only a solution for literal constants. For more information see constant value definitions and constant expressions.
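To make the effect concrete (a minimal sketch, not your actual code): with the `final` modifier and a literal right-hand side, this becomes a constant value definition, so call sites compile against the constant itself and the closure no longer references `odbscan` at all:

```scala
object odbscan {
  // A constant value definition: `final val` with a literal right-hand
  // side and no explicit type annotation is inlined by the compiler.
  final val param_user_minimal_rating_count = 2
}

// Elsewhere, this lambda compiles as if it were `.filter(_ >= 2)`,
// so nothing forces `odbscan` to be serialized or re-initialized:
// rdd.filter(_ >= odbscan.param_user_minimal_rating_count)
```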
Another, better solution would be to refactor your code so that no instance variables are used in lambda expressions. Referencing `val`s that are defined in an `object` or `class` will get the whole object serialized, so try to refer only to `val`s that are local to a method (see the sketch below). And most importantly, don't execute your business logic from within a constructor or the body of an `object` or `class`.
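A sketch of that refactoring (again with invented names for everything your question doesn't show): move the job into a method and copy the field into a local `val` before the closure, so the lambda captures a plain `Int` instead of the enclosing object.

```scala
import org.apache.spark.sql.SparkSession

object odbscan {
  val param_user_minimal_rating_count = 2

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("odbscan").getOrCreate()

    // Local copy: the closure below captures `minCount` (a plain Int),
    // not the `odbscan` object, so only the Int is shipped.
    val minCount = param_user_minimal_rating_count

    val result = spark.sparkContext
      .parallelize(Seq(1, 2, 3, 4, 5))
      .filter(_ >= minCount)
      .collect()

    println(result.mkString(", "))
  }
}
```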