
Error when using shortestPaths in GraphFrames 0.8.0 for Spark 3.0 #367


Description

@ebressot

I got the following error after upgrading from GraphFrames 0.8.0 for Spark 2.4 to GraphFrames 0.8.0 for Spark 3.0:

AnalysisException Traceback (most recent call last)
<command-...> in <module>
----> 1 shortestPathToOmersVenturesDF = g.shortestPaths(landmarks=omersVenturesAsList)
2 distanceToOmersVenturesDF = shortestPathToOmersVenturesDF.select("id", "name", "domain", "facebook_url", "linkedin_url", "twitter_url", explode("distances").alias("uuid", "distance"))
3
4 display(distanceToOmersVenturesDF.sort(['distance', 'name'], ascending=[1, 1]))

/local_disk0/spark-b5c3fbb0-16cd-4049-8b97-3d2e50a269cc/userFiles-68efc905-cfeb-44ac-bb86-f01aeee17e23/addedFile3754751852010423874graphframes_0_8_0_spark3_0_s_2_12-a07ea.jar/graphframes/graphframe.py in shortestPaths(self, landmarks)
404 :return: DataFrame with new vertices column "distances"
405 """
--> 406 jdf = self._jvm_graph.shortestPaths().landmarks(landmarks).run()
407 return DataFrame(jdf, self._sqlContext)
408

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
1304 return_value = get_return_value(
-> 1305 answer, self.gateway_client, self.target_id, self.name)
1306
1307 for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
132 # Hide where the exception came from that shows a non-Pythonic
133 # JVM exception message.
--> 134 raise_from(converted)
135 else:
136 raise

/databricks/spark/python/pyspark/sql/utils.py in raise_from(e)

AnalysisException: You're using untyped Scala UDF, which does not have the input type information. Spark may blindly pass null to the Scala closure with primitive-type argument, and the closure will see the default value of the Java type for the null argument, e.g. udf((x: Int) => x, IntegerType), the result is 0 for null input. To get rid of this error, you could:

  1. use typed Scala UDF APIs(without return type parameter), e.g. udf((x: Int) => x)
  2. use Java UDF APIs, e.g. udf(new UDF1[String, Integer] { override def call(s: String): Integer = s.length() }, IntegerType), if input types are all non primitive
  3. set spark.sql.legacy.allowUntypedScalaUDF to true and use this API with caution;
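For reference, a minimal snippet along these lines should trigger the same call path. The toy graph and landmark values here are illustrative, not my actual data:

```python
# Minimal repro sketch, assuming a toy graph (names and data are illustrative).
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("shortest-paths-repro").getOrCreate()

vertices = spark.createDataFrame([("a",), ("b",), ("c",)], ["id"])
edges = spark.createDataFrame([("a", "b"), ("b", "c")], ["src", "dst"])

g = GraphFrame(vertices, edges)

# On GraphFrames 0.8.0 for Spark 3.0 this raises the AnalysisException above;
# on the Spark 2.4 build it returns the "distances" column as expected.
g.shortestPaths(landmarks=["c"]).show()
```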

I understand that there is a workaround (option 3 above, setting spark.sql.legacy.allowUntypedScalaUDF to true), but it doesn't sound like the ideal solution.
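For anyone hitting the same thing, here is a minimal sketch of that option 3 escape hatch, applied at the session level in PySpark (on Databricks the same key can instead go in the cluster's Spark config):

```python
# Option 3 from the error message: re-enable untyped Scala UDFs.
# This is a legacy escape hatch, not a fix in GraphFrames itself, and it
# restores the old null-handling behaviour the error message warns about.
spark.conf.set("spark.sql.legacy.allowUntypedScalaUDF", "true")

# With the flag set, the shortestPaths call goes through again.
shortest_paths_df = g.shortestPaths(landmarks=["c"])
shortest_paths_df.show()
```

The real fix, per options 1 and 2 in the message, would be for GraphFrames to move to the typed UDF API, which is why the flag feels like a stopgap rather than a solution.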
