Description
I got the following error after upgrading from GraphFrames 0.8.0 for Spark 2.4 to GraphFrames 0.8.0 for Spark 3.0:
AnalysisException Traceback (most recent call last)
in
----> 1 shortestPathToOmersVenturesDF = g.shortestPaths(landmarks=omersVenturesAsList)
2 distanceToOmersVenturesDF = shortestPathToOmersVenturesDF.select("id", "name", "domain", "facebook_url", "linkedin_url", "twitter_url", explode("distances").alias("uuid", "distance"))
3
4 display(distanceToOmersVenturesDF.sort(['distance', 'name'], ascending=[1, 1]))
/local_disk0/spark-b5c3fbb0-16cd-4049-8b97-3d2e50a269cc/userFiles-68efc905-cfeb-44ac-bb86-f01aeee17e23/addedFile3754751852010423874graphframes_0_8_0_spark3_0_s_2_12-a07ea.jar/graphframes/graphframe.py in shortestPaths(self, landmarks)
404 :return: DataFrame with new vertices column "distances"
405 """
--> 406 jdf = self._jvm_graph.shortestPaths().landmarks(landmarks).run()
407 return DataFrame(jdf, self._sqlContext)
408
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
1304 return_value = get_return_value(
-> 1305 answer, self.gateway_client, self.target_id, self.name)
1306
1307 for temp_arg in temp_args:
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
132 # Hide where the exception came from that shows a non-Pythonic
133 # JVM exception message.
--> 134 raise_from(converted)
135 else:
136 raise
/databricks/spark/python/pyspark/sql/utils.py in raise_from(e)
AnalysisException: You're using untyped Scala UDF, which does not have the input type information. Spark may blindly pass null to the Scala closure with primitive-type argument, and the closure will see the default value of the Java type for the null argument, e.g. udf((x: Int) => x, IntegerType), the result is 0 for null input. To get rid of this error, you could:
- use typed Scala UDF APIs (without return type parameter), e.g. udf((x: Int) => x)
- use Java UDF APIs, e.g. udf(new UDF1[String, Integer] { override def call(s: String): Integer = s.length() }, IntegerType), if input types are all non primitive
- set spark.sql.legacy.allowUntypedScalaUDF to true and use this API with caution
I understand that there is a workaround (setting spark.sql.legacy.allowUntypedScalaUDF to true), but it doesn't seem like the ideal solution.
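
In case it helps anyone else who is blocked, here is a minimal sketch of that workaround in PySpark. The toy vertex/edge DataFrames and the landmark list are placeholders standing in for the real data, and it assumes the legacy flag can be set on the active session before calling shortestPaths:

```python
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.getOrCreate()

# Workaround suggested by the error message itself: re-allow untyped Scala
# UDFs. Note the message warns to "use this API with caution".
spark.conf.set("spark.sql.legacy.allowUntypedScalaUDF", "true")

# Toy graph standing in for the real vertex/edge DataFrames (hypothetical data).
vertices = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
edges = spark.createDataFrame([("a", "b", "knows")], ["src", "dst", "relationship"])
g = GraphFrame(vertices, edges)

# With the legacy flag set, shortestPaths should run without the
# untyped-UDF AnalysisException.
result = g.shortestPaths(landmarks=["b"])
result.show()
```

That said, the cleaner fix would presumably be for GraphFrames itself to register its internal UDFs via the typed udf API that the error message's first suggestion describes, so users don't have to opt into legacy behavior.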