
Error when using shortestPaths in GraphFrames 0.8.0 for Spark 3.0 #367


Description

@ebressot

I got the following error after upgrading from GraphFrames 0.8.0 for Spark 2.4 to GraphFrames 0.8.0 for Spark 3.0:

AnalysisException Traceback (most recent call last)
<command-...> in <module>
----> 1 shortestPathToOmersVenturesDF = g.shortestPaths(landmarks=omersVenturesAsList)
2 distanceToOmersVenturesDF = shortestPathToOmersVenturesDF.select("id", "name", "domain", "facebook_url", "linkedin_url", "twitter_url", explode("distances").alias("uuid", "distance"))
3
4 display(distanceToOmersVenturesDF.sort(['distance', 'name'], ascending=[1, 1]))

/local_disk0/spark-b5c3fbb0-16cd-4049-8b97-3d2e50a269cc/userFiles-68efc905-cfeb-44ac-bb86-f01aeee17e23/addedFile3754751852010423874graphframes_0_8_0_spark3_0_s_2_12-a07ea.jar/graphframes/graphframe.py in shortestPaths(self, landmarks)
404 :return: DataFrame with new vertices column "distances"
405 """
--> 406 jdf = self._jvm_graph.shortestPaths().landmarks(landmarks).run()
407 return DataFrame(jdf, self._sqlContext)
408

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
1304 return_value = get_return_value(
-> 1305 answer, self.gateway_client, self.target_id, self.name)
1306
1307 for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
132 # Hide where the exception came from that shows a non-Pythonic
133 # JVM exception message.
--> 134 raise_from(converted)
135 else:
136 raise

/databricks/spark/python/pyspark/sql/utils.py in raise_from(e)

AnalysisException: You're using untyped Scala UDF, which does not have the input type information. Spark may blindly pass null to the Scala closure with primitive-type argument, and the closure will see the default value of the Java type for the null argument, e.g. udf((x: Int) => x, IntegerType), the result is 0 for null input. To get rid of this error, you could:

  1. use typed Scala UDF APIs(without return type parameter), e.g. udf((x: Int) => x)
  2. use Java UDF APIs, e.g. udf(new UDF1[String, Integer] { override def call(s: String): Integer = s.length() }, IntegerType), if input types are all non primitive
  3. set spark.sql.legacy.allowUntypedScalaUDF to true and use this API with caution;
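For reference, a minimal snippet along these lines should trigger the same call path. The toy graph and landmark values here are illustrative, not my actual data:

```python
# Minimal repro sketch, assuming a toy graph (names and data are illustrative).
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("shortest-paths-repro").getOrCreate()

vertices = spark.createDataFrame([("a",), ("b",), ("c",)], ["id"])
edges = spark.createDataFrame([("a", "b"), ("b", "c")], ["src", "dst"])

g = GraphFrame(vertices, edges)

# On GraphFrames 0.8.0 for Spark 3.0 this raises the AnalysisException above;
# on the Spark 2.4 build it returns the "distances" column as expected.
g.shortestPaths(landmarks=["c"]).show()
```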

I understand that there is a workaround (option 3 above, setting spark.sql.legacy.allowUntypedScalaUDF to true), but it doesn't sound like the ideal solution.
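For anyone hitting the same thing, here is a minimal sketch of that option 3 escape hatch, applied at the session level in PySpark (on Databricks the same key can instead go in the cluster's Spark config):

```python
# Option 3 from the error message: re-enable untyped Scala UDFs.
# This is a legacy escape hatch, not a fix in GraphFrames itself, and it
# restores the old null-handling behaviour the error message warns about.
spark.conf.set("spark.sql.legacy.allowUntypedScalaUDF", "true")

# With the flag set, the shortestPaths call goes through again.
shortest_paths_df = g.shortestPaths(landmarks=["c"])
shortest_paths_df.show()
```

The real fix, per options 1 and 2 in the message, would be for GraphFrames to move to the typed UDF API, which is why the flag feels like a stopgap rather than a solution.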
