
Conversation

Collaborator

@rjurney rjurney commented Feb 25, 2025

What changes were proposed in this pull request?

I converted the unittest/nose tests to pytest.

Why are the changes needed?

This is important because nose does not support Python 3.10, which means we cannot support Python 3.10 either while still running the unit tests.
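The conversion pattern can be sketched as follows. This is a hypothetical example, not the actual PR diff: a nose/unittest-style test class becomes a plain pytest function with bare asserts, so no nose runner is needed on Python 3.10+. The `out_degree` helper is invented purely for illustration.

```python
# Before (unittest/nose style) — a sketch of what the old tests looked like:
#
#   class DegreeTest(unittest.TestCase):
#       def test_out_degree(self):
#           self.assertEqual(out_degree({"a": ["b", "c"]}, "a"), 2)

def out_degree(adjacency, vertex):
    """Number of outgoing edges for a vertex in a toy adjacency dict."""
    return len(adjacency.get(vertex, []))

# After (pytest style): a free function with plain asserts; pytest
# discovers it by the test_ prefix and rewrites the asserts for reporting.
def test_out_degree():
    assert out_degree({"a": ["b", "c"]}, "a") == 2
    assert out_degree({"a": ["b", "c"]}, "b") == 0
```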

…s.txt and split out requirements-dev.txt. Version bumps.
@rjurney rjurney changed the title Convert nose/unittest tests to pytest Convert nose / unittest tests to pytest Feb 25, 2025
Collaborator

Let's move it to python/tests/test.py?

Collaborator Author

We should really split it up into parts, but I want to keep it as tests.py for now. I'll do another PR soon to break it up into pieces like tests/connected_components.py.

    "spark.submit.pyFiles",
    os.path.abspath("python/dist/graphframes-0.8.4-py3-none-any.whl"),
)
cls.sc = SparkContext(master="local[4]", appName="GraphFramesTests", conf=cls.conf)
Collaborator

Can we avoid any usage of the SparkContext at all and rely only on a SparkSession? That lets us smoothly test both Spark classic and Spark Connect. You can see how it is done in #506
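The reviewer's suggestion could look roughly like this minimal sketch of a SparkSession-only test fixture. The function name, conf dict, and conf keys are assumptions for illustration, not code from #506; the session is built directly from `SparkSession.builder` so no `SparkContext` is constructed by hand.

```python
# Hypothetical conftest-style sketch: build the test SparkSession from the
# builder alone, passing the same confs the old SparkContext setup used.
GRAPHFRAMES_TEST_CONF = {
    "spark.app.name": "GraphFramesTests",
    "spark.master": "local[4]",
    "spark.sql.shuffle.partitions": "4",
}

def make_spark_session(extra_conf=None):
    """Create a SparkSession without touching SparkContext directly."""
    from pyspark.sql import SparkSession  # lazy import: only needed at call time

    builder = SparkSession.builder
    for key, value in {**GRAPHFRAMES_TEST_CONF, **(extra_conf or {})}.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Because nothing here reaches into the JVM-backed `SparkContext`, the same builder chain should work against a Spark Connect endpoint by swapping the master conf for a remote address.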

Collaborator Author

@SemyonSinchenko do you mean this? Can I do the equivalent in Python?

      case MethodCase.CONNECTED_COMPONENTS => {
        val cc = apiMessage.getConnectedComponents
        graphFrame.connectedComponents
          .setAlgorithm(cc.getAlgorithm)
          .setCheckpointInterval(cc.getCheckpointInterval)
          .setBroadcastThreshold(cc.getBroadcastThreshold)
          .run()
      }
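A rough Python equivalent of the Scala snippet above might look like the sketch below. In the graphframes Python API the builder-style setters correspond to keyword arguments on `GraphFrame.connectedComponents`; the wrapper function name and default values here are assumptions for illustration, so check the installed graphframes version for the exact signature.

```python
def run_connected_components(graph, algorithm="graphframes",
                             checkpoint_interval=2,
                             broadcast_threshold=1000000):
    # The Scala setAlgorithm / setCheckpointInterval / setBroadcastThreshold
    # setters map to keyword arguments in the Python API (a sketch, not
    # verified against every graphframes release).
    return graph.connectedComponents(
        algorithm=algorithm,
        checkpointInterval=checkpoint_interval,
        broadcastThreshold=broadcast_threshold,
    )
```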

Collaborator Author

@SemyonSinchenko it looks like this code was removed, so no longer an issue?

Collaborator Author

@SemyonSinchenko sorry, what about this? I was confused - I see checkpointing happening elsewhere.

cls.spark._jsc.setCheckpointDir(cls.checkpointDir)

Collaborator Author

Can we avoid any usage of the SparkContext at all and rely only on a SparkSession? That lets us smoothly test both Spark classic and Spark Connect. You can see how it is done in #506

I don't think so, no. At least I do not know how to accomplish that. My attempt went badly :)
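One way to at least drop the private `_jsc` handle from the snippet above is the following sketch; the helper name is an assumption for illustration. It still reaches the SparkContext, but through the public `spark.sparkContext` accessor, which is not available on Spark Connect sessions, so #506 may need a different approach there.

```python
import tempfile

def set_checkpoint_dir(spark, path=None):
    # Use the public sparkContext accessor rather than the private _jsc
    # JVM handle; fall back to a fresh temp directory when no path is given.
    checkpoint_dir = path or tempfile.mkdtemp(prefix="graphframes-ckpt-")
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    return checkpoint_dir
```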

def spark_session():
    # Create a SparkSession with a smaller number of shuffle partitions.
    spark = (
        SparkSession(GraphFrameTestUtils.sc)
Collaborator

We can just explicitly pass all the same confs (app name, etc.) to the SparkSession.builder

Collaborator Author

Sorry, what do you want me to do here? :) I don't understand.

Collaborator Author

@rjurney rjurney Mar 7, 2025

@SemyonSinchenko I want to skip this comment... it is hard to keep things working once I monkey with how this code works. I spent an hour and I'd rather leave this alone, ship pytest and then Spark Connect :)

cls.conf = SparkConf().setAppName("GraphFramesTests")
cls.conf.set(
    "spark.submit.pyFiles",
    os.path.abspath("python/dist/graphframes-0.8.4-py3-none-any.whl"),
Collaborator

Why do we need it? If the project is installed via poetry install into the same venv as pyspark, it will be available anyway.

Collaborator Author

It is gone!

Comment on lines 268 to 279
ranks = (
    graph.pregel.setMaxIter(5)
    .withVertexColumn(
        "rank",
        F.lit(1.0 / numVertices),
        F.coalesce(Pregel.msg(), F.lit(0.0)) * F.lit(1.0 - alpha)
        + F.lit(alpha / numVertices),
    )
    .sendMsgToDst(Pregel.src("rank") / Pregel.src("outDegree"))
    .aggMsgs(F.sum(Pregel.msg()))
    .run()
)
Collaborator

Suggested change
pregel = graph.pregel
ranks = (
    pregel.setMaxIter(5)
    .withVertexColumn(
        "rank",
        F.lit(1.0 / numVertices),
        F.coalesce(pregel.msg(), F.lit(0.0)) * F.lit(1.0 - alpha)
        + F.lit(alpha / numVertices),
    )
    .sendMsgToDst(pregel.src("rank") / pregel.src("outDegree"))
    .aggMsgs(F.sum(pregel.msg()))
    .run()
)

Because with Spark Connect support there will be two possible types (classic Pregel and Connect Pregel)

Collaborator Author

@rjurney rjurney Mar 7, 2025

@SemyonSinchenko I did this one.

Collaborator Author

rjurney commented Mar 6, 2025

@SemyonSinchenko I'm starting my review of your PR now; it will take a few days... but do you mind if we get this one in first?

Collaborator Author

rjurney commented Mar 6, 2025

@SemyonSinchenko trying to finish this as quickly as possible, almost got it done... waiting to see if tests pass :)

Collaborator Author

rjurney commented Mar 6, 2025

Whoah, it builds! @SemyonSinchenko will definitely get your comments addressed this morning so we can merge and I can review your updated PR for Connect!

Collaborator

@SemyonSinchenko SemyonSinchenko left a comment

LGTM overall! I think we should merge it because in #506 I need to change testing fixtures again and there is no clear way to avoid it. So, let's merge it?

@rjurney rjurney merged commit 65a654e into master Mar 7, 2025
10 checks passed
Collaborator Author

rjurney commented Mar 7, 2025

@SemyonSinchenko thanks for the flexibility - merging!

@rjurney rjurney deleted the rjurney/pytest branch April 15, 2025 00:32