Convert nose / unittest tests to pytest #524
Conversation
…s.txt and split out requirements-dev.txt. Version bumps.
…ney/build-upgrades
Let's move it to python/tests/test.py?
We should really split it up into parts, but I want to wait and just keep it as tests.py for now. I'll do another PR soon to break it up into files like tests/connected_components.py.
python/graphframes/tests.py
Outdated
"spark.submit.pyFiles", | ||
os.path.abspath("python/dist/graphframes-0.8.4-py3-none-any.whl"), | ||
) | ||
cls.sc = SparkContext(master="local[4]", appName="GraphFramesTests", conf=cls.conf) |
Can we avoid any usages of the SparkContext at all and rely only on a SparkSession? That would let us smoothly test both Spark classic and Spark Connect. You can see how it is done in #506.
@SemyonSinchenko do you mean this? Can I do the equivalent in Python?
case MethodCase.CONNECTED_COMPONENTS => {
val cc = apiMessage.getConnectedComponents
graphFrame.connectedComponents
.setAlgorithm(cc.getAlgorithm)
.setCheckpointInterval(cc.getCheckpointInterval)
.setBroadcastThreshold(cc.getBroadcastThreshold)
.run()
}
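For comparison, a hedged sketch of what the equivalent call might look like in Python. This is a hypothetical helper, not code from this PR: the Python GraphFrames API exposes connectedComponents as a method taking keyword arguments rather than the Scala builder chain, and the parameter defaults below are assumptions based on that pattern.

```python
# Hypothetical sketch only: `g` stands for a graphframes.GraphFrame.
# The keyword-argument names mirror the Scala setters shown above.
def run_connected_components(g, algorithm="graphframes",
                             checkpoint_interval=2,
                             broadcast_threshold=1000000):
    return g.connectedComponents(
        algorithm=algorithm,
        checkpointInterval=checkpoint_interval,
        broadcastThreshold=broadcast_threshold,
    )
```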
@SemyonSinchenko it looks like this code was removed, so no longer an issue?
@SemyonSinchenko sorry, what about this? I was confused - I see checkpointing happening elsewhere.
cls.spark._jsc.setCheckpointDir(cls.checkpointDir)
Can we avoid any usages of the SparkContext at all and rely only on a SparkSession? That would let us smoothly test both Spark classic and Spark Connect. You can see how it is done in #506.
I don't think so, no. At least I do not know how to accomplish that. My attempt went badly :)
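One partial step that does work on classic (non-Connect) Spark is reaching the checkpoint directory through the public sparkContext property instead of the private _jsc handle. A minimal sketch, with a hypothetical helper name:

```python
import tempfile

def set_checkpoint_dir(spark):
    # Hypothetical helper: on classic (non-Connect) Spark, the public
    # spark.sparkContext property reaches the same context as the
    # private _jsc handle, so this avoids the underscore-prefixed API.
    checkpoint_dir = tempfile.mkdtemp()
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    return checkpoint_dir
```

This still depends on a SparkContext existing, so it does not solve the Spark Connect case by itself.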
def spark_session():
    # Create a SparkSession with a smaller number of shuffle partitions.
    spark = (
        SparkSession(GraphFrameTestUtils.sc)
We can just explicitly pass all the same confs (app name, etc.) to the SparkSession.builder
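For illustration, a minimal sketch of what that builder-only setup might look like, assuming the conf values from the test code above (the helper name and the deferred import are choices made for this sketch, not code from the PR):

```python
def make_spark_session(app_name="GraphFramesTests", shuffle_partitions="4"):
    # Deferred import so this sketch loads even where pyspark is absent.
    from pyspark.sql import SparkSession

    # Pass the same confs the SparkConf carried (master, app name,
    # shuffle partitions) directly to the builder; no SparkContext needed.
    return (
        SparkSession.builder
        .master("local[4]")
        .appName(app_name)
        .config("spark.sql.shuffle.partitions", shuffle_partitions)
        .getOrCreate()
    )
```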
Sorry, what do you want me to do here? :) I don't understand.
@SemyonSinchenko I want to skip this comment... it is hard to keep things working once I change how this code works. I spent an hour on it and I'd rather leave this alone, ship pytest, and then Spark Connect :)
python/graphframes/tests.py
Outdated
cls.conf = SparkConf().setAppName("GraphFramesTests")
cls.conf.set(
    "spark.submit.pyFiles",
    os.path.abspath("python/dist/graphframes-0.8.4-py3-none-any.whl"),
Why do we need it? If the project is installed via poetry install into the same venv as pyspark, it will be available anyway.
It is gone!
python/graphframes/tests.py
Outdated
ranks = (
    graph.pregel.setMaxIter(5)
    .withVertexColumn(
        "rank",
        F.lit(1.0 / numVertices),
        F.coalesce(Pregel.msg(), F.lit(0.0)) * F.lit(1.0 - alpha)
        + F.lit(alpha / numVertices),
    )
    .sendMsgToDst(Pregel.src("rank") / Pregel.src("outDegree"))
    .aggMsgs(F.sum(Pregel.msg()))
    .run()
)
Suggested change:

pregel = graph.pregel
ranks = (
    pregel.setMaxIter(5)
    .withVertexColumn(
        "rank",
        F.lit(1.0 / numVertices),
        F.coalesce(pregel.msg(), F.lit(0.0)) * F.lit(1.0 - alpha)
        + F.lit(alpha / numVertices),
    )
    .sendMsgToDst(pregel.src("rank") / pregel.src("outDegree"))
    .aggMsgs(F.sum(pregel.msg()))
    .run()
)
Because with Spark Connect support there will be two possible types (Pregel classic and Pregel connect).
@SemyonSinchenko I did this one.
@SemyonSinchenko I'm starting my review of your PR now, it will take a few days... but do you mind if we get this one in first?
@SemyonSinchenko trying to finish this as quickly as possible, almost got it done... waiting to see if tests pass :)
Whoah, it builds! @SemyonSinchenko will definitely get your comments addressed this morning so we can merge and I can review your updated PR for Connect!
LGTM overall! I think we should merge it, because in #506 I need to change the testing fixtures again and there is no clear way to avoid it. So, let's merge it?
@SemyonSinchenko thanks for the flexibility - merging!
What changes were proposed in this pull request?

I converted the unittest/nose tests to pytest.

Why are the changes needed?

This is important because nose does not support Python 3.10, which means we can't support Python 3.10 either and still run unit tests.
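For readers unfamiliar with this kind of conversion, the mechanical shape of the change looks roughly like this. A toy example, not code from this PR:

```python
import unittest

# Before: nose/unittest style, with a TestCase class and assert* methods.
class TestDegrees(unittest.TestCase):
    def test_degree_count(self):
        self.assertEqual(len([1, 2, 3]), 3)

# After: pytest style, plain functions with bare asserts; shared setup
# like setUpClass/tearDownClass moves into pytest fixtures.
def test_degree_count():
    assert len([1, 2, 3]) == 3
```

pytest also runs unittest-style classes, which is what makes an incremental conversion like this one practical.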