-
Notifications
You must be signed in to change notification settings - Fork 252
feat: add an experimental implementation of SP in GF #558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add an experimental implementation of SP in GF #558
Conversation
@SauronShepherd @rjurney Hi! Take a look when you have time! Thanks! |
Some points:
|
Deleted. I just wanted to mark somehow the method as exprimental.
Fixed
I moved it to the trait.
Without it I'm getting error (see CI fail). May be fixed by importing javaConversions, but let's leave it for the next PRs? There are a lot of such a things inside the code, it may be a good first issue.
That is how GraphX impl works at the moment. It may be hard to expect from user a good estimation, it is not something like PageRank. A wrong choice of maxIter will tend that you will get a valid result and you cannot even check the correctness. For example, algo returns you that one of vertices has distance 5 to one of landmakrs, how would you check that it is correct? Maybe on the next iteration it would be 4...
See above. I think shortestPaths is not the algorithm when we can leave it to user.
Fixed
I do not like long one-liners tbh. Btw, fixed.
Fixed
I added an explanation as a comment, please see in the code.
Tbh, I want to update tests. I'm going to do it as part as introduction of LDBC checks because for LDBC graphs we have ground truth. Does it sound good to you? |
I liked the idea and didn't know such exprimental annotation existed in Spark. It's only that ... it seemed somehow odd to me using it, because it's like something internal of Spark. That's all.
My IDE (IntelliJ) highlighted that. I drop the
I thought this was like the feature in Neo4J, that you could set the number of maximum "jumps".
I still think it would be nice for the the user to be able to define a threshold. But i'm not an expert on graphs.
I do not like using single-line methods that are only used in one place. I think they make the code harder to read, but maybe this is more a matter of personal preference than anything else.
Yes, of course. We don't need to improve everything in one go, do we? I've seen that you've introduced some changes in the |
I've just had a look at your changes in @rjurney, could you please merge this PR when you get a chance? (once the tests get fixed) |
@rjurney What do you think about this PR? |
@SauronShepherd @rjurney Hi! This one is ready for review, the only change from the last round is adding early stopping. This implementation should be considered as experimental because it is still slower compared to the GraphX-based one. I'm going to continue my work on the performance improvement as part of overall Pregel revisiting. |
@SemyonSinchenko you’re so productive on the code base that I think I’m going to continue the documentation work I did with motifs by starting a Pregel tutorial to go along with the motif finding one. It is a really powerful feature that I’ve been trying to learn to mentally use for a few years now but haven’t done much with it. This could open up the API to a lot more people and give them another reason to use GraphFrames and your improvements. Thoughts? created #562 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
I'm going to merge this one after #370 |
cool |
What changes were proposed in this pull request?
.gitignore
Why are the changes needed?
GraphX is deprecated in Spark 4.0; close #557