-
Notifications
You must be signed in to change notification settings - Fork 252
feat: add LDBC tests #570
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add LDBC tests #570
Conversation
@rjurney @SauronShepherd @james-willis Hi guys! What do you think about it? Should it be merged with creating a follow-up tickets, or should it be left as pending until all the tests are passed? At the moment there are problems in:
Alternatively, I can disable these tests for now to allow contributors enable them manually. |
And, of course, the most interesting thing, is that with scala 2.13 CDLP test passed, with 2.12 failed. |
I'd like to have a look at it next weekend. I love challenging mysteries!!! |
But what about this PR? If you will work one of found issues, it would be much easier to have these tests in main. For example, I want to work on the GraphX-free implementation of LabelPropagation and I want to have these tests too. If you are fine with this PR overall, let's merge it? cc: @rjurney |
Yes, of course. I think I've mixed up this new feature with the open issue. |
I fixed small things (access modifiers, typo, etc.) and this PR is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
Why is the new test suite named |
Why is the test internally downloading the datasets? Why are we forcing the user to install a command-line tool like Also, why are the datasets being placed in the target folder? I’d prefer to locate them in the project root for better visibility and consistency. Additionally, rebuilding the project or changing the Scala version could trigger unnecessary re-downloads of the datasets. |
@SauronShepherd these are good points |
Instead of explicitly defining the algorithms to traverse in each test, wouldn’t it be better to retrieve them from the algorithm itself? After all, what could be better than the algorithm itself in determining its own supported types? I mean this ...
|
GraphX is deprecated in Apache Spark 4.0 and it is buggy (#571); it is very hard to support for GraphX implementations such things like weights, directed/undirected, properties, etc. To be honest I was hoping that after finishing #556 with a similar like GraphX performance, we can dpeprecate all the GraphX-related things here. For example, we can mark it deprecated in GraphFrames 1.0. In that case, is there a reason to make tests overcomplicated? What is bad with
Overall answer is the following: LDBC contains not only toy datasets, but also graphs of different size. And even LDBC XS is already bigger than 100 Mb and cannot fit into GitHub limits (that are about 50 Mb as I remember). Of course we can store small files in resources and download big ones in runtime if you prefer that way. But because the structure of all the LDBC datasets is the same, I made a decision to do it like it is done.
I can fix it. Please, could you create a follow-up ticket? Thanks! |
It is definitely desirable to download these datasets at runtime. I do this for the motif finding tutorial. |
We're talking about <2KB files. Do we really want to force users to install a command tool (
What that has to do with the fact that the tests had not only to know but also enumerate the types of implementations used in the algorithm they're testing?
Adding a common method in the algorithm classes that simply returns a list with the types of implementations each algorithm class supports, so the tests traverses these lists and not an enumeration... is really that overcomplicated? We already have a Besides, imagine we add a new type of implementation in one of the algorithms ... do we also have to update the enumerated list in the test?
It's not available on Windows and the Ubuntu WSL I have installed. |
I'm fine with the idea, can you open a PR? |
lgtm |
I'm investigating the scala 2.12 issue. Let's wait for any changes to fix that and, if there's no changes in GraphFrames, I'll open a PR with the |
What changes were proposed in this pull request?
Why are the changes needed?
Close #568
WARNING This PR contains failed tests! It looks like I found bugs in implementations of PR and CDLP, but fixing this should be done as separate PRs!!!