Skip to content

Conversation

architch
Copy link
Contributor

@architch architch commented Feb 9, 2019

This PR changes the label to minimum of the vertex ID belonging to a component.
The assigned label of output of Connected Component is currently a random number which is not consistent. If some delta edges are added to the graph, the output label of unaffected components also changes.

Output of friends graph example:
friends graph output

@codecov-io
Copy link

Codecov Report

Merging #320 into master will increase coverage by 0.05%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #320      +/-   ##
==========================================
+ Coverage   91.17%   91.23%   +0.05%     
==========================================
  Files          18       18              
  Lines         873      878       +5     
  Branches       73       73              
==========================================
+ Hits          796      801       +5     
  Misses         77       77
Impacted Files Coverage Δ
...cala/org/graphframes/lib/ConnectedComponents.scala 94.69% <100%> (+0.24%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 89312f3...3dac6e7. Read the comment docs.

@architch
Copy link
Contributor Author

@mengxr @felixcheung any thoughts on this one.

@ericsun95
Copy link

This is very helpful, any updates?

@followingell
Copy link

followingell commented Nov 25, 2024

Hey @rjurney,

Not sure of your status or involvement in this project but is there a chance you could push this through? 🤞

AFAICT this would unlock consistent component IDs across Connected Component executions for components that have unchanged nodes.

@rjurney
Copy link
Collaborator

rjurney commented Nov 26, 2024

@followingell I like this, but my machine is presently in an unhappy state as it relates to this project as I dip my toes in Spark proper waters. What testing have you done?

@followingell
Copy link

@followingell I like this, but my machine is presently in an unhappy state as it relates to this project as I dip my toes in Spark proper waters. What testing have you done?

Unfortunately none. I don't have access at the moment to a machine where I could test this without issues.

@rjurney
Copy link
Collaborator

rjurney commented Nov 28, 2024

Okay, give me a few days and I’ll test it.

@SemyonSinchenko
Copy link
Collaborator

We have a lot of pending PRs related to Connected Components. I'm going to push this one forward after merging #552

@rjurney
Copy link
Collaborator

rjurney commented Mar 31, 2025

Sounds good. I like this PR.

Copy link
Collaborator

@SemyonSinchenko SemyonSinchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@SemyonSinchenko
Copy link
Collaborator

I will try to resolve conflicts.

@SemyonSinchenko
Copy link
Collaborator

SemyonSinchenko commented Apr 1, 2025

@architch Thanks for resolving conflicts. You can use an existing pre-commit hook to fix style (or just run ./build/sbt root/scalafmtAll)

@SemyonSinchenko SemyonSinchenko merged commit b18b35f into graphframes:master Apr 1, 2025
5 checks passed
@SemyonSinchenko
Copy link
Collaborator

Thanks for the contribution, @architch !

@SemyonSinchenko
Copy link
Collaborator

@architch I want to notify you that after some discussions we made a decision to keep as the default the behavior before your PR: #632

To use min vertex label as component label you will need to explicitly modify the parameter. You can use the config spark.graphframes.useLabelsAsComponents (set it to true) or API (useLabelsAsComponents: bool in Python and setUseLabelsAsComponents(value: Boolean) in JVM).

Sorry about it, but do a breaking change will require too much work from downstream projects. I think for users it won't be hard to set config or argument to true and have a behavior from your PR.

@rjurney
Copy link
Collaborator

rjurney commented Jul 16, 2025

This seems like a happy medium :)

@architch
Copy link
Contributor Author

Thanks @SemyonSinchenko for the info. It makes sense to make it configurable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants