-
Notifications
You must be signed in to change notification settings - Fork 252
Quote column names with backticks #370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Quote column names with backticks #370
Conversation
Codecov Report
@@ Coverage Diff @@
## master #370 +/- ##
==========================================
+ Coverage 91.26% 91.28% +0.02%
==========================================
Files 18 18
Lines 870 872 +2
Branches 48 56 +8
==========================================
+ Hits 794 796 +2
Misses 76 76
Continue to review full report at Codecov.
|
32e45a5
to
b6d5812
Compare
@mengxr @felixcheung how do you feel about this fix? |
Codecov Report
@@ Coverage Diff @@
## master #370 +/- ##
==========================================
+ Coverage 91.26% 91.28% +0.02%
==========================================
Files 18 18
Lines 870 872 +2
Branches 48 56 +8
==========================================
+ Hits 794 796 +2
Misses 76 76
Continue to review full report at Codecov.
|
b6d5812
to
ca9fd56
Compare
Oh thank you, I hate transforming to underscores! |
@EnricoMi Hello! Is there any chance you can update the PR and resolve conflicts? It would be a nice feature for |
5084ca8
to
104b030
Compare
Rebased. Thanks for looking into this. |
@rjurney @bjornjorgensen Hello! What do you think about this one? For me it looks very cool, I think it is very nice to have feature for GraphFrames! |
cool thing.. but its something that should have been in spark not here.. |
.. and it is a breaking change to. |
To be honest I don't see why is it a breaking change? For end users everything should be exactly the same, until they are doing something similar on their side. From the other side complex pipelines with graphframes with column-names generation and similar things it should be much more smooth with that change. |
What you don't see any braking changes, but you explain them in the 3 next lines! I don't understand what this have to do with graphframes as this project code.. |
You mean if column names are backticked already? That should be doable. Any other use case that breaks due to this change? |
yes, and I think that it's very common to have.
Not that I can think of right now. |
like pandas json_normalize uses sep='.' as default. https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html |
104b030
to
5103ed5
Compare
Looking more closely, this is not a breaking change. And Spark API expects the calling site to properly escape column names if they contain dots (unless they reference nested columns). Spark won't change that, otherwise, that would be a breaking change on their side. Since Spark 4.0.0, they even provide methods in Backticks are only required when referencing a column by its name ( Without this fix, users of Graphframes have to workaround this by renaming affected columns: |
@bjornjorgensen Did you have a chance to check the latest commits? It seems to me that it should cover all of your concerns and nothing else is blocking merging. Thanks in advance! |
hm... well I think its ok.. If we did have branches for graphframes we didn't needed this for branch 4 I think we can have it, but if there are some user problems with it we will need to remove it. |
@rjurney What do you think about it? For me it is ready to merge. |
With "branch 4" I presume you refer to Spark4. What is different there? Do you imply backticks are not needed with Spark4? How are ambiguous column names resolved there? |
I was thinking about this "Note: This can be replaced with org.apache.spark.sql.catalyst.util.QuotingUtils.quoteIfNeeded
|
With "user code" I meant code that uses the Spark API, which in our case is Graphframes code. So once Spark3 support is dropped by Graphframes, the |
@bjornjorgensen Were your concerns resolved by the latest commits? If so, could you approve the PR? cc: @rjurney @SauronShepherd could you take a look too? If you are fine with it, I would like to resolve conflicts and merge it. I see it as a very important feature. |
LGTM |
This guards against columns that contain dots. These column names must be quoted to avoid that Spark parses them again.
5103ed5
to
7a60f7a
Compare
Conflict resolved. |
Shouldn't we add some tests to check that GraphFrames actually works fine with nested columns? |
There are two tests in this PR already, do you think there should be more? |
Sorry, I looked for the word "test" in the comments, didn't find anything
and assumed there weren't any tests. Let me check out the PR and I'll
confirm you tomorrow, but 2 tests seems reasonsable.
El vie, 28 mar 2025, 21:50, Sem ***@***.***> escribió:
… Shouldn't we add some tests to check that GraphFrames actually works fine
with nested columns?
There are two tests in this PR already, do you think there should be more?
—
Reply to this email directly, view it on GitHub
<#370 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACCN675GZIR6IC5ADWZJPFT2WWRXFAVCNFSM6AAAAABX7XQB3KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDONRSGQ3DQMRWGU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
[image: SemyonSinchenko]*SemyonSinchenko* left a comment
(graphframes/graphframes#370)
<#370 (comment)>
Shouldn't we add some tests to check that GraphFrames actually works fine
with nested columns?
There are two tests in this PR already, do you think there should be more?
—
Reply to this email directly, view it on GitHub
<#370 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACCN675GZIR6IC5ADWZJPFT2WWRXFAVCNFSM6AAAAABX7XQB3KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDONRSGQ3DQMRWGU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@SauronShepherd Hi! Were your concerns resolved? I would like to merge this one to avoid even more merge conflicts, because this PR changes a lot of files. |
Sorry, I didn't have time to look at it. Go ahead. |
Thanks for the contribution @EnricoMi ! |
Thanks for merging! |
This guards against columns that contain dots. These column names must be quoted to avoid that Spark parses them again.
Fixes following exception when your vertices DataFrame contains columns with dots (e.g.
"a.name"
):Fixes this issue also in other places.