Skip to content

Handle lone surrogate unicode character not being representable in Jython #5271

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 16, 2019

Conversation

nicoddemus
Copy link
Member

No tests for this because we don't test with Jython currently.

Fix #5256

Copy link
Member

@asottile asottile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohhhh jython :(

@codecov
Copy link

codecov bot commented May 15, 2019

Codecov Report

Merging #5271 into master will increase coverage by 0.51%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5271      +/-   ##
==========================================
+ Coverage   93.44%   93.95%   +0.51%     
==========================================
  Files         115      115              
  Lines       26163    26165       +2     
  Branches     2578     2578              
==========================================
+ Hits        24447    24583     +136     
+ Misses       1395     1259     -136     
- Partials      321      323       +2
Impacted Files Coverage Δ
src/_pytest/terminal.py 91.21% <100%> (+0.58%) ⬆️
testing/python/setup_only.py 92.85% <0%> (-7.15%) ⬇️
testing/python/collect.py 92.11% <0%> (-7.04%) ⬇️
testing/python/metafunc.py 93.56% <0%> (-1.11%) ⬇️
src/_pytest/pytester.py 86.8% <0%> (+0.42%) ⬆️
testing/test_warnings.py 98.9% <0%> (+1.63%) ⬆️
src/_pytest/tmpdir.py 100% <0%> (+3.44%) ⬆️
testing/python/integration.py 91.42% <0%> (+5%) ⬆️
testing/test_terminal.py 86.62% <0%> (+7.3%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 43617a8...e253029. Read the comment docs.

@nicoddemus nicoddemus force-pushed the lone-surrogate-jython-5256 branch from f4373d1 to 8d98152 Compare May 16, 2019 20:58
…thon

No tests for this because we don't test with Jython currently.

Fix pytest-dev#5256
@nicoddemus nicoddemus force-pushed the lone-surrogate-jython-5256 branch from 8d98152 to e253029 Compare May 16, 2019 20:59
@asottile asottile merged commit d94b4b0 into pytest-dev:master May 16, 2019
@nicoddemus nicoddemus deleted the lone-surrogate-jython-5256 branch May 16, 2019 22:04
@blueyed
Copy link
Contributor

blueyed commented May 17, 2019

@asottile
I think this should have waited on #5256 (comment).

@nicoddemus
Copy link
Member Author

@blueyed I think it is fine, we can always do a follow-up. 👍

And after giving this some more thought, we needed this fix because our test suite used 😄 emoji in the messages (my suggestion), so it might not even happen in the wild as I doubt people are putting emojis in exception messages.

Either way I think a follow up later (including a CHANGELOG adjustment) is fine. 👍

@blueyed
Copy link
Contributor

blueyed commented May 17, 2019

Sure. But there was also no reason to get this intermediate fix in fast - since we're not testing against Jython.

Anyway, it was just a remark - nothing too bad.

@fabioz
Copy link
Contributor

fabioz commented May 17, 2019

The current change is already ok for me (Jython works)...

As a note, what I came up with in the end is below (where obj_repr is a unicode string and left_count the index to be sliced):

try:
    if left_count > 0 and unichr(0xD800) <= obj_repr[left_count - 1] <= unichr(0xDBFF):
        left_count -= 1
except ValueError:
    # On Jython unichr(0xD800) will throw an error:
    # ValueError: unichr() arg is a lone surrogate in range (0xD800, 0xDFFF) (Jython UTF-16 encoding)
    # Just ignore it in this case.
    pass

I'm also a bit wary of pytest doing msg.rstrip() with a unicode identifier where msg is bytes on Python 2 (that could fail depending on the encoding used to automatically make the conversion from bytes to unicode) -- in: d94b4b0#diff-606d01f030119020f3e820547882b6b0R1003 (but as @nicoddemus mentioned, it's quite rare that exceptions have unicode at all, so, this may not happen all that much in practice).

@nicoddemus
Copy link
Member Author

nicoddemus commented May 17, 2019

I'm also a bit wary of pytest doing msg.rstrip() with a unicode identifier where msg is bytes on Python 2

You are right @fabioz, that could lead to problems.

I propose we wait a bit though, I'm not sure this will really be a problem in the wild. If it does we can always fix it with an actual problem as guide. 👍

Thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid unicode in _pytest/terminal.py with Jython
4 participants