Add trim/0, ltrim/0 and rtrim/0 that trims leading and trailing whitespace #3056
src/jv_unicode.c
@@ -118,3 +118,16 @@ int jvp_utf8_encode(int codepoint, char* out) {
  assert(out - start == jvp_utf8_encode_length(codepoint));
  return out - start;
}

// space codepoints for unicode basic latin and latin-1 supplement
This is the same as what Go's `strings.TrimSpace`/`unicode.IsSpace` considers whitespace: https://pkg.go.dev/unicode#IsSpace
Note that Go's `strings.TrimSpace` considers the `White_Space` property as well.
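For reference, the Latin-1 subset discussed here can be sketched as a small predicate. This is only an illustration, not the code from this PR: the name `is_latin1_space` is hypothetical, and the list follows what the Go documentation gives for `unicode.IsSpace` in the Latin-1 range.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not jq's actual code: reports whether a codepoint is
 * one of the whitespace characters in Basic Latin or Latin-1 Supplement,
 * following the Latin-1 list documented for Go's unicode.IsSpace. */
bool is_latin1_space(int codepoint) {
  assert(codepoint >= 0 && codepoint <= 0x10FFFF);
  switch (codepoint) {
    case 0x09: /* \t character tabulation */
    case 0x0A: /* \n line feed */
    case 0x0B: /* \v line tabulation (vertical tab) */
    case 0x0C: /* \f form feed */
    case 0x0D: /* \r carriage return */
    case 0x20: /* space */
    case 0x85: /* next line (NEL) */
    case 0xA0: /* no-break space */
      return true;
    default:
      return false;
  }
}
```

Anything outside this set, for example U+2003 (em space), is reported as non-whitespace by this sketch.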
docs/content/manual/manual.yml
`"\f"`,
`"\u000b"` (vertical tab),
`"\u0085"` (next line) and
`"\u00a0"` (no-break space). These are the whitespace characters in
Is this exhaustive for Unicode, or just Latin scripts in Unicode? If the latter, why not be more exhaustive?
It is only whitespace from the Basic Latin and Latin-1 Supplement blocks, so it is not exhaustive. The reason is mostly to match what other implementations do. There are quite a lot of other whitespace characters in other blocks; Wikipedia has a good list: https://en.wikipedia.org/wiki/Whitespace_character.
Reading about PCRE's `\s` class, it seems to match something similar.
Rust's `trim`/`is_whitespace` uses characters with the `White_Space` property (https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt). Maybe that is more reasonable?
Maybe we should document that the list is not stable, that we may add to it.
> Rust's `trim`/`is_whitespace` uses characters with the `White_Space` property (https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt). Maybe that is more reasonable?
I do like that better, yeah, especially in light of @itchyny's comment.
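To make the difference from the Latin-1 list concrete, here is a rough sketch of a predicate for the `White_Space` property. The function name is hypothetical and the ranges are the ones listed in recent versions of `PropList.txt`; that file remains the authoritative source, so treat this as an illustration rather than what the PR ships.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if the codepoint has the Unicode White_Space
 * property. Ranges are taken from PropList.txt (assumed current). */
bool has_white_space_property(int cp) {
  assert(cp >= 0 && cp <= 0x10FFFF);
  return (cp >= 0x0009 && cp <= 0x000D) /* tab, LF, VT, FF, CR */
      || cp == 0x0020                   /* space */
      || cp == 0x0085                   /* next line */
      || cp == 0x00A0                   /* no-break space */
      || cp == 0x1680                   /* ogham space mark */
      || (cp >= 0x2000 && cp <= 0x200A) /* en quad through hair space */
      || cp == 0x2028                   /* line separator */
      || cp == 0x2029                   /* paragraph separator */
      || cp == 0x202F                   /* narrow no-break space */
      || cp == 0x205F                   /* medium mathematical space */
      || cp == 0x3000;                  /* ideographic space */
}
```

Note that U+200B (zero width space) does not carry the property, so a trim built on this predicate would leave it in place.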
Wait for utf8proc to be included for upcase/downcase?
Update to `White_Space`, but I guess we can wait. Also update docs and tests.
I found a list of languages on this function. I have no objection to naming this `trim` rather than `strip`, because we already have `ltrimstr`. Does anyone want `ltrim` and `rtrim` as well?
Personally mostly have needed
48a7b8c to 913457b
Added
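As a rough sketch of how the three variants relate, one routine with `left` and `right` flags can cover `trim`, `ltrim` and `rtrim`. This is not the implementation in this PR: `trim_utf8` and its helpers are hypothetical names, the input is assumed to be valid NUL-terminated UTF-8, and only the Basic Latin and Latin-1 Supplement whitespace set is recognized (U+0085 and U+00A0 encode as the byte pairs `C2 85` and `C2 A0`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Byte length of a whitespace codepoint starting at s, or 0 if none.
 * Only Basic Latin and Latin-1 Supplement whitespace is recognized. */
static size_t space_len_at(const unsigned char *s, size_t n) {
  if (n >= 1 && ((s[0] >= 0x09 && s[0] <= 0x0D) || s[0] == 0x20))
    return 1;
  if (n >= 2 && s[0] == 0xC2 && (s[1] == 0x85 || s[1] == 0xA0))
    return 2; /* U+0085 or U+00A0 */
  return 0;
}

/* Byte length of a whitespace codepoint ending at s + n, or 0 if none. */
static size_t space_len_before(const unsigned char *s, size_t n) {
  if (n >= 1 && ((s[n - 1] >= 0x09 && s[n - 1] <= 0x0D) || s[n - 1] == 0x20))
    return 1;
  if (n >= 2 && s[n - 2] == 0xC2 && (s[n - 1] == 0x85 || s[n - 1] == 0xA0))
    return 2;
  return 0;
}

/* Hypothetical trim: returns a pointer into str at the first kept byte and
 * stores the kept length in *out_len. left/right select which ends to trim,
 * so (true, true) acts like trim, (true, false) like ltrim and
 * (false, true) like rtrim. */
const char *trim_utf8(const char *str, bool left, bool right, size_t *out_len) {
  assert(str != NULL && out_len != NULL);
  const unsigned char *s = (const unsigned char *)str;
  size_t len = strlen(str);
  size_t k;
  if (left) {
    while ((k = space_len_at(s, len)) > 0) {
      s += k;
      len -= k;
    }
  }
  if (right) {
    while ((k = space_len_before(s, len)) > 0)
      len -= k;
  }
  *out_len = len;
  return (const char *)s;
}
```

The result is a view into the original buffer rather than a copy, which keeps the sketch short; a real builtin would build a new string value from that range.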
Looks good to me
@emanuele6 thanks for the review! Maybe wait for one more approval? I think adding new functions might be a good idea with more consensus.
About waiting for utf8proc from #2547: I think we can merge this separately and fix that later.
LGTM
After utf8proc is included later, let's see whether we can clean up (or get rid of) jv_unicode.c.
Yeap, had a quick look at it a few weeks ago and it seemed like it would be easy, something like
Trims leading and trailing whitespace. Was added to jq in jqlang/jq#3056
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [jqlang/jq](https://github.com/jqlang/jq) | minor | `1.7.1` -> `1.8.0` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>jqlang/jq (jqlang/jq)</summary>

### [`v1.8.0`](https://github.com/jqlang/jq/releases/tag/jq-1.8.0): jq 1.8.0

[Compare Source](jqlang/jq@jq-1.7.1...jq-1.8.0)

We are pleased to announce the release of version 1.8.0. This release includes a number of improvements since the last version. Note that some changes may introduce breaking changes to existing scripts, so be sure to read the following information carefully. Full commit log can be found at <jqlang/jq@jq-1.7.1...jq-1.8.0>.

#### Releasing

- Change the version number pattern to `1.X.Y` (`1.8.0` instead of `1.8`). [@itchyny](https://github.com/itchyny) [#2999](jqlang/jq#2999)
- Generate provenance attestations for release artifacts and docker image. [@lectrical](https://github.com/lectrical) [#3225](jqlang/jq#3225)

  ```sh
  gh attestation verify --repo jqlang/jq jq-linux-amd64
  gh attestation verify --repo jqlang/jq oci://ghcr.io/jqlang/jq:1.8.0
  ```

#### Security fixes

- CVE-2024-23337: Fix signed integer overflow in `jvp_array_write` and `jvp_object_rehash`. [@itchyny](https://github.com/itchyny) [`de21386`](jqlang/jq@de21386)
  - The fix for this issue now limits the maximum size of arrays and objects to `536870912` (`2^29`) elements.
- CVE-2024-53427: Reject NaN with payload while parsing JSON. [@itchyny](https://github.com/itchyny) [`a09a4df`](jqlang/jq@a09a4df)
  - The fix for this issue now drops support for NaN with payload in JSON (like `NaN123`). Other JSON extensions like `NaN` and `Infinity` are still supported.
- CVE-2025-48060: Fix heap buffer overflow in `jv_string_vfmt`. [@itchyny](https://github.com/itchyny) [`c6e0416`](jqlang/jq@c6e0416)
- Fix use of uninitialized value in `check_literal`. [@itchyny](https://github.com/itchyny) [#3324](jqlang/jq#3324)
- Fix segmentation fault on `strftime/1`, `strflocaltime/1`. [@itchyny](https://github.com/itchyny) [#3271](jqlang/jq#3271)
- Fix unhandled overflow in `@base64d`. [@emanuele6](https://github.com/emanuele6) [#3080](jqlang/jq#3080)

#### CLI changes

- Fix `--indent 0` implicitly enabling `--compact-output`. [@amarshall](https://github.com/amarshall) [@gbrlmarn](https://github.com/gbrlmarn) [@itchyny](https://github.com/itchyny) [#3232](jqlang/jq#3232)

  ```sh
  $ jq --indent 0 . <<< '{ "foo": ["hello", "world"] }'
  {
  "foo": [
  "hello",
  "world"
  ]
  }
  # Previously, this implied --compact-output, but now outputs with new lines.
  ```

- Improve error messages to show problematic position in the filter. @itchyny #3292

  ```sh
  $ jq -n '1 + $foo + 2'
  jq: error: $foo is not defined at <top-level>, line 1, column 5:
      1 + $foo + 2
          ^^^^
  jq: 1 compile error
  ```

- Include column number in parser and compiler error messages. [@liviubobocu](https://github.com/liviubobocu) [#3257](jqlang/jq#3257)
- Fix error message for string literal beginning with single quote. [@mattmeyers](https://github.com/mattmeyers) [#2964](jqlang/jq#2964)

  ```sh
  $ jq .foo <<< "{'foo':'bar'}"
  jq: parse error: Invalid string literal; expected ", but got ' at line 1, column 7
  # Previously, the error message was Invalid numeric literal at line 1, column 7.
  ```

- Improve `JQ_COLORS` environment variable to support larger escapes like truecolor. @SArpnt #3282

  ```sh
  JQ_COLORS="38;2;255;173;173:38;2;255;214;165:38;2;253;255;182:38;2;202;255;191:38;2;155;246;255:38;2;160;196;255:38;2;189;178;255:38;2;255;198;255" jq -nc '[null,false,true,42,{"a":"bc"}]'
  ```

- Add `--library-path` long option for `-L`. [@thaliaarchi](https://github.com/thaliaarchi) [#3194](jqlang/jq#3194)
- Fix `--slurp --stream` when input has no trailing newline character. [@itchyny](https://github.com/itchyny) [#3279](jqlang/jq#3279)
- Fix `--indent` option to error for malformed values. [@thaliaarchi](https://github.com/thaliaarchi) [#3195](jqlang/jq#3195)
- Fix option parsing of `--binary` on non-Windows platforms. [@calestyo](https://github.com/calestyo) [#3131](jqlang/jq#3131)
- Fix issue with `~/.jq` on Windows where `$HOME` is not set. [@kirkoman](https://github.com/kirkoman) [#3114](jqlang/jq#3114)
- Fix broken non-Latin output in the command help on Windows. [@itchyny](https://github.com/itchyny) [#3299](jqlang/jq#3299)
- Increase the maximum parsing depth for JSON to 10000. [@itchyny](https://github.com/itchyny) [#3328](jqlang/jq#3328)
- Parse short options in order given. [@thaliaarchi](https://github.com/thaliaarchi) [#3194](jqlang/jq#3194)
- Consistently reset color formatting. [@thaliaarchi](https://github.com/thaliaarchi) [#3034](jqlang/jq#3034)

#### New functions

- Add `trim/0`, `ltrim/0` and `rtrim/0` to trim leading and trailing white spaces. [@wader](https://github.com/wader) [#3056](jqlang/jq#3056)

  ```sh
  $ jq -n '" hello " | trim, ltrim, rtrim'
  "hello"
  "hello "
  " hello"
  ```

- Add `trimstr/1` to trim string from both ends. [@gbrlmarn](https://github.com/gbrlmarn) [#3319](jqlang/jq#3319)

  ```sh
  $ jq -n '"foobarfoo" | trimstr("foo")'
  "bar"
  ```

- Add `add/1`. Generator variant of `add/0`. [@myaaaaaaaaa](https://github.com/myaaaaaaaaa) [#3144](jqlang/jq#3144)

  ```sh
  $ jq -c '.sum = add(.xs[])' <<< '{"xs":[1,2,3]}'
  {"xs":[1,2,3],"sum":6}
  ```

- Add `skip/2` as the counterpart to `limit/2`. [@itchyny](https://github.com/itchyny) [#3181](jqlang/jq#3181)

  ```sh
  $ jq -nc '[1,2,3,4,5] | [skip(2; .[])]'
  [3,4,5]
  ```

- Add `toboolean/0` to convert strings to booleans. [@brahmlower](https://github.com/brahmlower) [@itchyny](https://github.com/itchyny) [#2098](jqlang/jq#2098)

  ```sh
  $ jq -n '"true", "false" | toboolean'
  true
  false
  ```

- Add `@urid` format. Reverse of `@uri`. [@fmgornick](https://github.com/fmgornick) [#3161](jqlang/jq#3161)

  ```sh
  $ jq -Rr '@urid' <<< '%6a%71'
  jq
  ```

#### Changes to existing functions

- Use code point index for `indices/1`, `index/1` and `rindex/1`. [@wader](https://github.com/wader) [#3065](jqlang/jq#3065)
  - This is a breaking change. Use `utf8bytelength/0` to get byte index.
- Improve `tonumber/0` performance and rejects numbers with leading or trailing white spaces. [@itchyny](https://github.com/itchyny) [@thaliaarchi](https://github.com/thaliaarchi) [#3055](jqlang/jq#3055) [#3195](jqlang/jq#3195)
  - This is a breaking change. Use `trim/0` to remove leading and trailing white spaces.
- Populate timezone data when formatting time. This fixes timezone name in `strftime/1`, `strflocaltime/1` for DST. [@marcin-serwin](https://github.com/marcin-serwin) [@sihde](https://github.com/sihde) [#3203](jqlang/jq#3203) [#3264](jqlang/jq#3264) [#3323](jqlang/jq#3323)
- Preserve numerical precision on unary negation, `abs/0`, `length/0`. [@itchyny](https://github.com/itchyny) [#3242](jqlang/jq#3242) [#3275](jqlang/jq#3275)
- Make `last(empty)` yield no output values like `first(empty)`. [@itchyny](https://github.com/itchyny) [#3179](jqlang/jq#3179)
- Make `ltrimstr/1` and `rtrimstr/1` error for non-string inputs. [@emanuele6](https://github.com/emanuele6) [#2969](jqlang/jq#2969)
- Make `limit/2` error for negative count. [@itchyny](https://github.com/itchyny) [#3181](jqlang/jq#3181)
- Fix `mktime/0` overflow and allow fewer elements in date-time representation array. [@emanuele6](https://github.com/emanuele6) [#3070](jqlang/jq#3070) [#3162](jqlang/jq#3162)
- Fix non-matched optional capture group. [@wader](https://github.com/wader) [#3238](jqlang/jq#3238)
- Provide `strptime/1` on all systems. [@george-hopkins](https://github.com/george-hopkins) [@fdellwing](https://github.com/fdellwing) [#3008](jqlang/jq#3008) [#3094](jqlang/jq#3094)
- Fix `_WIN32` port of `strptime`. [@emanuele6](https://github.com/emanuele6) [#3071](jqlang/jq#3071)
- Improve `bsearch/1` performance by implementing in C. [@eloycoto](https://github.com/eloycoto) [#2945](jqlang/jq#2945)
- Improve `unique/0` and `unique_by/1` performance. [@itchyny](https://github.com/itchyny) [@emanuele6](https://github.com/emanuele6) [#3254](jqlang/jq#3254) [#3304](jqlang/jq#3304)
- Fix error messages including long string literal not to break Unicode characters. [@itchyny](https://github.com/itchyny) [#3249](jqlang/jq#3249)
- Remove `pow10/0` as it has been deprecated in glibc 2.27. Use `exp10/0` instead. [@itchyny](https://github.com/itchyny) [#3059](jqlang/jq#3059)
- Remove private (and undocumented) `_nwise` filter. [@itchyny](https://github.com/itchyny) [#3260](jqlang/jq#3260)

#### Language changes

- Fix precedence of binding syntax against unary and binary operators. Also, allow some expressions as object values. [@itchyny](https://github.com/itchyny) [#3053](jqlang/jq#3053) [#3326](jqlang/jq#3326)
  - This is a breaking change that may change the output of filters with binding syntax as follows.

  ```sh
  $ jq -nc '[-1 as $x | 1,$x]'
  [1,-1]    # previously, [-1,-1]
  $ jq -nc '1 | . + 2 as $x | -$x'
  -3        # previously, -1
  $ jq -nc '{x: 1 + 2, y: false or true, z: null // 3}'
  {"x":3,"y":true,"z":3}    # previously, syntax error
  ```

- Support Tcl-style multiline comments. [@emanuele6](https://github.com/emanuele6) [#2989](jqlang/jq#2989)

  ```sh
  #!/bin/sh --
  # Can be use to do shebang scripts.
  # Next line will be seen as a comment be of the trailing backslash. \
  exec jq ...
  # this jq expression will result in [1]
  [
    1,
    # \
    2
  ]
  ```

- Fix `foreach` not to break init backtracking with `DUPN`. @kanwren #3266

  ```sh
  $ jq -n '[1, 2] | foreach .[] as $x (0, 1; . + $x)'
  1
  3
  2
  4
  ```

- Fix `reduce`/`foreach` state variable should not be reset each iteration. [@itchyny](https://github.com/itchyny) [#3205](jqlang/jq#3205)

  ```sh
  $ jq -n 'reduce range(5) as $x (0; .+$x | select($x!=2))'
  8
  $ jq -nc '[foreach range(5) as $x (0; .+$x | select($x!=2); [$x,.])]'
  [[0,0],[1,1],[3,4],[4,8]]
  ```

- Support CRLF line breaks in filters. [@itchyny](https://github.com/itchyny) [#3274](jqlang/jq#3274)
- Improve performance of repeating strings. [@itchyny](https://github.com/itchyny) [#3272](jqlang/jq#3272)

#### Documentation changes

- Switch the homepage to custom domain [jqlang.org](https://jqlang.org). [@itchyny](https://github.com/itchyny) [@owenthereal](https://github.com/owenthereal) [#3243](jqlang/jq#3243)
- Make latest release instead of development version the default manual. [@wader](https://github.com/wader) [#3130](jqlang/jq#3130)
- Add opengraph meta tags. [@wader](https://github.com/wader) [#3247](jqlang/jq#3247)
- Replace jqplay.org with play.jqlang.org [@owenthereal](https://github.com/owenthereal) [#3265](jqlang/jq#3265)
- Add missing line from decNumber's licence to `COPYING`. [@emanuele6](https://github.com/emanuele6) [#3106](jqlang/jq#3106)
- Various document improvements. [@tsibley](https://github.com/tsibley) [#3322](jqlang/jq#3322), [@itchyny](https://github.com/itchyny) [#3240](jqlang/jq#3240), [@jhcarl0814](https://github.com/jhcarl0814) [#3239](jqlang/jq#3239), [@01mf02](https://github.com/01mf02) [#3184](jqlang/jq#3184), [@thaliaarchi](https://github.com/thaliaarchi) [#3199](jqlang/jq#3199), [@NathanBaulch](https://github.com/NathanBaulch) [#3173](jqlang/jq#3173), [@cjlarose](https://github.com/cjlarose) [#3164](jqlang/jq#3164), [@sheepster1](https://github.com/sheepster1) [#3105](jqlang/jq#3105), [#3103](jqlang/jq#3103), [@kishoreinvits](https://github.com/kishoreinvits) [#3042](jqlang/jq#3042), [@jbrains](https://github.com/jbrains) [#3035](jqlang/jq#3035), [@thalman](https://github.com/thalman) [#3033](jqlang/jq#3033), [@SOF3](https://github.com/SOF3) [#3017](jqlang/jq#3017), [@wader](https://github.com/wader) [#3015](jqlang/jq#3015), [@wllm-rbnt](https://github.com/wllm-rbnt) [#3002](jqlang/jq#3002)

#### Build improvements

- Fix build with GCC 15 (C23). [@emanuele6](https://github.com/emanuele6) [#3209](jqlang/jq#3209)
- Fix build with `-Woverlength-strings` [@emanuele6](https://github.com/emanuele6) [#3019](jqlang/jq#3019)
- Fix compiler warning `type-limits` in `found_string`. [@itchyny](https://github.com/itchyny) [#3263](jqlang/jq#3263)
- Fix compiler error in `jv_dtoa.c` and `builtin.c`. [@UlrichEckhardt](https://github.com/UlrichEckhardt) [#3036](jqlang/jq#3036)
- Fix warning: a function definition without a prototype is deprecated. [@itchyny](https://github.com/itchyny) [#3259](jqlang/jq#3259)
- Define `_BSD_SOURCE` in `builtin.c` for OpenBSD support. [@itchyny](https://github.com/itchyny) [#3278](jqlang/jq#3278)
- Define empty `JV_{,V}PRINTF_LIKE` macros if `__GNUC__` is not defined. [@emanuele6](https://github.com/emanuele6) [#3160](jqlang/jq#3160)
- Avoid `ctype.h` abuse: cast `char` to `unsigned char` first. [@riastradh](https://github.com/riastradh) [#3152](jqlang/jq#3152)
- Remove multiple calls to free when successively calling `jq_reset`. [@Sameesunkaria](https://github.com/Sameesunkaria) [#3134](jqlang/jq#3134)
- Enable IBM z/OS support. [@sachintu47](https://github.com/sachintu47) [#3277](jqlang/jq#3277)
- Fix insecure `RUNPATH`. [@orbea](https://github.com/orbea) [#3212](jqlang/jq#3212)
- Avoid zero-length `calloc`. [@itchyny](https://github.com/itchyny) [#3280](jqlang/jq#3280)
- Move oniguruma and decNumber to vendor directory. [@itchyny](https://github.com/itchyny) [#3234](jqlang/jq#3234)

#### Test improvements

- Run tests in C locale. [@emanuele6](https://github.com/emanuele6) [#3039](jqlang/jq#3039)
- Improve reliability of `NO_COLOR` tests. [@dag-erling](https://github.com/dag-erling) [#3188](jqlang/jq#3188)
- Improve `shtest` not to fail if `JQ_COLORS` and `NO_COLOR` are already set. [@SArpnt](https://github.com/SArpnt) [#3283](jqlang/jq#3283)
- Refactor constant folding tests. [@itchyny](https://github.com/itchyny) [#3233](jqlang/jq#3233)
- Make tests pass when `--disable-decnum`. [@nicowilliams](https://github.com/nicowilliams) [`6d02d53`](jqlang/jq@6d02d53)
- Disable Valgrind by default during testing. [@itchyny](https://github.com/itchyny) [#3269](jqlang/jq#3269)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).