`indices` reports byte offsets instead of character offsets #3064

**Describe the bug**
jq uses _characters_ (Unicode codepoints) to index strings.
To see that, we can run `"🇬🇧oo" | .[0 : 1,2,3,4]`, which yields "🇬" "🇬🇧" "🇬🇧o" "🇬🇧oo".
Note that 🇬🇧 is actually two characters and 8 bytes, as we can see from `"🇬🇧" | length, utf8bytelength`.
However, the `indices` filter returns the _byte offsets_ of matches of the pattern in the string.
The documentation does not specify the behaviour of `indices` for UTF-8 strings, but given that `length` and `.[x:y]` use character counts to index strings, it is likely that this is a bug and not just undocumented behaviour.

**To Reproduce**
$ ./jq-linux-amd64-1.7.1 -nc '"🇬🇧oo" | indices("o")'
[8,9]
$ ./jq-linux-amd64-1.7.1 -nc '"ƒoo" | indices("o")'
[2,3]

**Expected behavior**
$ ./jq-linux-amd64-1.7.1-fixed -nc '"🇬🇧oo" | indices("o")'
[2,3]
$ ./jq-linux-amd64-1.7.1-fixed -nc '"ƒoo" | indices("o")'
[1,2]

The problem probably originates in [jv_string_indexes](https://github.com/jqlang/jq/blob/c95b34ff827d05a2d262f00280a4891a295ed0ed/src/jv.c#L1272).
