Regex match turns non-breakable space into regular space #10058

@ttomasz

What happens?

When using a regex to match a specific character by its Unicode code point, it seems that the non-breakable space (chr: 160) in the pattern is converted to a regular space (chr: 32).

The RE2 engine seems to support this fine: https://regex101.com/r/7SjXN9/1

To Reproduce

with
data(wsc, zipcode) as (
values (32, '00' || chr(32) || '001'), (160, '00' || chr(160) || '001')
)
select *
from data
where 1=1
and regexp_matches(zipcode, '^00\x{00A0}001$')
and regexp_matches(zipcode, '^00\x{0020}001$')
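
To see which pattern matches which row, the two predicates can be projected as columns instead of being combined in the WHERE clause. This is a minimal sketch that reuses the same data and patterns; the column names matches_nbsp and matches_space are only illustrative. If the escapes were handled as expected, each row would be true for exactly one of the two patterns.

```sql
-- Same test data as above: one zipcode with a regular space (32),
-- one with a non-breakable space (160).
with
data(wsc, zipcode) as (
values (32, '00' || chr(32) || '001'), (160, '00' || chr(160) || '001')
)
select
    wsc,
    -- pattern written with the non-breakable space escape
    regexp_matches(zipcode, '^00\x{00A0}001$') as matches_nbsp,
    -- pattern written with the regular space escape
    regexp_matches(zipcode, '^00\x{0020}001$') as matches_space
from data
```

A row where both columns are true, or where matches_nbsp is true for the chr(32) row, would point at the pattern being altered before matching.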

OS:

Linux

DuckDB Version:

0.9.2

DuckDB Client:

CLI

Full Name:

Tomasz Taraś

Affiliation:

Orsted

Have you tried this on the latest main branch?

I have tested with a release build (and could not test with a main build)

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
