[R] stringr modifier functions cannot be called with namespace prefix

### Describe the bug, including details regarding any error messages, version, and platform.

When using stringr's `str_detect()` and `str_count()`, stringr's own documentation recommends using `stringr::regex()` and `stringr::fixed()` "for finer control of the matching behaviour."

This can be used, for example, to set `ignore_case` to `TRUE`, which is not available as an argument to `str_detect()` directly.

The resulting calls have the following structure:

``` r
stringr::str_detect(
  string = "eXample",
  pattern = stringr::regex("x", ignore_case = TRUE)
)
#> [1] TRUE
```
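
For comparison, here is a minimal sketch of what the modifiers do in plain stringr, outside of any `arrow` query. It assumes only that stringr is installed; the vector `x` is an illustrative input and not part of the reprex further down.

``` r
library(stringr)

x <- c("apple", "APPLE")

# Default: the pattern is treated as a case-sensitive regular expression
str_detect(x, "a")
#> [1]  TRUE FALSE

# regex() modifier: same regular expression, matched case-insensitively
str_detect(x, regex("a", ignore_case = TRUE))
#> [1] TRUE TRUE

# fixed() modifier: literal match, so "." is not a wildcard
str_detect(c("a.b", "axb"), fixed("."))
#> [1]  TRUE FALSE

# fixed() also accepts ignore_case
str_count(x, fixed("p", ignore_case = TRUE))
#> [1] 2 2
```

These are the results one would expect an `arrow` query to reproduce when the same modifiers are used.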

Unfortunately, arguments passed via `stringr::regex()` and `stringr::fixed()` are silently ignored by `arrow`, which leads to unexpected and quite possibly wrong results.

Printing the arrow query shows that, even though `ignore_case` is set to `TRUE`, the translated expression is built with `ignore_case=false`:

```
bool (match_substring_regex(text, {pattern="x", ignore_case=false}))
```

I suppose `arrow` should either get this right, or throw an error.

The following reprex (run with arrow version 12.0.1) shows:

- how the `ignore.case` argument works as expected when passed via the base function `grepl()`
- how it is simply ignored when passed to `stringr::str_detect()`, `stringr::str_count()` (and possibly other stringr functions) through `stringr::regex()` and `stringr::fixed()`
- how it works as expected if case-insensitivity is requested directly in the pattern with `(?i)`
- how `arrow` throws an error when using `stringi::stri_detect_regex()` (rather than `stringr`) with `case_insensitive = TRUE` (which is still preferable to ignoring the argument silently).

There are obviously many workarounds, but this has led to errors when I applied functions that were not originally written and tested with `arrow` in mind. 
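
One such workaround, sketched under assumptions: since an inline `(?i)` flag is honoured (as the reprex shows), the flag can be prepended to the pattern before it reaches the query. `inline_ignore_case()` is a hypothetical helper, not part of `arrow` or stringr, and it only makes sense for regular-expression patterns; a literal pattern would need its metacharacters escaped first.

``` r
# Hypothetical helper (not from arrow or stringr): express the
# `ignore_case` option as an inline regex flag instead of a modifier.
inline_ignore_case <- function(pattern, ignore_case = TRUE) {
  stopifnot(is.character(pattern), length(pattern) == 1L, !is.na(pattern))
  if (isTRUE(ignore_case)) paste0("(?i)", pattern) else pattern
}

pat <- inline_ignore_case("a")
pat
#> [1] "(?i)a"

# Checked here on a plain character vector with base R (PCRE engine)
grepl(pat, c("apple", "APPLE"), perl = TRUE)
#> [1] TRUE TRUE
```

Building `pat` outside the pipeline and then passing it as `pattern = pat` to `stringr::str_detect()` should be equivalent to writing `"(?i)a"` by hand, assuming `arrow` accepts a pattern held in a variable; this is not verified across arrow versions.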


``` r
library("arrow")
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp

apple_df <- tibble::tibble(
  text = c(
    "apple",
    "APPLE"
  )
)

arrow::write_dataset(dataset = apple_df, path = "apple.parquet")

apple_parquet <- arrow::open_dataset(sources = "apple.parquet")



## with grepl, it works

apple_parquet |>
  dplyr::mutate(
    a_check = grepl(
      x = text,
      pattern = "a",
      ignore.case = TRUE
    )
  )
#> FileSystemDataset (query)
#> text: string
#> a_check: bool (if_else(is_null(match_substring_regex(text, {pattern="a", ignore_case=true}), {nan_is_null=true}), false, match_substring_regex(text, {pattern="a", ignore_case=true})))
#> 
#> See $.data for the source Arrow object

apple_parquet |>
  dplyr::mutate(
    a_check = grepl(x = text, pattern = "a", ignore.case = TRUE)
  ) |>
  dplyr::collect()
#> # A tibble: 2 × 2
#>   text  a_check
#>   <chr> <lgl>  
#> 1 apple TRUE   
#> 2 APPLE TRUE


## with stringr::str_detect it does not work

apple_parquet |>
  dplyr::mutate(
    a_check = stringr::str_detect(
      string = text,
      pattern = "a"
    )
  )
#> FileSystemDataset (query)
#> text: string
#> a_check: bool (match_substring_regex(text, {pattern="a", ignore_case=false}))
#> 
#> See $.data for the source Arrow object


apple_parquet |>
  dplyr::mutate(
    a_check = stringr::str_detect(
      string = text,
      pattern = stringr::regex(
        pattern = "a",
        ignore_case = TRUE
      )
    )
  )
#> FileSystemDataset (query)
#> text: string
#> a_check: bool (match_substring_regex(text, {pattern="a", ignore_case=false}))
#> 
#> See $.data for the source Arrow object


apple_parquet |>
  dplyr::mutate(
    a_check = stringr::str_detect(
      string = text,
      pattern = stringr::regex(
        pattern = "a",
        ignore_case = TRUE
      )
    ),
    p_count = stringr::str_count(
      string = text,
      pattern = stringr::regex(
        pattern = "p",
        ignore_case = TRUE
      )
    )
  ) |>
  dplyr::collect()
#> # A tibble: 2 × 3
#>   text  a_check p_count
#>   <chr> <lgl>     <int>
#> 1 apple TRUE          2
#> 2 APPLE FALSE         0

## Same result with stringr::fixed


apple_parquet |>
  dplyr::mutate(
    a_check = stringr::str_detect(
      string = text,
      pattern = stringr::fixed(
        pattern = "a",
        ignore_case = TRUE
      )
    ),
    p_count = stringr::str_count(
      string = text,
      pattern = stringr::fixed(
        pattern = "p",
        ignore_case = TRUE
      )
    )
  ) |>
  dplyr::collect()
#> # A tibble: 2 × 3
#>   text  a_check p_count
#>   <chr> <lgl>     <int>
#> 1 apple TRUE          2
#> 2 APPLE FALSE         0

## it works as expected when the case-insensitive flag (?i) is included in the regex itself

apple_parquet |>
  dplyr::mutate(
    a_check = stringr::str_detect(
      string = text,
      pattern = "(?i)a"
    ),
    p_count = stringr::str_count(
      string = text,
      pattern = "(?i)p"
    )
  ) |>
  dplyr::collect()
#> # A tibble: 2 × 3
#>   text  a_check p_count
#>   <chr> <lgl>     <int>
#> 1 apple TRUE          2
#> 2 APPLE TRUE          2



## With stringi

apple_df |>
  dplyr::mutate(
    a_check = stringi::stri_detect_regex(
      str = text,
      pattern = "a",
      case_insensitive = TRUE
    )
  ) |>
  dplyr::collect()
#> # A tibble: 2 × 2
#>   text  a_check
#>   <chr> <lgl>  
#> 1 apple TRUE   
#> 2 APPLE TRUE



apple_parquet |>
  dplyr::mutate(
    a_check = stringi::stri_detect_regex(
      str = text,
      pattern = "a",
      case_insensitive = TRUE
    )
  ) |>
  dplyr::collect()
#> Error: Expression stringi::stri_detect_regex(str = text, pattern = "a", case_insensitive = TRUE) not supported in Arrow
#> Call collect() first to pull data into R.
```

<sup>Created on 2023-07-17 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>

### Component(s)

R

[R] stringr modifier functions cannot be called with namespace prefix #36720