Skip to content

Conversation

adityagoel4512
Copy link
Contributor

@adityagoel4512 adityagoel4512 commented Jul 9, 2023

Description

This PR introduces the RegexFullMatch operator, as originally proposed in #5317.

The RegexFullMatch operator takes one string tensor as input and returns a bool tensor of identical shape indicating if each element fully matches the regex pattern encoded in the pattern string attribute. This attribute is a string and we expect valid re2 regex.

Some examples are as follows:

RegexFullMatch(["www.google.com", "www.facebook.com", "www.bbc.co.uk"], "www\.[\w.-]+\.\bcom\b")
=> [True, True, False]
RegexFullMatch([["account@gmail.com", "account@hotmail.com"], ["not email", "account2@yahoo.com"]], "(\W|^)[\w.\-]{0,25}@(yahoo|gmail)\.com(\W|$)")
=> [[True, False], [False, True]]

Motivation and Context

Closes #5317.

Following discussion at the last Operators SIG Weekly the "engine" attribute has been dropped in favour of simply using re2 syntax for now. This reflects the fact that both Tensorflow and PyTorch operators requiring regex use re2 already.

Signed-off-by: Aditya Goel <agoel4512@gmail.com>
@adityagoel4512 adityagoel4512 marked this pull request as ready for review July 9, 2023 23:11
@adityagoel4512 adityagoel4512 requested review from a team as code owners July 9, 2023 23:11
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
@adityagoel4512 adityagoel4512 force-pushed the regexfullmatch-operator branch from e109fac to 02c32d1 Compare July 9, 2023 23:16
@@ -175,6 +175,7 @@
ReduceSumSquare_1,
ReduceSumSquare_18,
)
from onnx.reference.ops.op_regex_full_match import RegexFullMatch

Check notice

Code scanning / CodeQL

Unused import

Import of 'RegexFullMatch' is not used.
adityagoel4512 and others added 2 commits July 10, 2023 13:54
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
…els (onnx#5400)

### Description
<!-- - Describe your changes. -->
Add a sentence to highlight that output_shape for ConvTranspose should
not have batch and channels.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve? -->
<!-- - If it fixes an open issue, please link to the issue here. -->
It's quite confusing that whether output_shape needs to be fully
provided. For instance, onnx#1437. An
explicit description should help prevent confusion.

cc @satyajandhyala

---------

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
@adityagoel4512 adityagoel4512 force-pushed the regexfullmatch-operator branch from 88dfc73 to 574e0a5 Compare July 10, 2023 12:54
adityagoel4512 and others added 2 commits July 10, 2023 13:55
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
@adityagoel4512 adityagoel4512 force-pushed the regexfullmatch-operator branch from f3a3a96 to 754cc99 Compare July 10, 2023 13:17
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
@adityagoel4512 adityagoel4512 force-pushed the regexfullmatch-operator branch from 66309bb to 29a5b4a Compare July 10, 2023 14:10
adityagoel4512 and others added 3 commits July 10, 2023 16:04
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
Signed-off-by: Aditya Goel <48102515+adityagoel4512@users.noreply.github.com>
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
@adityagoel4512 adityagoel4512 force-pushed the regexfullmatch-operator branch from 1ae97ce to b976bd8 Compare July 11, 2023 10:31
@gramalingam gramalingam added the topic: operator Issues related to ONNX operators label Jul 11, 2023
…ator

Signed-off-by: Aditya Goel <agoel4512@gmail.com>
@adityagoel4512
Copy link
Contributor Author

Would it be possible to receive a review/approval on this PR please? @gramalingam @xadupre

Signed-off-by: Aditya Goel <agoel4512@gmail.com>
@jcwchen jcwchen added the run release CIs Use this label to trigger release tests in CI label Jul 16, 2023
Copy link
Member

@jcwchen jcwchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added run-release-CIs label to run release CIs (it will take effect if there is a new commit). You will need to add google-re2 in requirements-release.txt to make release CIs pass I believe.

Signed-off-by: Aditya Goel <agoel4512@gmail.com>
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
@adityagoel4512 adityagoel4512 force-pushed the regexfullmatch-operator branch from 8aee616 to 5b93576 Compare July 27, 2023 19:58
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
…nnx into regexfullmatch-operator

Signed-off-by: Aditya Goel <agoel4512@gmail.com>
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
@adityagoel4512
Copy link
Contributor Author

adityagoel4512 commented Jul 27, 2023

Updates:

  • we lazily import re2 in the RegexFullMatch reference operator
  • google-re2 is an optional dependency under the "reference" name (pip install onnx[reference]). New dependencies exclusive to the reference implementation and not used elsewhere can be placed in requirements-reference.txt if it makes sense.
  • tests that use the reference implementation of RegexFullMatch are skipped when the platform is win32 - (edit: after asking, the next google-re2 release will now support win32

Signed-off-by: Aditya Goel <agoel4512@gmail.com>
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
@adityagoel4512 adityagoel4512 force-pushed the regexfullmatch-operator branch from 8206f2b to dace543 Compare August 1, 2023 15:22
@adityagoel4512
Copy link
Contributor Author

Is there anything holding back this PR? @gramalingam @xadupre @jcwchen

Signed-off-by: Aditya Goel <agoel4512@gmail.com>
….onnx format string

Signed-off-by: Aditya Goel <agoel4512@gmail.com>
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
@xadupre xadupre enabled auto-merge August 4, 2023 10:28
@xadupre xadupre added this pull request to the merge queue Aug 4, 2023
Merged via the queue into onnx:main with commit 2e0908d Aug 4, 2023
@adityagoel4512 adityagoel4512 deleted the regexfullmatch-operator branch August 6, 2023 10:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run release CIs Use this label to trigger release tests in CI topic: operator Issues related to ONNX operators
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

New Operator: RegexFullMatch
5 participants