Input names are silently modified, leading to inputs mismatch during inference #1153

When converting a model with `convert_sklearn`, the input names given in `initial_types` are silently modified to comply
with the `onnx` specification, which asks for names to follow the C90 identifier rules.
As a result, the model's input names no longer match the names of the input data at inference time, even when exactly the same data is used.

Furthermore, this behaviour is well hidden: in the following [basic tutorial](https://onnx.ai/sklearn-onnx/auto_examples/plot_complex_pipeline.html), the column `home.dest` is conveniently dropped.  
Keeping that column causes it to be renamed to `home_dest`, and the `onnx` model then fails at inference because the input data no longer matches the declared inputs.

This is really treacherous behaviour, and it is probably not even necessary, as `onnx` [doesn't even enforce their own specification](https://github.com/onnx/onnx/issues/6219).
It is also the kind of behaviour users might be tempted to work around by being "smart", but most smart solutions are wrong, especially once you consider that a renamed identifier might collide with an existing one.
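To illustrate the collision risk, here is a minimal sketch. The `sanitize` helper is hypothetical: it only mimics the renaming with the regular expression quoted in this issue and is not the library's actual code path. The column names are made up for the example.

```python
import re


def sanitize(name: str) -> str:
    """Hypothetical stand-in for the converter's renaming step."""
    return re.sub("[^\\w+]", "_", name)


# Two distinct columns, one of which already uses the "sanitized" spelling.
columns = ["home.dest", "home_dest", "age"]
renamed = [sanitize(c) for c in columns]

# Names that now appear more than once after renaming.
collisions = {n for n in renamed if renamed.count(n) > 1}

print(renamed)
print(collisions)
```

Any user-side workaround that applies the same renaming to the input data would have to detect and resolve such duplicates itself.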

By the way, looking at the code (and testing it), [the line that turns input names into C-style identifiers](https://github.com/onnx/sklearn-onnx/blob/ffb17c505829cde7e4edbc3f123a0730db837727/skl2onnx/common/_topology.py#L1042) is not correct:  
`re.sub("[^\\w+]", "_", seed)` leaves some special characters such as "+" unchanged, because inside a character class `+` is a literal character rather than a quantifier.
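The regular-expression problem can be checked with the standard library alone. In the sketch below, `strict_pattern` and the sample `seed` are assumptions made for illustration, not a fix proposed by the maintainers.

```python
import re

pattern_from_issue = "[^\\w+]"  # as quoted above: "+" sits inside the class
strict_pattern = r"\W"          # assumption: replace every non-word character

seed = "home.dest+cabin"

# "+" is part of the negated class, so it is never replaced.
current = re.sub(pattern_from_issue, "_", seed)

# Without the stray "+", both "." and "+" are replaced.
strict = re.sub(strict_pattern, "_", seed)

# On str patterns, \w also matches non-ASCII letters unless re.ASCII is set.
unicode_kept = re.sub(strict_pattern, "_", "prix_é")
ascii_only = re.sub(strict_pattern, "_", "prix_é", flags=re.ASCII)

print(current, strict, unicode_kept, ascii_only)
```

Even the stricter pattern does not guarantee a C90 identifier: a name starting with a digit passes through unchanged, and non-ASCII letters survive unless `re.ASCII` is used.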
