Input names are silently modified, leading to inputs mismatch during inference #1153

@Pierre-Bartet

When converting models with convert_sklearn, the input names given in initial_types are silently modified to comply
with the ONNX specification, which asks for identifiers to follow the C90 identifier rules.
As a result, the input names no longer match at inference time, even when exactly the same input data is used.

Furthermore, this behaviour is well hidden: in the following basic tutorial, the column home.dest is conveniently dropped.
Keeping it results in a renaming to home_dest, which makes the ONNX model fail at inference because the input data no longer matches.

This is really treacherous behaviour, and it is probably not even necessary, since onnx does not enforce its own specification on this point.
It is also the kind of behaviour users might be tempted to work around by being "smart", but most smart workarounds are wrong, especially once you take into account that a renamed identifier might collide with an existing one.
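
To make the collision risk concrete, here is a minimal sketch. It assumes the renaming behaves like the re.sub call quoted at the end of this issue; `sanitize` and `find_collisions` are hypothetical helpers written for illustration, not functions from skl2onnx, and how the converter itself resolves such a clash is not shown here.

```python
import re


def sanitize(name: str) -> str:
    # Assumption: mirrors the substitution quoted in this issue,
    # it is not a call into skl2onnx itself.
    return re.sub("[^\\w+]", "_", name)


def find_collisions(names):
    # Group the original names by their sanitized form and keep
    # only the groups where two or more names end up identical.
    groups = {}
    for name in names:
        groups.setdefault(sanitize(name), []).append(name)
    return {new: olds for new, olds in groups.items() if len(olds) > 1}


# Hypothetical column names: one needs renaming, one already has the target name.
columns = ["home.dest", "home_dest", "age"]
collisions = find_collisions(columns)
print(collisions)  # {'home_dest': ['home.dest', 'home_dest']}
```

A user-side workaround that simply renames columns before inference would feed both columns under the same key here, which is why a naive fix is risky.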

By the way, looking at the code (and testing it), the line that turns input names into C-style identifiers is not correct:
re.sub("[^\\w+]", "_", seed) leaves some special characters, such as "+", unchanged, because the "+" sits inside the negated character class.
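
The difference can be checked with the standard library alone. The sketch below compares the quoted pattern with a stricter one; the `stricter` pattern is only a suggestion of mine (an assumption, not the project's fix), and the sample name "a+b.c" is made up for illustration.

```python
import re

# Pattern quoted in this issue: the "+" is inside the negated class,
# so "+" is treated as an allowed character and is never replaced.
quoted = re.compile("[^\\w+]")

# Hypothetical stricter alternative: replace every character that is not
# an ASCII letter, digit, or underscore.
stricter = re.compile(r"\W", flags=re.ASCII)

name = "a+b.c"
with_quoted = quoted.sub("_", name)
with_stricter = stricter.sub("_", name)

print(with_quoted)    # a+b_c
print(with_stricter)  # a_b_c
```

Note that neither pattern deals with a leading digit, which a C90 identifier also forbids, so a complete fix would need an extra check for that case.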
