Description
Hi everyone, I would like to know whether layer-by-layer inference on a pre-trained model (in fp32 or int8) is possible on a GPU with CUDA 11.2.
My idea is to take several fp32 and int8-quantized models from the ONNX Model Zoo repo and run inference layer by layer to perform feature extraction. I would then modify the output of each layer and feed it as the input to the following layer, with the last layer's output equal to the output of the original model.
The code would be roughly as follows:

```python
import numpy as np
import onnxruntime as ort

model_path = "model.onnx"
ort_session = ort.InferenceSession(model_path)

input_data = np.random.randn(1, 3, 32, 32).astype(np.float32)

# Run the first layer, then feed its (possibly modified) output onward
conv1_output = ort_session.run(None, {"input1": input_data})[0]
conv2_output = ort_session.run(None, {"input2": conv1_output})[0]
# Now I can work with intermediate outputs, modify them,
# and use them as new inputs for the following layers
```
However, I tried to reproduce this code with a resnet50 pre-trained model from the ONNX Model Zoo repo, but this model, like the other pre-trained models, seems to have only one input and one output, so there is no way of accessing the intermediate outputs.
So, is there any way I could do this? I have seen the Evaluation Step by Step documentation, but I am unsure whether the ReferenceEvaluator class also works for pre-trained/quantized models and, more importantly, whether it can be used to measure accuracy on a dataset like ImageNet.
Thank you!