
Conversation

AurumnPegasus
Contributor

Reference Issues/PRs

Fixes #3022

What does this implement/fix? Explain your changes.

Added functions in BaseDeepClassifier (as discussed with @GuzalBulatova) to allow users to save and load their keras models.

PR checklist

For all contributions
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.

@AurumnPegasus AurumnPegasus marked this pull request as ready for review August 1, 2022 10:07
@@ -46,6 +46,15 @@ class CNNClassifier(BaseDeepClassifier):

Adapted from the implementation from Fawaz et. al
https://github.com/hfawaz/dl-4-tsc/blob/master/classifiers/cnn.py
Examples
Collaborator

misformatted, please fix

@@ -134,7 +143,11 @@ def _fit(self, X, y):

check_random_state(self.random_state)
self.input_shape = X.shape[1:]
self.model_ = self.build_model(self.input_shape, self.n_classes_)
self.model_ = (
Collaborator

can you kindly explain this change? Should self.model_ not always be None, and replaced even if it is already fitted?

Contributor Author

This is related to the load functionality. The thought here is that:

  • user first creates object of the estimator class
  • user then calls the load function with path
  • BaseDeepClassifier loads the model into self.model_
  • user calls model.fit
  • Within the CNNClassifier.fit function, we first check whether self.model_ is already defined (that is, whether the user has loaded a network) or is None. If it is None, then we build the network.

Collaborator

sounds like an odd workflow, can you explain why that would be desired or intuitive?

The "normal" save/load workflow is as follows:

A. save
A1. user builds a model, model
A2. user saves model to a location saved_model, via model.save

B. load
B1. user loads model from saved_model, via load (loose function)

Your workflow part B takes three steps.

We should also distinguish, are you trying to save/load a model configuration (unfitted), or a model serialization (fitted)?
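The A/B workflow above could be sketched as runnable code with stand-ins (the CNNClassifier class and the loose load function here are illustrative placeholders, not the real sktime/keras API):

```python
# Runnable sketch of the A/B workflow above, using stand-in classes.
# CNNClassifier and the loose function load are illustrative here,
# not the real sktime/keras API.
import pickle

class CNNClassifier:
    def __init__(self, batch_size=16):
        self.batch_size = batch_size

    def fit(self, X, y):
        self.is_fitted = True  # stand-in for actual training
        return self

    def save(self, path):  # A2: user saves model via model.save
        with open(path, "wb") as f:
            pickle.dump(self, f)

def load(path):  # B1: loose function, loading is a single step
    with open(path, "rb") as f:
        return pickle.load(f)

model = CNNClassifier(batch_size=32)  # A1: user builds model
model.fit([[0.0]], [0])
model.save("saved_model.pkl")

model2 = load("saved_model.pkl")      # returns a ready, fitted estimator
print(model2.batch_size, model2.is_fitted)
```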

Contributor Author

I was thinking in terms of saving/loading a fitted model (model serialisation). Are there any use cases where there would be a point of saving a model configuration?

"""Save the model at the given path."""
self.model_.save(path)

def load(self, path):
Collaborator

should load be a method of an estimator? I thought of it as a loose function that produces a fitted estimator.

Collaborator

@fkiraly fkiraly left a comment

Change requests:

  • I think load should be a loose function, not a method of an estimator. Otherwise you would always have to construct an estimator before loading something. Think of the user workflow.
  • Is there a possibility to serialize the model in memory, if path=None? The return would be a reference.

@AurumnPegasus
Contributor Author

Hey @fkiraly ,
My idea was for the load functionality to live within the structure of the estimator class; that is, if someone wants to use CNNClassifier (say) from an already saved path cnnpath, it makes sense to ask them to create an estimator object:

  • to specify general parameters like batch_size, epochs, etc.
  • to use the same fit function as required (e.g., cntc and cnn have minor differences in how they transform data before passing it through the model)

If load were to be a loose function, we would need a parameter of sorts for the estimator class (or perhaps an additional parameter, load_path, specifying where to load the model from, with the estimator calling load internally?). Another thing is that I feel it would be structurally more uniform if load were an estimator method (although this is purely personal opinion).

@fkiraly
Collaborator

fkiraly commented Aug 9, 2022

Could you kindly spell out, with code, intended save/load workflows?

We should cover:

  • save/load of model specifications
  • save/load of serialized models
  • how this works for "ordinary" (sklearn style) models, vs neural network models

I think it is really important that the design is intuitive and matches user expectations.

I would also recommend we write a STEP, since this might become a central interface point.

@AurumnPegasus
Contributor Author

Sure,

  1. For fitted neural networks / DL models in sktime:
cnn = CNNClassifier()
cnn.fit(X_train, y_train)
cnn.save("./saved_model")
----------------------------------------------
cnn = CNNClassifier(batch_size=32)  # if the user wants to specify different parameters
cnn.load("./saved_model")
output = cnn.predict(X_test)

Here, I have written the functionality to save a fitted model. If, for some reason, a user were to call cnn.save() before fitting the model (though there is no point to that, in my opinion), it would throw an error. (I had not thought about this case, since I can't think of a scenario where one would save a neural model without fitting it.)

  2. To save/load model specifications (if I understand this correctly, let me know otherwise): the existing method for statistical sktime models is via pickle. I don't think we can save/load unfitted DL models via pickle, because one of the parameters is a keras.optimizer (which does not pickle directly). Example of statistical models being saved:
import pickle
vecm = VECM()
vecm.fit(train, fh=fh)
save_output = pickle.dumps(vecm)
----------------------------------------------
model = pickle.loads(save_output)
model.predict(fh=fh)

This save functionality comes via sklearn's BaseEstimator class (from which all sktime estimators are derived), which implements the methods __getstate__ and __setstate__; these either pickle the object of the class with all its parameters (saving the state/class dictionary) or load it from a pickle file/object.
The reason for this issue/PR is that pickle is not directly compatible with keras (especially with keras optimizers).
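A minimal stand-in illustrates this failure mode (a lambda plays the role of the unpicklable keras optimizer; the class name is illustrative):

```python
# Minimal stand-in for the failure mode: a lambda plays the role of an
# unpicklable keras optimizer (the class name is illustrative).
import pickle

class DeepEstimator:
    def __init__(self):
        self.optimizer = lambda grads: grads  # lambdas cannot be pickled

try:
    pickle.dumps(DeepEstimator())
except Exception as err:
    print("pickling failed:", err)
```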

The idea behind save or load functionality being part of the class is that if the user were to use fit or predict later on, they would have to create an object of the class anyway.
If we were to create that as a loose function, then:

from sktime import load_network
cnn_network = load_network('path')

In this case, we are making the neural network directly accessible to the user (which, by design, we have not done yet in any model, either neural or statistical, as far as I know; let me know if otherwise). My implementation was built on the idea that users should not interact with the model directly, but through an abstraction layer of class/object (either a wrapper, as in the case of vecm or arima, or our own model that exposes only certain functionality, as in the case of cnn).

@fkiraly
Collaborator

fkiraly commented Aug 9, 2022

Thanks for elucidating!

I still think the problems are as I have explained.

The principles I'd like to invoke are:

  • "don't require the user to provide the same information twice", which in your design happens once when you create the model originally (line 1), then again before you load (line 4). The point is, the saved model already is a CNNClassifier with certain parameters, e.g., batch_size=32, so why have the user enter that information again?
  • "make objects inspectable and don't require the user to have secret knowledge". The user needs to know what class and parameters have been saved. If the user does not know what model is saved in the file, there might be no practical way to guess it, since they would have to load before they inspect, but they can only load if they know the right model configuration - you have just created a catch-22 in the workflow!

@AurumnPegasus
Contributor Author

  1. About the user providing the same information twice:
    In the current implementation, as said before, the user has to specify CNNClassifier twice: once when originally creating and fitting the network, and again if they ever load it.
    But if we were to create a loose function for load, they would have to do that anyway?
from sktime.classification import load_network
cnn = load_network(pathA)

The load_network function will not return an object of class CNNClassifier, but a keras network with trained weights. If the user wants to use the loaded network in any way, they have to create the respective class object and pass the loaded network to that class as an init parameter. This is because functions like predict and fit are class methods and cannot be accessed without the object.
The saved model is not a CNNClassifier, but a keras network. At least to my knowledge, there is no straightforward way to store CNNClassifier as an object (mostly because of the parameter allowing users to pass a keras optimizer directly, which is not compatible with pickle).

  2. I understand your second point: there is a need to store the parameters of the class so that the user is able to use the saved network again. My proposed design does not require them to know/remember any parameters. For example:
    a. If the user wants to load the model to make predictions:
cnn = CNNClassifier()
cnn.load(pathA)
cnn.predict(X)

Here, the user does not need to know the exact configuration with which they originally fitted the network. This is ensured in the load function in base.py, where we initialise self.model_ with the loaded model (hence no specification of parameters is required).

b. If the user wants to load the model to train on more data points: This is useful if after some time you realise there are more data points you can train your model on, or if you want to do something related to transfer learning. In such a case:

cnn = CNNClassifier()
cnn.load(path)
cnn.fit(train_X, train_y)

Here, the user does not need to know the exact configuration of the saved network either, because of lines 146-150 in cnn.py:

self.model_ = (
            self.build_model(self.input_shape, self.n_classes_)
            if self.model_ is None
            else self.model_
        )

The parameters are only required/used in self.build_model which is only called if the user has not loaded an existing model.

@fkiraly
Collaborator

fkiraly commented Aug 10, 2022

But if we were to create a loose function for load, they would have to do that anyway?

No, I think load should return an estimator object. If that is the signature, then it would not have to be done twice.

there is no straightforward way to store CNNClassifier as an object

That's why we need to write our own save/load functions. The save needs to store the keras network, and with it the additional information on the estimator.

My proposed design does not require them to know/remember any parameters.

I see - but then we have another problem, which I think is worse: the model will be loaded with the wrong parameters, and any inspection of it will return the wrong parameters. So, after load, the object will not be identical, and inspectable parameters (via the sklearn interface) will not correspond to the actual structure of the neural network.
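The mismatch can be sketched with a stand-in class (names and defaults are illustrative, not sktime's):

```python
# Stand-in sketch of the inspection problem described above.
class Clf:
    def __init__(self, batch_size=16):
        self.batch_size = batch_size

    def get_params(self):
        return {"batch_size": self.batch_size}

    def load(self, network):
        self.model_ = network  # e.g. weights trained with batch_size=32

clf = Clf()  # constructed with the default parameters
clf.load("network trained with batch_size=32")
print(clf.get_params())  # {'batch_size': 16} -- wrong w.r.t. the loaded network
```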

If the user wants to load the model to train on more data points: This is useful if after some time you realise there are more data points you can train your model on, or if you want to do something related to transfer learning

Hm, model updating is currently something the interface does not support. A call for fit will always put the estimator on a clean slate before ingesting the data, this is a requirement by the sklearn interface. So, what you say above, would not work.

We could think about an update for classifiers etc. Although I would suggest to work this out for forecasters first, which already have an update.

@fkiraly
Collaborator

fkiraly commented Aug 10, 2022

PS: you have conflicts with main, kindly check and resolve before the branch gets too outdated.

@AurumnPegasus
Contributor Author

you have conflicts with main, kindly check and resolve before the branch gets too outdated.

Right, I was waiting for this discussion to resolve before solving the merge conflicts, but I will do that asap.

A call for fit will always put the estimator on a clean slate before ingesting the data, this is a requirement by the sklearn interface.

Aaaah, okay, that does resolve a lot of discussion we had. Maybe some day later then.

The save needs to store the keras network, and with it the additional information on the estimator.

Okay, I understand. The main issue is that when we save a class in general, we save its state dictionary. This state dictionary contains all the class parameters (as well as the current state of the object). The state is generally saved via pickle. The issue is that self.optimizer and self.model_ are keras objects, which cannot be pickled; hence, just storing the model separately made more sense from my point of view.
I will look into this; maybe if we can pickle the class object without those specific class parameters, it would work fine. In that case we would have two saved objects: a) the state dictionary via pickle and b) the keras network via keras.save.
The parameters ignored in a) would be saved in b) anyway, so there should be no information loss.

the model will be loaded with the wrong parameters

I do not understand this point, can you please give an example?
If you are saying that I save cnn_a and cnn_b to two paths, and by mistake give cnn_b when I wanted to load cnn_a, isn't that a user error which is replicable in the statistical models' case as well? (Giving the correct path should be on the user.)
If you are saying that I save cnn_a and by mistake load it into mlp_a, this will give wrong output on inspection, but it will not throw an error. When a keras network is saved, the network parameters are saved as a named dictionary, so the output layer of MLPClassifier() will search for output_layer in the loaded model, find cnn_a's output_layer, and use it instead. This will not give the desired result, but again, I feel that giving the correct path should be the user's responsibility.
Of course the user should not be expected to remember parameters, and in this design they are not. Maybe I have understood your point wrong; let me know.

@AurumnPegasus
Contributor Author

An update on this solution, in line with what Franz has put forward in #3336 (writing it here since I do not think a STEP document is required yet).
Requirement based on our discussion: the user should not have to remember anything w.r.t. parameters or estimators.
Problem so far: generally, when we save a model as pickle, it saves the __dict__ of the particular class (a dictionary containing all the parameters of the respective class). The problem here is that keras.optimizer (which we accept as a parameter) cannot be pickled.

A simple solution would be to override the proposed save and load_from_path methods, and also override __getstate__ (the function which returns the state dictionary when pickle.dumps is called):

def __getstate__(self):
    # copy the state dict and drop the unpicklable keras optimizers
    out = self.__dict__.copy()
    del out["optimizer"]
    del out["optimizer_"]
    return out

This will be done in BaseDeepClassifier (since it is common for all DL estimators).
We will delete the optimizers as parameters (not technically, since we are not deleting them from self.__dict__, only from the information stored in the pickle) so that all other parameters can be stored via pickle. There is no information loss here (described below).
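A self-contained sketch of this mechanism (a lambda stands in for the unpicklable keras optimizer, and the class name is illustrative):

```python
# Sketch of the __getstate__ mechanism described above; a lambda stands
# in for the unpicklable keras optimizer, the class name is illustrative.
import pickle

class DeepClf:
    def __init__(self):
        self.optimizer = lambda g: g  # unpicklable stand-in
        self.batch_size = 16

    def __getstate__(self):
        out = self.__dict__.copy()
        del out["optimizer"]  # dropped only from the pickled copy
        return out

restored = pickle.loads(pickle.dumps(DeepClf()))
print(restored.batch_size)             # 16
print(hasattr(restored, "optimizer"))  # False
```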

In the proposed save function (#3336), there are two information blocks (I do not know the proper term for it) which are being written, metadata and object.
We add another information block to save the network as well. Here, we do keras.save to save the network. The information about the optimizer is stored with the network (and hence no information loss):

from zipfile import ZipFile
import pickle

def save(self, path):
    with ZipFile(path, mode="w") as zipfile:
        with zipfile.open("metadata", mode="w") as meta_file:
            meta_file.write(pickle.dumps(type(self)))
        with zipfile.open("object", mode="w") as object_file:
            object_file.write(pickle.dumps(self))
        # keras cannot write into an open ZipFile directly, so save the
        # network to a side file first, then add it to the archive
        self.model_.save(path + ".h5")
        zipfile.write(path + ".h5", arcname="network")
    return ZipFile(path)

And we will just need to reinitialise the particular optimizer when loading again:

def load_from_path(self, path):
    # keras restores the network together with its optimizer
    self.model_ = keras.models.load_model(path)
    self.optimizer = self.model_.optimizer

Hence, we have all the information stored (so the user does not need to remember anything at all), and there is no information loss either.
This is also useful since the pickle tests will pass (although I doubt that is a concern wrt DL estimators).

@fkiraly
Collaborator

fkiraly commented Aug 29, 2022

Some feedback:

This will be done in BaseDeepClassifier (since it is common for all DL estimators).

If it is common to all DL estimators (not classifiers), should it not be done in BaseDeepNetwork?

two information blocks (I do not know the proper term for it)

these are just files in the zip archive, look up how zip works if you haven't seen it before
https://en.wikipedia.org/wiki/ZIP_(file_format)
it's just a "cheap" way to have multiple file blobs but still just a single file in the end

We add another information block to save the network as well. Here, we do keras.save to save the network. The information about the optimizer is stored with the network (and hence no information loss):

Yes, good idea!

Re overwriting __getstate__, is this not going to have unintended side effects? Would it maybe be "safer" to remove the optimizer in save, instead? I think, design-wise, __getstate__ and __setstate__ should be inverses, which wouldn't be the case here (because the optimizer is lost if you do "get" then "set").
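The suggested alternative could be sketched like this (an illustrative helper, not sktime API; the keras objects are dropped only inside save, so __getstate__ and __setstate__ remain exact inverses):

```python
# Sketch of the alternative: drop the optimizer inside save instead of in
# __getstate__. save_estimator is an illustrative helper, not sktime API.
import pickle

def save_estimator(est, path):
    state = est.__dict__.copy()
    state.pop("optimizer", None)   # removed here, not in __getstate__
    state.pop("optimizer_", None)  # fitted optimizer, if present
    with open(path, "wb") as f:
        pickle.dump(state, f)
```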

This is also useful since the pickle tests will pass (although I doubt that is a concern wrt DL estimators).

Well, I've changed the pickle tests to save/load, so those should pass.

Further comments:

  • I still think we should have a STEP that is in line with the final design.
  • we should also implement a save to memory, i.e., serialization to memory. I suspect that would work in analogy, though.

@AurumnPegasus
Contributor Author

AurumnPegasus commented Aug 30, 2022

If it is common to all DL estimators (not classifiers), should it not be done in BaseDeepNetwork?

In the current design this would be done in BaseDeepClassifier, since it requires the parameters of the classifiers as well (BaseDeepNetwork is not a true parent class of all estimators; rather, it is the parent class of all networks). Based on the design we choose in #3190 (and the STEP design), we will write it in the appropriate parent class.

Would it maybe be "safer" to remove the optimizer in save, instead?

We are not technically removing the optimizer. In the __getstate__ function from my comment, I am removing the optimizer from a copy of __dict__ (not from the actual dict).

I will be creating a STEP document with this in mind; I hope the other doubts will be resolved there.

@fkiraly
Collaborator

fkiraly commented Oct 2, 2022

I'm closing this in favour of #3425, where I have also added you as an author (for get/setstate).

@fkiraly fkiraly closed this Oct 2, 2022
fkiraly added a commit to sktime/enhancement-proposals that referenced this pull request Oct 30, 2022
Serialization and deserialization for sktime estimators

Serialization and deserialization interface for sktime estimators, supporting the following workflows:

1. serialization to in-memory object and deserialization
2. serialization to single file and deserialization

Initially proposed in sktime issues [#3128](sktime/sktime#3128) and [#3336](sktime/sktime#3336).

The proposed design introduces:

* an object interface point for estimators, `save`
* a loose method `load`
* a file format specification `skt` for serialized estimator objects

which are counterparts and universal.

Concrete class or intermediate class specific logic is implemented in:

* `save`, a class method
* `load_from_path` and `load_from_serial`, class methods called by `load`

while no class specific logic is present in `load`.
Labels
module:classification classification module: time series classification

Successfully merging this pull request may close these issues.

[ENH] save/load functionality for models