
[ENH] Changes to CNN Classifier #2991


Merged (16 commits, Aug 6, 2022)
Conversation

@AurumnPegasus (Contributor) commented Jul 13, 2022

Reference Issues/PRs

Fixes #2935

What does this implement/fix? Explain your changes.

  • Move the convert_y_to_keras function from BaseDeepClassifier into a utils function.
  • Enable returning the label_encoder and one-hot encoder to the user from convert_y_to_keras (useful if users want to encode other data, or to decode data later on).
  • Add the option of multiple linear layers in the output head and of a different activation function, plus a use_bias parameter for the output layer (purely as customisation options: multiple linear layers allow deeper networks depending on the dataset and choice of activation function).
  • Add an option to choose the optimiser (it is usually Adam, but for customisation's sake we can offer this as an option as well). The learning rate should be specifiable either within the optimiser or as a parameter of the class itself.
  • Accept a parameter in CNNClassifier to specify the random seed.
  • Train/fit the model on GPU depending on the user's choice (possibly via an input parameter as well).
  • Add an option to view/return the loss of fit (either as a graph or simply as a list/array).
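Taken together, the options above might look roughly like the following constructor. This is only an illustrative sketch with hypothetical parameter names and defaults, not the merged API:

```python
class CNNClassifierSketch:
    """Illustrative constructor for the options proposed above (hypothetical names)."""

    def __init__(
        self,
        n_epochs=200,
        batch_size=16,
        optimizer=None,        # e.g. a keras optimizer instance; None means use Adam
        learning_rate=0.001,   # used only when optimizer is None
        activation="sigmoid",  # activation function of the output layer
        use_bias=True,         # bias term in the output layer
        random_state=None,     # random seed for reproducibility
    ):
        # store every __init__ arg unmodified, sklearn-style
        self.n_epochs = n_epochs
        self.batch_size = batch_size
        self.optimizer = optimizer
        self.learning_rate = learning_rate
        self.activation = activation
        self.use_bias = use_bias
        self.random_state = random_state

clf = CNNClassifierSketch(learning_rate=0.01, random_state=42)
```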
For all contributions
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.

@AurumnPegasus (Contributor, Author)

For the first task

Moving the convert_y_to_keras function in BaseDeepClassifier into a utils function

Any idea exactly where I should place this utils function?

@AurumnPegasus (Contributor, Author)

For GPU functionality:

  1. A user would need tensorflow-gpu and the CUDA toolkit installed. Will we have to add them as soft dependencies? I am unsure whether they are required, since pytorch (which I am comfortable with) does not really require them.
  2. We could also support multi-GPU training: https://keras.io/guides/distributed_training/ . My estimate is that the majority of people using sktime are on Kaggle/Colab, where a single GPU is used, and time series data should not be so heavy that multiple GPUs are required. However, there might be a few users working with larger data, or with easy access to a college GPU cluster, who would prefer running the models on multiple GPUs; it might be worth having for those cases.

@AurumnPegasus (Contributor, Author)

Update: I am also ignoring the test test_fit_does_not_overwrite_hyper_params for CNNClassifier, since it is giving issues with respect to pickling tf models (more specifically, it is unable to pickle a lambda function used in the optimizer).
Stack Overflow mentions that pickle does not support lambda functions by default, and any workaround for this would require going too deep into the tf implementation.
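The pickling limitation is easy to reproduce in plain Python, independently of tensorflow: pickle serialises functions by importable name, which a lambda does not have.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle can serialise obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# a plain value pickles fine
is_picklable(42)

# a lambda has no importable name, so pickle rejects it
anonymous = lambda x: x
is_picklable(anonymous)
```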

@AurumnPegasus AurumnPegasus marked this pull request as ready for review July 18, 2022 06:30
@fkiraly (Collaborator) commented Jul 18, 2022

Update: I am also ignoring tests test_fit_does_not_overwrite_hyper_params for CNNClassifier

What is the problem with that test?

@AurumnPegasus (Contributor, Author) commented Jul 18, 2022

The test uses pickle to save the model, and I think once we update the save function as per #3022, we can stop ignoring it.
The error it gives:

    raise PicklingError(
_pickle.PicklingError: ("Can't pickle <function make_gradient_clipnorm_fn.<locals>.<lambda> at 0x7f780541a3a0>: it's not found as keras.optimizers.optimizer_v2.utils.make_gradient_clipnorm_fn.<locals>.<lambda>", 'PicklingError while hashing <keras.optimizers.optimizer_v2.adam.Adam object at 0x7f78041cefd0>: PicklingError("Can\'t pickle <function make_gradient_clipnorm_fn.<locals>.<lambda> at 0x7f780541a3a0>: it\'s not found as keras.optimizers.optimizer_v2.utils.make_gradient_clipnorm_fn.<locals>.<lambda>")')

@AurumnPegasus (Contributor, Author)

Hey @fkiraly @TonyBagnall,
Any idea why the single test is failing on mac? I cannot find any leads.

@fkiraly (Collaborator) commented Jul 23, 2022

Re the single test, for anyone who reads this: it was discussed on Slack, and is related to the sporadic failure #2368.

@fkiraly (Collaborator) left a review

Thanks! Looks almost good to go to me!

I left some comments related to interface conformance, summary and random_seed. The rest is not blocking.

Also, question: can you explain why the test test_fit_does_not_overwrite_hyper_params did not fail prior to this PR? Which change makes it fail? Perhaps that could be avoided somehow?

@AurumnPegasus (Contributor, Author)

Also, question: can you explain why the test test_fit_does_not_overwrite_hyper_params did not fail prior to this PR? Which change makes it fail? Perhaps that could be avoided somehow?

From what I know, it is because of specifying lr in the optimizer by default. The full error description is this:

'test_fit_does_not_overwrite_hyper_params[CNNClassifier-ClassifierFitPredictMultivariate]': PicklingError("Can't pickle <function make_gradient_clipnorm_fn.<locals>.<lambda> at 0x7f0a6918aaf0>: it's not found as keras.optimizers.optimizer_v2.utils.make_gradient_clipnorm_fn.<locals>.<lambda>", 'PicklingError while hashing <keras.optimizers.optimizer_v2.adam.Adam object at 0x7f0a6917e940>: PicklingError("Can\'t pickle <function make_gradient_clipnorm_fn.<locals>.<lambda> at 0x7f0a6918aaf0>: it\'s not found as keras.optimizers.optimizer_v2.utils.make_gradient_clipnorm_fn.<locals>.<lambda>")')

After a bit of investigation: the error occurs when I use Adam (or any optimizer) and store it as a class variable. For example, in the current code I am doing self.optimizer = Adam() at line 140, which leads to the test_fit_does_not_overwrite_hyper_params failure. On the other hand, if I do not write that line and instead assign Adam directly within model.compile at line 149, all tests pass.

Directly specifying the optimizer is not feasible, since we want to give the user the option to choose the optimizer. Of course, if instead of optimizer we took learning_rate as input directly, we could do it, but that feels like a workaround, and it would take away the choice of optimizer from the user (although Adam is the most popular one).

@fkiraly (Collaborator) commented Jul 24, 2022

Ah, I see.

An sklearn interface requirement is to never overwrite self.paramname if paramname is an __init__ arg.

The test is failing correctly and pointing out that you are not interface compliant!

Instead, write the optimizer to self.optimizer_ in __init__, and overwrite that if it is None, so self.optimizer remains unchanged.
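A minimal sketch of that pattern, with hypothetical class and attribute names (the real class would resolve to an actual keras optimizer):

```python
class EstimatorSketch:
    """Illustrates the self.param vs self.param_ convention."""

    def __init__(self, optimizer=None):
        self.optimizer = optimizer  # the __init__ arg: never overwritten
        # the resolved value goes into a separate, derived attribute
        self.optimizer_ = "adam" if optimizer is None else optimizer

    def get_params(self):
        # hyper-parameter inspection must see the original, unmodified value
        return {"optimizer": self.optimizer}


est = EstimatorSketch()
est2 = EstimatorSketch(optimizer="sgd")
```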

@fkiraly (Collaborator) commented Jul 24, 2022

PS: generally, please do not exempt tests if you are not sure what the tests are testing. Please remove the test skip again.

@AurumnPegasus (Contributor, Author)

@fkiraly
Updated the code, it works now. Thanks for the suggestion!

@AurumnPegasus (Contributor, Author)

Hey, before merging this I am planning to make a couple more design changes. I will formalise them in a bit; just informing you so we don't merge this yet.

@AurumnPegasus (Contributor, Author)

@fkiraly made some changes:
I converted label_encoder and onehot_encoder to class variables instead of the function variables they were. I don't know why they were function variables in the first place (maybe I did not realise it at first), but convert_y_to_keras (the function in BaseDeepClassifier which takes label_encoder and onehot_encoder as input) is called from CNNClassifier, which itself has no parameter for label_encoder and onehot_encoder (hence, no way for the user to specify them!).

Now I am asking for them as input in __init__ of CNNClassifier, so users can specify the required encoders. One thing I want you to note/comment on is lines 174-175, where I am doing:

self.label_encoder = self.label_encoder_
self.onehot_encoder = self.onehot_encoder_

in cnn.py. The reason for this is so that BaseDeepClassifier can directly change self.label_encoder from the convert_y_to_keras method, and hence I do not have to send it parameters, or receive the two encoders along with y (which, as you stated, is non-obvious). If you think this is unnecessary, I can change it back so that we send and receive parameters, and update the documentation to match the previous commit.

@fkiraly (Collaborator) left a review

Thanks!

Two comments:

  • can you explain what purpose exposing the label and onehot encoder in the __init__ serves? What would be a second example that makes sense here, i.e., another value that the user can pass?
  • __init__ arguments should never be mutated. The line self.label_encoder = self.label_encoder_ creates a (pointer) reference, not a copy, so calling fit_transform on either name will mutate the same underlying object.
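The aliasing issue can be shown with a tiny stateful object standing in for a sklearn encoder (names illustrative):

```python
class ToyEncoder:
    """Stand-in for a sklearn LabelEncoder: fit mutates internal state."""

    def __init__(self):
        self.classes_ = None

    def fit(self, y):
        self.classes_ = sorted(set(y))
        return self


init_arg = ToyEncoder()   # imagine this was passed to __init__
derived = init_arg        # assignment copies the reference, not the object
derived.fit(["b", "a"])
# the "untouched" __init__ arg has been mutated through the alias:
# init_arg.classes_ is now ["a", "b"]
```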

@AurumnPegasus (Contributor, Author)

  • can you explain what purpose exposing the label and onehot encoder in the __init__ serves? What would be a second example that makes sense here, i.e., another value that the user can pass?

My essential idea was to allow custom functions instead of the sklearn defaults. For example, if I want to use a specific label encoder (and I send the encoded data to CNNClassifier), CNNClassifier will run sklearn's default LabelEncoder on top of it, which might cause it to lose some information. Maybe a better solution would be a variable or a check indicating whether the data has already been encoded?

@fkiraly (Collaborator) commented Jul 30, 2022

Maybe a better solution would have been to have a variable OR a check telling if it has already been encoded?

Can you give two or three examples of intended usage here? Not sure whether I fully understand.

@AurumnPegasus (Contributor, Author)

Okay, so I thought a bit about it and I was wrong; I do not think it is required.
My original thought revolved around this particular scenario:
Let's say my inputs are ['>90', '80-90', ..., '0-10'] (these are the classes I want to classify into). Suppose that, for some reason, I want a different grouping, where '50-100' becomes one group and '0-50' becomes another. In that case I would either have to change the data, or simply create a new label encoder which would produce the encodings as required. This is a customisation I thought would be useful.
But if users have to create a label encoding function themselves, they can easily just run the data through it and pass the result on to the CNNClassifier. The other doubt I had, about information loss, does not hold either, since running sklearn's LabelEncoder on already-encoded data would not give different results in such a case.
I will remove them and set the defaults to sklearn's LabelEncoder and OneHotEncoder.
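The default behaviour being settled on (label-encode, then one-hot-encode) can be sketched in pure Python, with a hypothetical helper name standing in for the sklearn pipeline:

```python
def convert_y_to_onehot(y):
    """Label-encode then one-hot-encode class labels (pure-Python sketch)."""
    classes = sorted(set(y))  # what LabelEncoder.fit would learn
    index = {label: i for i, label in enumerate(classes)}
    # one row per sample, one column per class, as OneHotEncoder would produce
    onehot = [
        [1 if index[label] == j else 0 for j in range(len(classes))]
        for label in y
    ]
    return classes, onehot


classes, onehot = convert_y_to_onehot(["dog", "cat", "dog"])
```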

@fkiraly (Collaborator) commented Aug 1, 2022

OK, also fine by me.

I don't have an opinion either way; I am only applying the general principle that function or __init__ args should be justified by at least two, optimally three, different meaningful examples of values/references that users may want to pass to them.

@fkiraly (Collaborator) commented Aug 2, 2022

Let me know when you think this is ready for another review.

@AurumnPegasus (Contributor, Author)

@fkiraly it is, have made the requested changes.

@AurumnPegasus AurumnPegasus requested a review from fkiraly August 4, 2022 15:12
@fkiraly (Collaborator) left a review

Great!

  • random_seed should be random_state for consistency with sklearn and the rest of sktime
  • we have already released BaseDeepClassifier with args batch_size and random_state, so should not remove them without deprecation

Things fixed:

  • handling of self.optimizer_ is good now!
  • convert_y_to_keras looks good now

@AurumnPegasus AurumnPegasus requested a review from fkiraly August 5, 2022 10:01
@fkiraly (Collaborator) left a review

Looks good now, thanks!

Labels
module:classification classification module: time series classification
Successfully merging this pull request may close these issues:

[ENH] Changing and Adding features to CNN Classifier
3 participants