Fix Epoch Number for Console Logging #411

Merged
merged 5 commits into from
Oct 15, 2019

ostamand (Contributor) commented Oct 6, 2019

Fix epoch number in console logging.

Before:

[Screenshot]

After:

[Screenshot]

Scitator (Member) commented Oct 7, 2019

Hi, thanks for the PR!

However, since we work with checkpoint indices, we also need the logs to be correct. In your example, the best model is saved as train.1.pth, which means the checkpoint at epoch 1... but from the logs' perspective it's already the 2nd epoch, which is a little bit confusing :)
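
The mismatch described in this comment can be illustrated with a small sketch. It assumes a zero-based internal epoch counter and uses a hypothetical `checkpoint_name` helper; neither is taken from the Catalyst code base.

```python
def checkpoint_name(prefix: str, epoch: int) -> str:
    """Hypothetical helper: build a checkpoint file name from an epoch index."""
    return f"{prefix}.{epoch}.pth"


def describe_epochs(num_epochs: int):
    """Pair the one-based number shown in the log with the zero-based file name."""
    rows = []
    for epoch in range(num_epochs):  # internal counter is zero-based (assumed)
        shown = epoch + 1            # console displays a one-based number
        rows.append((shown, checkpoint_name("train", epoch)))
    return rows


rows = describe_epochs(3)
for shown, name in rows:
    print(f"log says epoch {shown}, file is {name}")
```

With only the console number shifted, the log reports epoch 2 while the matching file is still train.1.pth, which is the confusion raised here.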

Scitator self-requested a review October 7, 2019 04:49

ostamand (Contributor Author) commented Oct 7, 2019

You are right, I missed that. Since all the logs use zero-based epoch numbers, I propose that we keep the numbering as is but update the total number of epochs shown in the console progress bar, to keep it consistent.

[Screenshot]

Let me know if you would prefer a full refactoring of the epoch number so that it starts at 1 everywhere, and I will do that instead.
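
The first option in this comment (keep zero-based epochs, adjust only the displayed total) could look roughly like the sketch below. The `progress_label` function and its `epoch/total` format are assumptions made for illustration; the actual progress bar format is only visible in the screenshot and is not reproduced here.

```python
def progress_label(epoch: int, num_epochs: int) -> str:
    """Hypothetical label: zero-based epoch over the last zero-based index."""
    if not 0 <= epoch < num_epochs:
        raise ValueError("epoch must be in [0, num_epochs)")
    # The total is shown as num_epochs - 1 so it matches the zero-based counter.
    return f"{epoch}/{num_epochs - 1}"


labels = [progress_label(epoch, 3) for epoch in range(3)]
print(labels)
```

The point of the design is that the last epoch reads as N-1 out of N-1, so the bar and the zero-based checkpoint names agree.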

Scitator (Member) commented Oct 9, 2019

Conceptually, I think it's much better to do a full refactoring and save the model after N epochs as checkpoint.N.pth. It's much easier to understand :)

Looks like you need to refactor here and here... at least I hope so :)
We still need to check logs, metrics and overall correctness.

ostamand (Contributor Author) commented Oct 9, 2019

@Scitator let me know what you think of this:

Added epoch_log & stage_epoch_log properties to RunnerState to keep it cleaner.
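
A minimal sketch of what such properties might look like follows. The comment above only names `epoch_log` and `stage_epoch_log`; the stand-in class, its `epoch` and `stage_epoch` fields, and the "add one" behaviour are assumptions, not the actual `RunnerState` implementation.

```python
from dataclasses import dataclass


@dataclass
class RunnerStateSketch:
    """Stand-in for illustration only; not Catalyst's RunnerState."""

    epoch: int = 0        # zero-based epoch across all stages (assumed)
    stage_epoch: int = 0  # zero-based epoch within the current stage (assumed)

    @property
    def epoch_log(self) -> int:
        """One-based epoch number for logs and checkpoint names (assumed)."""
        return self.epoch + 1

    @property
    def stage_epoch_log(self) -> int:
        """One-based epoch number within the current stage (assumed)."""
        return self.stage_epoch + 1


state = RunnerStateSketch(epoch=4, stage_epoch=1)
checkpoint = f"train.{state.epoch_log}.pth"
print(state.epoch_log, state.stage_epoch_log, checkpoint)
```

Keeping the shift in one place means loggers and checkpoint code read the same one-based value instead of each adding 1 on their own.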

Multi-Stage Training Example

Stage 1:

image

[...]

image

Stage 2:

image

Loading from Stage 2 Checkpoint:

image

Looking at the plots

Stage 1:

image

Stage 2:

image

Loading from Stage 2 Checkpoint:

image

ostamand (Contributor Author) commented Oct 9, 2019

I was not expecting Travis to fail for such a small change. Let me look into it first. Sorry.

Scitator (Member) commented

Hah, @ostamand, looks like you also need to rewrite the tests a bit :)
We have a check like train.1.loss < train.0.loss... but with your new indexing, we need +1 there.
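
The index shift mentioned here can be sketched as follows. The metrics layout (a dict keyed by `train.N` with a `loss` entry) and the `loss_decreased` helper are hypothetical, and the loss values are made up for the example; the real test code is not shown in this thread.

```python
def loss_decreased(metrics: dict, first_index: int) -> bool:
    """Check that the loss at the second logged epoch is below the first."""
    first = metrics[f"train.{first_index}"]["loss"]
    second = metrics[f"train.{first_index + 1}"]["loss"]
    return second < first


# Made-up numbers, keyed with one-based epoch indices.
metrics = {"train.1": {"loss": 0.9}, "train.2": {"loss": 0.7}}

# With one-based indexing the comparison becomes train.2 < train.1,
# so the check starts from index 1 instead of 0.
ok = loss_decreased(metrics, first_index=1)
print(ok)
```

Calling the helper with `first_index=0` on these one-based keys raises a `KeyError`, which is the kind of failure a test written for the old indexing would hit.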

ostamand (Contributor Author) commented

@Scitator Cool, thanks for the tests commit! I was about to take a look at it. Let me know if there are any other changes you want me to make.
