add ScaledUnitLowerCholeskyTransform and change default AutoMultivariateNormal parameterization #1146
Merged
martinjankowiak commented Sep 6, 2021
fehiepsi approved these changes Sep 6, 2021
LGTM - you might need to double-check if using softplus is still good for your experiments.
joint work with @fehiepsi
instead of parameterizing the lower cholesky factor as an unconstrained strictly lower triangular piece and a positive diagonal, we instead parameterize it as

```
L = unit_scale_tril @ scale_diag
```

where `unit_scale_tril` is lower triangular with ones along the diagonal and `scale_diag` is a positive diagonal matrix.

not surprisingly (consider e.g. the analogous parameterization in `AutoLowRankMultivariateNormal`) this seems to lead to consistently better performance (all results use `AutoMultivariateNormal`):

| dataset | ELBO [this PR] | ELBO [before this PR] |
| --- | --- | --- |
| logistic regression, N=50k, D=28 | -32 430.37 | -32 590.92 |
| logistic regression, N=50k, D=18 | -25 112.28 | -25 280.20 |
| logistic regression, N=37k, D=15 | -12 576.18 | -12 592.29 |
| sparse FITC GP classifier, N=37k, D=14, latent dim = 128 (inducing points) | 9 897.12 | 9 765.09 |
| sparse FITC GP classifier, N=37k, D=14, latent dim = 64 (inducing points) | 9 925.11 | 9 902.51 |
| sparse FITC GP classifier, N=37k, D=14, latent dim = 32 (inducing points) | 9 896.23 | 9 884.37 |
| sparse FITC GP classifier, N=37k, D=14, latent dim = 16 (inducing points) | 9 320.48 | 9 311.96 |
| sparse FITC GP classifier, N=100k, D=27, latent dim = 64 (inducing points) | -55 858.99 | -55 947.08 |
| sparse FITC GP classifier, N=100k, D=17, latent dim = 64 (inducing points) | -43 956.38 | -44 015.16 |

(note: inducing points have the same initialization for each comparison; not surprisingly i was getting noisy results before i made sure this was the case; also note the N=37k GP elbos are missing a log pi term that shifts them by large amounts)
there appears to be a clear winner, but do we need more experiments? @fehiepsi what do you think?
(note: whether positivity is enforced via exponential or softplus tends to be much less important)
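for reference, the factorization above can be sketched in plain NumPy. this is an illustrative sketch only, not the actual implementation in this PR; the helper name `scaled_unit_lower_cholesky` is made up, and softplus is just one of the two positivity choices mentioned above:

```python
import numpy as np

def scaled_unit_lower_cholesky(tril_params, diag_params):
    # hypothetical helper illustrating the parameterization
    # L = unit_scale_tril @ diag(scale_diag), where unit_scale_tril has
    # ones on the diagonal and scale_diag is made positive via softplus
    D = diag_params.shape[0]
    unit_scale_tril = np.eye(D)
    rows, cols = np.tril_indices(D, k=-1)
    unit_scale_tril[rows, cols] = tril_params  # fill strictly lower part
    scale_diag = np.logaddexp(0.0, diag_params)  # softplus, elementwise
    # right-multiplying by a diagonal matrix scales the columns,
    # so broadcasting against the row vector scale_diag suffices
    return unit_scale_tril * scale_diag

L = scaled_unit_lower_cholesky(np.array([0.5]), np.zeros(2))
# the diagonal of L is exactly scale_diag = softplus(diag_params),
# so positivity of the diagonal is enforced by construction
```

note that under this parameterization the diagonal of `L` equals `scale_diag` directly, whereas in the old parameterization the diagonal and the off-diagonal entries are coupled through the constraint transform.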