Conversation

@martinjankowiak (Collaborator) commented Sep 6, 2021

joint work with @fehiepsi

instead of parameterizing a lower Cholesky factor as an unconstrained strictly lower triangular piece plus a positive diagonal, we parameterize it as

L = unit_scale_tril @ scale_diag

where unit_scale_tril is lower triangular with ones along the diagonal and scale_diag is a positive diagonal matrix.
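for concreteness, a minimal PyTorch-style sketch of the two parameterizations (illustrative only, not the actual code in this PR; the parameter names `tril_raw` / `diag_raw` and the use of softplus for positivity are assumptions):

```python
import torch
import torch.nn.functional as F

D = 4
tril_raw = torch.randn(D, D)   # unconstrained parameter (hypothetical name)
diag_raw = torch.randn(D)      # unconstrained parameter (hypothetical name)

# before this PR: strictly lower triangular piece + positive diagonal
L_old = torch.tril(tril_raw, diagonal=-1) + torch.diag(F.softplus(diag_raw))

# this PR: unit-diagonal lower triangular factor times a positive diagonal,
# i.e. L = unit_scale_tril @ scale_diag
unit_scale_tril = torch.tril(tril_raw, diagonal=-1) + torch.eye(D)
scale_diag = F.softplus(diag_raw)
# right-multiplying by a diagonal matrix rescales columns, so broadcasting suffices:
L_new = unit_scale_tril * scale_diag  # == unit_scale_tril @ torch.diag(scale_diag)

# in either case the covariance of the Gaussian guide is L @ L.T
cov = L_new @ L_new.T
```

note that `L_new` carries `scale_diag` along its diagonal, and multiplying by `scale_diag` rescales entire columns of `unit_scale_tril` rather than only the diagonal entries.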

not surprisingly (consider e.g. the analogous parameterization in AutoLowRankMultivariateNormal) this seems to lead to consistently better performance (all results use AutoMultivariateNormal):

logistic regression dataset with N=50k, D=28

  • elbo: -32 430.37 [this PR]
  • elbo: -32 590.92 [before this PR]

logistic regression dataset with N=50k, D=18

  • elbo: -25 112.28 [this PR]
  • elbo: -25 280.20 [before this PR]

logistic regression dataset with N=37k, D=15

  • elbo: -12 576.18 [this PR]
  • elbo: -12 592.29 [before this PR]

sparse FITC GP classifier with N = 37k, D = 14, and latent dim = 128 (inducing points)

  • elbo: 9 897.12 [this PR]
  • elbo: 9 765.09 [before this PR]

sparse FITC GP classifier with N = 37k, D = 14, and latent dim = 64 (inducing points)

  • elbo: 9 925.11 [this PR]
  • elbo: 9 902.51 [before this PR]

sparse FITC GP classifier with N = 37k, D = 14, and latent dim = 32 (inducing points)

  • elbo: 9 896.23 [this PR]
  • elbo: 9 884.37 [before this PR]

sparse FITC GP classifier with N = 37k, D = 14, and latent dim = 16 (inducing points)

  • elbo: 9 320.48 [this PR]
  • elbo: 9 311.96 [before this PR]

sparse FITC GP classifier with N = 100k, D = 27, and latent dim = 64 (inducing points)

  • elbo: -55 858.99 [this PR]
  • elbo: -55 947.08 [before this PR]

sparse FITC GP classifier with N = 100k, D = 17, and latent dim = 64 (inducing points)

  • elbo: -43 956.38 [this PR]
  • elbo: -44 015.16 [before this PR]

(note: inducing points have the same initialization for each comparison; not surprisingly i was getting noisy results before i made sure this was the case; also note the N=37k GP elbos are missing a log pi term that shifts them by large amounts)

there appears to be a clear winner, but do we need more experiments? @fehiepsi, what do you think?

(note: whether positivity is enforced via exponential or softplus tends to be much less important)

@fehiepsi removed the WIP label Sep 6, 2021
@fehiepsi (Member) commented Sep 6, 2021

LGTM - you might need to double-check if using softplus is still good for your experiments.

@martinjankowiak added the enhancement label Sep 6, 2021
@fehiepsi merged commit 062d822 into master Sep 6, 2021
@fehiepsi deleted the scaledchol branch September 6, 2021 18:43