Feature/1029 record fixed param #1030
Conversation
I think this should be changed so that the CSV is similar whether you start fixed_param manually or CmdStan automatically switches to it. With the current implementation the CSV metadata states both `algorithm = hmc` and `fixed_param = 1`.
Manual fixed_param:
...
# method = sample (Default)
# sample
# num_samples = 1000 (Default)
# num_warmup = 1000 (Default)
# save_warmup = 0 (Default)
# thin = 1 (Default)
# adapt
# engaged = 1 (Default)
# gamma = 0.050000000000000003 (Default)
# delta = 0.80000000000000004 (Default)
# kappa = 0.75 (Default)
# t0 = 10 (Default)
# init_buffer = 75 (Default)
# term_buffer = 50 (Default)
# window = 25 (Default)
# algorithm = fixed_param
# id = 0 (Default)
...
# stanc_version = stanc3 5f0258c1
# stancflags =
# fixed_param = 1
lp__,accept_stat__,p
Automatic fixed_param:
...
# method = sample (Default)
# sample
# num_samples = 1000 (Default)
# num_warmup = 1000 (Default)
# save_warmup = 0 (Default)
# thin = 1 (Default)
# adapt
# engaged = 1 (Default)
# gamma = 0.050000000000000003 (Default)
# delta = 0.80000000000000004 (Default)
# kappa = 0.75 (Default)
# t0 = 10 (Default)
# init_buffer = 75 (Default)
# term_buffer = 50 (Default)
# window = 25 (Default)
# algorithm = hmc (Default)
# hmc
# engine = nuts (Default)
# nuts
# max_depth = 10 (Default)
# metric = diag_e (Default)
# metric_file = (Default)
# stepsize = 1 (Default)
# stepsize_jitter = 0 (Default)
# id = 0 (Default)
...
# stanc_version = stanc3 5f0258c1
# stancflags =
# fixed_param = 1
lp__,accept_stat__,p
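A wrapper interface that needs to distinguish the two runs could scan the comment header for both keys. A rough Python sketch (the field names come from the CSV excerpts above; the parsing behavior itself is an assumption, not CmdStanPy's actual reader):

```python
def parse_csv_config(lines):
    """Collect 'key = value' pairs from the '#'-comment header of a Stan CSV."""
    config = {}
    for line in lines:
        if not line.startswith("#"):
            continue
        body = line.lstrip("#").strip()
        if "=" in body:
            key, _, value = body.partition("=")
            config[key.strip()] = value.replace("(Default)", "").strip()
    return config

header = [
    "# algorithm = hmc (Default)",
    "# fixed_param = 1",
    "lp__,accept_stat__,p",
]
config = parse_csv_config(header)
# The mismatch described above: the header claims HMC while the
# fixed_param flag is also set.
inconsistent = config.get("algorithm") == "hmc" and config.get("fixed_param") == "1"
```

A reader like this is exactly what trips over the automatic switch: it sees `algorithm = hmc` and then expects HMC adaptation output that never arrives.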
I couldn't agree more. I looked at the classes in …

OK, will look into messing with …
The problem here isn't with the …

The entire CmdStan ecosystem was designed around the arguments being immutable. A user specifies the configuration and CmdStan executes exactly that configuration, no matter the structure of the Stan program. The behavior of CmdStan is then entirely determined by how it is called, without the need to consult the associated Stan program, which is critical to reproducible analysis pipelines where different Stan programs are switched in and out. In other words, the immutability of the configuration ensures no surprises in what is executed.

Modifying the configuration based on the structure of the Stan program breaks this guarantee. Instead of returning an error that immediately communicates a conflict between the requested configuration and the given Stan program, the assumption of a misspecified configuration propagates through the entire analysis pipeline. If that assumption is incorrect then it would not be observed until later in the pipeline, complicating debugging.

The reason why the argument parser does not have readily-accessible mutators is that it was designed for this immutability paradigm. By breaking the immutability you've gone beyond the intended use of the code. That said, while mutators are not implemented, the modification of the parsed argument is straightforward given that the argument parsing is implemented recursively.
Again, I don't actually think that this fix should be implemented, but rather that the automatic modification should be reverted.
I agree with @betanalpha that it seems confusing to change what users requested to something else because what they requested is incoherent. How about just generalizing the HMC algorithms to allow zero-size parameter vectors? If you look at the leapfrog algorithm with size-zero vectors, the boundary condition is a no-op, which is just what we want. Then users can choose any of our HMC algorithms when there are zero-size parameters and the boundary condition should do the right thing. Then we don't need to modify any of our output.
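The claim about size-zero vectors can be made concrete with a minimal leapfrog step sketched in plain Python (an illustrative analogue, not Stan's actual C++ implementation): with empty position and momentum vectors, every update is vacuous.

```python
def leapfrog(q, p, grad_U, eps):
    """One leapfrog step for positions q and momenta p (plain Python lists)."""
    p = [pi - 0.5 * eps * g for pi, g in zip(p, grad_U(q))]  # half momentum step
    q = [qi + eps * pi for qi, pi in zip(q, p)]              # full position step
    p = [pi - 0.5 * eps * g for pi, g in zip(p, grad_U(q))]  # half momentum step
    return q, p

# Zero parameters: all vectors are empty, so the step is a no-op.
q, p = leapfrog([], [], lambda q: [], 0.1)
```

The trajectory itself degenerates gracefully; as discussed below, the open question is whether the adaptation and I/O layers around it do too.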
The reason why I introduced a separate “fixed_param” sampler configuration is that while the Hamiltonian trajectories themselves are fine the local (trajectory length) and global (stepsize, inverse metric) adaptation are not. Instead of trying to adapt each of those routines it was easier to have a separate, explicit option for the user to communicate when they don’t want to modify the parameter space at all.
The user's request was vague, but then defaults were applied, resulting in an incoherent request. It's a problem with the defaults, not the user. All the user wants is to run the program to generate a sample.
That's a really good point. If a user only specifies "sample", then I don't see the problem with choosing fixed_param or adaptive, diagonal, Euclidean NUTS for them depending on whether the model has 0 or 1+ parameters.
That's a really good point. If a user only specifies "sample", then I don't see the problem with choosing fixed_param or adaptive, diagonal, Euclidean NUTS for them depending on whether the model has 0 or 1+ parameters.
This is a misinterpretation of what “sample” means.
All of the method arguments refer to how the parameter space is handled. `method=sample` refers to sampling from the parameters defined in a Stan program and nowhere else.
The `fixed_param` algorithm runs a trivial Markov chain that leaves the initial parameter values unperturbed. A consequence of this is that running with the `fixed_param` algorithm just executes the transformed parameters and generated quantities block on the initial parameters over and over again, regardless of the configuration of the parameter space or the code in those blocks. If the generated quantities block happens to have RNG evaluations then this might look like a sampling operation but that’s irrelevant to why `method=sample` and `algorithm=fixed_param` were chosen. And if the generated quantities block doesn’t have any RNG evaluations then nothing looks like sampling at all despite `method=sample` and `algorithm=fixed_param` still being meaningful.
More detailed warning messages to users clarifying this behavior are awesome. Adding a new method that more directly runs generated quantities without pretending to run a Markov chain, with the additional benefit of being able to craft proper arguments, is also awesome. Trying to change the behavior of `method=sample` based on overlapping terminology that might arise in some Stan programs is, in my opinion, much more confusing and much less awesome.
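The trivial Markov chain described above can be sketched as follows (a hypothetical Python analogue; Stan's services are C++, and the function names here are illustrative only):

```python
import random

def fixed_param_sample(init_params, generate_quantities, num_samples, seed=0):
    """Trivial Markov chain: the parameters never move; only the generated
    quantities block is re-evaluated each iteration, possibly with fresh
    RNG draws."""
    rng = random.Random(seed)
    draws = []
    for _ in range(num_samples):
        # The transition is the identity: the new state equals the old state.
        draws.append((list(init_params), generate_quantities(init_params, rng)))
    return draws

# Example: one fixed parameter and a generated quantity with an RNG call.
draws = fixed_param_sample([2.5], lambda theta, rng: [theta[0] + rng.gauss(0, 1)], 5)
```

If `generate_quantities` contains no RNG calls, every draw is identical, yet the algorithm is equally well-defined in both cases.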
I agree that fixed-param doesn't do MCMC sampling, but its main utility is for MC sampling using the `_rng` functions. I can see two reasons to run fixed-param:
1. to MC sample using pseudo-RNGs, or
2. to evaluate a Stan function or test a data/transformed data block.
(2) feels like it should be done another way. Are there other reasons people run fixed_param? What I tend to do in practice is just call the defaults. Then if I have zero-size parameters, I have to fumble in the doc to figure out how to call a different function. And if I ever want to use a single program for both, I have to write a conditional into the interface scripts. But I agree that it's awkward to be able to supply HMC parameters if HMC isn't being run. If we enforce that consistency by throwing errors whenever we call fixed_param with HMC parameters, I'd still have to write conditionals into the interface scripts.
I agree that fixed-param doesn't do MCMC sampling, but its main utility is for MC sampling using the _rng functions. Not to go all Humpty-Dumpty here, but we can define the keyword "sample" in an interface to mean whatever we want it to mean as long as it's consistent with standard meanings of the word.
Sure, but we don’t want to define it in ways that clash with what is actually being done or even possible.
For example, there’s no general way to turn a Stan program that specifies a joint density over the parameters and data variables into a “procedurally generative” program that can generate example samples of all of those variables, so there’s no “sample” or “Monte Carlo” method that would ever be a valid option for general Stan programs.
`fixed_param` describes what is done, fixing the parameters while letting everything else be evaluated, not what one can do with the side effects of that action, such as evaluating the data or transformed data block once or running the generated quantities block over and over again.
I can see two reasons to run fixed-param.
to MC sample using pseudo-RNGs, or
to evaluate a Stan function or test a data/transformed data block.
I think it’s more useful to think about how a Stan program is actually processed, in other words which blocks are actually evaluated. `fixed_param` evaluates everything at the same initial values of the parameters block, which then allows for all kinds of useful side effects based on the structure of those blocks. At the same time, because those side effects can’t be guaranteed unless a Stan program has certain structure in certain blocks, they’re not really appropriate naming targets.
What I tend to do in practice is just call the defaults. Then if I have zero size parameters, I have to fumble in the doc to figure out how to call a different function. And if I ever want to use a single program for both, I have to write a conditional into the interface scripts. But I agree that it's awkward to be able to supply HMC parameters if HMC isn't being run. If we enforce that consistency by throwing errors whenever we call fixed_param with HMC parameters, I'd still have to write conditionals into the interface scripts.
I don’t see the problem with having to call the right method based on what you want to do.
`method=sample algorithm=hmc` will attempt to evolve the parameters with Markov chain Monte Carlo. That’s it. If your program does not provide enough information for that to be a well-defined operation then it should error out, which it used to do.
`method=sample algorithm=fixed_param` uses a trivial Markov chain Monte Carlo algorithm that doesn’t achieve the same behavior as the Hamiltonian Monte Carlo sampler. There are lots of useful side effects for this, and if you want one of those side effects then you should specify that directly with the call. You can’t rely on default behavior when there are multiple, distinct actions you want to perform.
Again, I think having a full method that repeatedly evaluates the Stan program at the fixed input, and not hiding that in `method=sample`, would be a great way to clarify the intent and possible applications. But this issue is all about the CmdStan interface automatically switching from `algorithm=hmc` to `algorithm=fixed_param` based on an exceptional circumstance for certain Stan programs.
In theory, HMC works perfectly fine in a zero-dimensional space. It's just arithmetic with size-zero vectors, trivial but well-behaved.
First, disabling the zero-parameter check in CmdStan and running with defaults results in
Exception initializing step size.
Posterior is improper. Please check your model.
which is obviously false: the posterior is a point mass and necessarily proper. It's just that the step size is meaningless and the step size adapter misdiagnoses this.
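One plausible mechanism for the misdiagnosis (a sketch only, not Stan's actual initialization code; the doubling rule and the 1e7 cap are assumptions): with zero parameters the change in the Hamiltonian is always zero, so the acceptance statistic is identically 1, and a step size search that doubles while acceptance exceeds the target never terminates until it hits a cap and concludes the posterior is improper.

```python
def init_stepsize(accept_prob, eps=1.0, target=0.8, cap=1e7):
    """Crude sketch of an initial step size search: double eps while the
    acceptance probability stays above the target. With zero parameters
    delta-H is always 0, acceptance is always 1, and eps grows until the
    cap is hit."""
    while accept_prob(eps) > target:
        eps *= 2.0
        if eps > cap:
            raise RuntimeError("Posterior is improper. Please check your model.")
    return eps

try:
    init_stepsize(lambda eps: 1.0)  # zero-dimensional model: acceptance is always 1
except RuntimeError as e:
    message = str(e)
```

Under this reading the error message reflects a heuristic firing on degenerate input rather than any actual impropriety.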
Ok, one-line fix in …
All trajectories terminate after one proposal and always accept. Why does …
In theory, HMC works perfectly fine in a zero-dimensional space. It's just arithmetic with size-zero vectors, trivial but well-behaved.
In practice, Stan's HMC algorithm works perfectly fine in a zero-dimensional space but there are two other bugs that interfere.
First, disabling the zero-parameter check in CmdStan and running with defaults result in
Exception initializing step size.
Posterior is improper. Please check your model.
which is obviously false: the posterior is a point mass and necessarily proper.
If we’re being formal then a zero-dimensional space is topologically discrete and cannot be written as a dense subset of the real numbers. The entire algorithm library, from the algorithms to the output to the summaries, assumes that the ambient space is a subset of the real numbers (which is why Stan can’t directly handle spheres, tori, and other topologically nontrivial spaces even though Hamiltonian Monte Carlo itself is applicable). From that perspective the error message is correct: the posterior density function relative to a reference Lebesgue measure is not proper.
At the same time, even if we equate “no parameters” with “zero-dimensional”, the ambient space need not be a point. A single point is a classic model for a zero-dimensional space, but any discrete set of points is technically valid too.
It’s just that step size is meaningless and the step size adapter misdiagnoses this.
But precisely because the step size is meaningless, adaptation shouldn't do anything anyway. Running with `adapt engaged=0`, warmup completes successfully but the program crashes when it tries to print the metric.
stan/lib/stan_math/lib/eigen_3.3.9/Eigen/src/Core/DenseCoeffsBase.h:425: Eigen::DenseCoeffsBase<type-parameter-0-0, 1>::Scalar &Eigen::DenseCoeffsBase<Eigen::Matrix<double, -1, 1, 0, -1, 1>, 1>::operator()(Eigen::Index) [Derived = Eigen::Matrix<double, -1, 1, 0, -1, 1>, Level = 1]: Assertion `index >= 0 && index < size()' failed.
The adaptation code, and again the I/O code, assume a continuous ambient space that is a subset of some finite-dimensional real space.
Whether or not all of the code can be adapted to an empty parameter block depends on the particular conventions assumed for the corresponding ambient space. If a point space is assumed then no-ops are valid, but again that’s not the only possible assumption. A useful discussion but tangential to the main issue in the thread.
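One way such I/O code could be made robust to the N = 0 boundary (a hypothetical sketch in Python, not the actual CmdStan writer, which is C++) is to guard element access on the vector size, so that a size-zero metric produces a header line and nothing else:

```python
def format_diag_metric(inv_metric):
    """Emit the diagonal inverse metric as CSV comment lines, tolerating size 0."""
    lines = ["# Diagonal elements of inverse mass matrix:"]
    if len(inv_metric) > 0:  # guard: element access on an empty vector is the crash above
        lines.append("# " + ", ".join(repr(x) for x in inv_metric))
    return lines

with_params = format_diag_metric([0.9, 1.1])
zero_params = format_diag_metric([])
```

Whether the "correct" zero-parameter output is a header with no values, or no metric block at all, is exactly the convention question raised above.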
You could generate the same samples a bit more efficiently with fixed_param sampler (which evaluates zero proposals for each iteration) but--and here's the problem this PR is trying to workaround--fixed_param has a different output format from all the other samplers.
That’s not the original issue. The originating problem is that the Stan configuration is printed to the output based on the input configuration. CmdStanPy is currently modifying that configuration, which then results in a conflict between the configuration that is executed, and hence the structure of the output, and the configuration that is reported. In particular, the header of the CmdStan output reports that HMC is run while the body is given by fixed_param output.
Why does fixed_param have different output? As far as I can tell it's because fixed_param has no warmup and thus omits the adaptation info. On one hand that makes sense, "warming up" fixed_param would be just wasted computation; on the other hand fixed_param still supports thinning, which is equally pointless, and CmdStan accepts (but silently ignores) nonzero num_warmup argument even with algorithm=fixed_param.
In short, it's all inconsistent.
The output is consistent. Running different HMC sampler configurations results in different adaptation information being output, and the fixed_param configuration simply has no adaptation information to output. Warmup is an implicit no-op by default because generated quantities isn’t run if `save_warmup=0` and so there’s nothing to output, but if `save_warmup=1` then generated quantities are run and those evaluations are output.
The problem that’s arising is when other programs are run over the CmdStan output after the fact. Those programs, especially the code behind `print`/`stansummary`, check the Stan configuration to determine what output to expect. When the output Stan configuration specifies that HMC was run but the output doesn’t contain the assumed HMC output then there are problems.
Now those CmdStan programs that read the CmdStan output are very fragile and poorly implemented, a sin which goes back a very long time, and more robust processing logic would be welcome. But that’s a tangential issue here.
Right. R^0 has just a single element, the empty tuple, and any measure has to assign it 100% of the probability mass. But it's still a proper measure and I can't see where the rest of the math won't generalize properly if we get our code right.
That feels like a bug to me in the brittle way we pack things into and read things out of .csv format.
This also just feels like a bug in our code not handling N = 0 boundary conditions. But isn't this moot for the issue at hand, which is just to let the sampling method deal with size-zero output by allowing the flexibility to select between HMC or fixed-param depending on the problem dimensionality?
If we’re being formal then a zero-dimensional space is topologically discrete and cannot be written as a dense subset of the real numbers.
Right. R^0 has just a single element, the empty tuple and any measure has to assign it 100% of the probability mass. But it's still a proper measure and I can't see where the rest of the math won't generalize properly if we get our code right.
This also just feels like a bug in our code not handling N = 0 boundary conditions.
But isn't this moot for the issue at hand, which is just to let the sampling method deal with size zero output by allowing the flexibility to select between HMC or fixed-param depending on the problem dimensionality?
Again: a zero-dimensional space does not imply a single element set! There are many zero-dimensional spaces and hence many different ways to extrapolate to the N = 0 boundary case of R^N. These include the empty set, a set with a single element, and a set with a finite number of elements.
Each of these zero-dimensional spaces implies a different Markov chain Monte Carlo algorithm and output configuration. For example, the empty set would correspond to no-ops everywhere with no parameter values passed to the MCMC output. A single-element set would also correspond to a no-op, but the value of that one element would technically get passed to the MCMC output. A no-op sampler would definitely not be okay for a discrete set.
_The zero-dimensional case is ambiguous and there is not an obvious or canonical implementation of Markov chain Monte Carlo_. We can choose one arbitrarily or we can just not define HMC for this ambiguous setting, instead erring out with an informative message as was done previously. I strongly prefer the latter. If a user wants a no-op sampler then they should specify a no-op sampler.
When the output Stan configuration specifies that HMC was run but the output doesn’t contain the assumed HMC output then there are problems.
That feels like a bug to me in the brittle way we pack things into and read things out of .csv format.
It’s not a bug in that it’s technically working as intended. Every well-defined HMC output has a guaranteed structure and the reader assumes that structure.
I agree that the CmdStan csv reader is brittle and should be updated or redesigned, but I don’t think this is the reason why.
There are indeed many zero-dimensional spaces (although, no, the empty set is not one of them) but R^N does not mean some random N-dimensional space. R^N means the N-dimensional real vector space. That is unique for every N; zero is not special. The issue isn't about generalizing HMC to arbitrary zero-dimensional spaces. The only space of interest is the one Stan chooses when a model has no parameters at all. And that particular space is quite well-behaved.
There are indeed many zero dimensional spaces (although, no, the empty set is not one of them) but R^N does not mean some random N-dimensional space. R^N means the N-dimensional real vector space. That is unique for every N; zero is not special.
By empty set I was referring to the model theoretic use of the empty set itself as the canonical element of a set with cardinality one, but that’s irrelevant.
R^N does not uniquely define an N-dimensional real vector space. In order to identify R^N with a Euclidean vector space one has to fix a parameterization/coordinate system; in particular, non-linear transformations/reparameterizations/changes of coordinate system change the identification of R^N with the Euclidean vector space. One can interpret this as an infinite number of different R^N, each identified with a canonical Euclidean vector space, or a single R^N that requires a choice of parameterization before it can be identified with a Euclidean vector space, but either way R^N doesn’t refer to a unique, fully-specified system.
The identification ambiguity carries over to the zero-dimensional case as well. For example let’s say that we define a vector space containing only the zero vector as a zero-dimensional space. There are an infinite number of ways to associate the zero vector with a single point, each corresponding to a different choice of origin in R^1 with the different choices related by translations.
Taking any choice arbitrarily yields a non-empty state space, which formally means that we need a name for a variable that takes values in the state space so that we can print the constant output. This does not correspond to the no-op, no-output behavior of the fixed_param sampler.
no longer relevant.
Submission Checklist
./runCmdStanTests.py src/test
Summary:
In command.hpp, before invoking the service method fixed_param, add the line "# fixed_param=1" to the Stan CSV header.
Intended Effect:
This will make it easier for wrapper interfaces to check Stan CSV files. See #1029.
How to Verify:
Added more unit tests
Side Effects:
CSV header files will change only when method fixed_param is invoked. Should be harmless.
Documentation:
n/a
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Columbia University
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: