FAQ
To make debugging easier and make TensorFlow behave like NumPy, so that breakpoints can be set and values read out immediately, the environment variable ZFIT_DO_JIT (currently experimental; the name may change in the future!) can be set. If it is set to a false value, zfit behaves as if it were using NumPy.
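A minimal sketch of how this could look, assuming the variable is read at import time and that "0" counts as a false value:

```python
import os

# experimental flag; its name may change in future releases
os.environ["ZFIT_DO_JIT"] = "0"  # assumed to count as a false value

import zfit  # zfit now evaluates eagerly, like NumPy, so breakpoints show concrete values
```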
Most of them come from TensorFlow and are of a purely informational nature. You can make zfit suppress them by setting the environment variable 'ZFIT_DISABLE_TF_WARNINGS' to a true value (1, ...).
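For example, assuming the variable is read when zfit is imported:

```python
import os

# set before importing zfit; any true value (e.g. "1") suppresses the TF warnings
os.environ["ZFIT_DISABLE_TF_WARNINGS"] = "1"

import zfit
```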
NaNs are "Not a Number" and hint towards an "illegal" mathematical operation. There is a variety of reasons for this to happen, most commonly:
Most losses (such as the negative log likelihood) involve a logarithm. If a single value is negative, this will result in NaNs. Make sure that your PDF produces only non-negative values, also not due to numerical instabilities. The value of the loss can be obtained with loss.value().
Make sure that the PDF is defined in the right way. Try to debug using the eager flag.
See below under Fitting and Minimization.
There are "two types" of constraints. The most common one are the "auxiliary measurements" that we use often to incorporate knowledge about a parameter (actually an approximation of the likelihood the parameter was obtained with); they use very often "Gaussian" constraints. The second type are equalities and inequalities, with the simplest of them being limits on parameters. Apart of the trivial case, these are constraints that can also be incorporated into the likelihood: if an equality/inequality constraint is not true, the likelihood of this is 0. The advantage of this is that everything is well defined. The problem arises that minimizers don't like steps given by delta or heavyside distributions (that mathematically encapsulate the (in)equalities) and using a very steep wall instead will work better.
Some minimizers support certain constraints explicitly. This can help the minimization and avoid the step function, but it suffers from a fundamental problem: the likelihood itself no longer contains all of the knowledge. (A special case are equality constraints, which can often be expressed as functions of other parameters using a ComposedParameter.)
Therefore, zfit currently doesn't use this functionality and discourages its use; instead, a SimpleLoss with a custom-built steep wall, constructed on a case-by-case basis, is preferable (a sketch follows below). V2 of zfit is planned to contain objects that carry more knowledge about constraints, which can then be taken into account by the minimizer.
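A minimal, illustrative sketch of such a steep wall: the toy Gaussian setup, the threshold and the penalty scale 1e5 are arbitrary choices, and the exact SimpleLoss signature as well as summing losses with + may depend on the zfit version.

```python
import numpy as np
import zfit
import zfit.z.numpy as znp  # NumPy-like interface backed by TensorFlow

# toy setup: a Gaussian fit whose parameters should (illustratively) obey mu + sigma <= 2
obs = zfit.Space("x", limits=(-5, 5))
mu = zfit.Parameter("mu", 0.5, -2, 2)
sigma = zfit.Parameter("sigma", 1.2, 0.1, 5)
gauss = zfit.pdf.Gauss(mu=mu, sigma=sigma, obs=obs)
data = zfit.Data.from_numpy(obs=obs, array=np.random.normal(0.4, 1.0, size=1000))
nll = zfit.loss.UnbinnedNLL(model=gauss, data=data)

def wall(params):
    # smooth but steep penalty for violating mu + sigma <= 2, instead of a hard step
    mu_val, sigma_val = params
    excess = znp.maximum(mu_val + sigma_val - 2.0, 0.0)
    return 1e5 * excess ** 2

penalty = zfit.loss.SimpleLoss(func=wall, params=[mu, sigma], errordef=0.5)
total_loss = nll + penalty  # combined loss; zfit losses can be added
```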
...that lead to NaNs in the loss. This can happen because of numerical instabilities, in which case it is best to force the values to a small, minimal positive value.
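A minimal sketch of such a clamp, using zfit's NumPy-like interface (the floor value 1e-100 and the helper name are arbitrary, illustrative choices):

```python
import zfit.z.numpy as znp  # NumPy-like interface backed by TensorFlow

def clip_to_positive(probs, floor=1e-100):
    # replace tiny negative values from numerical instabilities with a small
    # positive floor before they enter the logarithm of the loss
    return znp.maximum(probs, floor)
```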
Other than that, "significant" negative values can occur, for example, in polynomial fits. There are two possibilities:
- If the fit has a valid minimum (as in: no negative counts), then it should work if the initial values are close enough. zfit (by default) pushes back the loss if it evaluates to NaN (as with negative values), so a few bad evaluations should not matter. It is also possible to (temporarily) suppress negative values in the model by setting them to a small, minimal value (if the model is self-defined), find the minimum, then remove this limitation and fit again to find the true minimum.
- Maybe the fit actually has negative values at the minimum. If the minimizer fails, make it verbose (verbosity 10 prints every step), set the parameters to the values at the failing point, then profile the parameter and see what that looks like.
There can be several reasons for this issue. The first thing to try is to set the verbosity of the minimizer high, e.g. 7 or even 10 (this can be done when creating the minimizer). The output shows the parameter values, the loss and the gradients as used by the minimizer.
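For example, assuming the Minuit minimizer with its verbosity argument at construction time:

```python
import zfit

# higher values print more information; 10 prints at every step
minimizer = zfit.minimize.Minuit(verbosity=7)
```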
very likely
Most algorithms (Minuit, ...) are local optimizers. They search for the minimum in the vicinity of their starting values. It can be crucial not to start too far from the actual minimum. If in doubt, try to start very close and see if the fit converges. Then start moving away.
likely, mostly with yields involved
Parameters have a step size, which should be chosen in the order of the uncertainty on that parameter. This is crucial for the minimization to work in the correct order of magnitude. Since the uncertainty is only known after the fit, the step_size defaults to something around (upper_limit - lower_limit) * 1e-4. Mostly for yields, this value has to be adjusted (increased). Usually, a good value is 0.1 for normal parameters and 1-10 for yields. If the minimization already produces NaNs at the very beginning, it is likely that the step size was chosen too large, making the minimizer move into a region that produces NaNs.
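A sketch of setting the step size explicitly for a (hypothetical) yield parameter:

```python
import zfit

# yield-like parameter: step size of the order of the expected uncertainty
n_sig = zfit.Parameter("n_sig", 5000, 0, 100_000, step_size=10)
```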
somewhat likely with Minuit
The Minuit minimizer provides its own, internal gradient calculation. While this is usually slower, it can lead to more stable minimizations. To activate it, use minimizer = Minuit(use_minuit_grad=True).
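As noted above, this is set when creating the minimizer:

```python
import zfit

# Minuit computes its own (numerical) gradient instead of using the TF autograd
minimizer = zfit.minimize.Minuit(use_minuit_grad=True)
# result = minimizer.minimize(loss)  # assuming an existing zfit loss
```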
unlikely, except if arbitrary Python code
TensorFlow, the underlying engine, provides automatic (~analytic) gradients. But several things (e.g. using a z.py_function, or a function without gradient support (unlikely), etc.) can lead to a wrong gradient computation. Using numerical derivatives instead can solve this problem:
zfit.run.set_autograd_mode(False)
While the reasons are usually more fine-grained, a possibility to stabilize things further is to add a penalty to the loss whenever NaNs are encountered. This behavior is enabled by default through the default minimizer strategy; it can, however, be tuned to be more or less tolerant.
There is a tutorial available about run modes. Consider going through that to understand the different modes.
In general, for debugging purposes, use graph=False. This allows you to step through the code and inspect all the numbers. If, on the other hand, the problem is more advanced and seems to behave differently between graph=False and graph=True, consider debugging in both modes. For all other purposes, the default mode 'auto' is best.
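A sketch of switching modes, assuming zfit.run.set_mode also accepts a graph keyword (analogous to the autograd keyword used further below); if your version differs, the run-modes tutorial shows the exact call:

```python
import zfit

zfit.run.set_mode(graph=False)   # eager/debug mode: step through and inspect values

# ... debug the model or loss here ...

zfit.run.set_mode(graph='auto')  # back to the default for performance
```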
- Are all operations written in TF?
  - yes: are there many calls with different arguments, e.g. calling the loss value once, then changing the norm_range?
    - yes: graph=False may improve the performance; alternatively, clear the cache regularly with zfit.run.clear_graph_caches()
    - no: use the default graph='auto'
  - no: are they wrapped within z.py_function?
    - yes: use the default
    - no: if they can't be wrapped (as the last possibility to consider), set graph=False. Furthermore, since graph mode will break this implementation, consider adding a zfit.run.assert_executing_eagerly() (see the sketch after this list).
- Is there any non-TensorFlow functionality that changes between calls to a loss/pdf based on Python/NumPy logic? For example, a shape from SciPy, wrapped with z.py_function or run in complete eager mode.
  - no: use the autograd (or maybe an internal minimizer gradient for stability; in some cases this can be beneficial)
  - yes: disable the autograd with zfit.run.set_mode(autograd=False) and maybe use an internal minimizer gradient.
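Hedged sketches of the two non-TF situations from the decision tree above, assuming z.py_function mirrors the (func, inp, Tout) signature of tf.py_function; the example shape is purely illustrative:

```python
import numpy as np
import tensorflow as tf
import zfit
from zfit import z

def numpy_shape(x_np):
    # plain NumPy computation; TensorFlow cannot trace or differentiate it
    return np.exp(-0.5 * np.asarray(x_np) ** 2)

def wrapped_shape(x):
    # wrapped version: can live inside a TensorFlow graph (but has no autograd)
    return z.py_function(func=numpy_shape, inp=[x], Tout=tf.float64)

def eager_only_shape(x):
    # last resort if the code cannot be wrapped at all: require eager execution
    zfit.run.assert_executing_eagerly()  # fails loudly if zfit runs in graph mode
    return np.exp(-0.5 * np.asarray(x) ** 2)
```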
Make sure that the shape is known. Before returning your value in the overridden function (e.g. _unnormalized_pdf), set its shape explicitly to the shape of the input. For example, if the data is unstacked as a, b = z.unstack_x(x), the shape can be set with probs.set_shape(a.shape), where probs is the return value of the overridden function.
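A minimal sketch of a custom two-observable PDF with the explicit set_shape call (the PDF formula itself is just an illustrative toy):

```python
import zfit
from zfit import z
import zfit.z.numpy as znp

class ToyPDF(zfit.pdf.ZPDF):
    _N_OBS = 2            # two observables, unstacked below as a and b
    _PARAMS = ["slope"]   # one fit parameter

    def _unnormalized_pdf(self, x):
        a, b = z.unstack_x(x)
        probs = znp.exp(-self.params["slope"] * a) * (1.0 + b ** 2)
        probs.set_shape(a.shape)  # make the static shape known to TensorFlow
        return probs
```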