
Frequently Asked Questions about using zfit

Debugging

To make debugging easier and to make TensorFlow behave like Numpy, so that breakpoints can be set and values read out immediately, an environment variable (currently experimental, the name may change in the future!) can be set: ZFIT_DO_JIT. If this is False, zfit behaves as if it were using Numpy.
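For example (a minimal sketch; whether the variable has to be set before zfit is imported is an assumption, setting it in the shell before starting Python is the safest):

```python
import os

# Any false value should do; set it before zfit is imported so that it is picked up.
os.environ["ZFIT_DO_JIT"] = "0"

import zfit  # zfit now executes eagerly, i.e. Numpy-like
```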

General

There are many warnings and prints

Most of them come from TensorFlow and are purely informational. You can make zfit suppress them by setting the environment variable ZFIT_DISABLE_TF_WARNINGS to a true value (1, ...).
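For example, following the same pattern as above (a sketch; set it before importing zfit or directly in the shell):

```python
import os

# Suppress the purely informational TensorFlow warnings surfaced through zfit.
os.environ["ZFIT_DISABLE_TF_WARNINGS"] = "1"

import zfit
```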

Numerical problems

NaNs are produced

NaNs are "Not a Number" and hint towards an "illegal" mathematical operation. There is a variety of reasons for this to happen, most commonly:

Loss returns NaN

Most losses (such as the Negative Log Likelihood) involve a logarithm. If a single number is negative, this will result in NaNs. Make sure that your PDF produces only non-negative values, not even slightly negative ones caused by numerical instabilities. The value of the loss can be obtained with loss.value().
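A quick check along these lines can narrow the problem down (a sketch; model, data and loss stand for the objects already built in your script):

```python
import numpy as np

probs = model.pdf(data)                      # PDF evaluated on the fitted data
print("smallest pdf value:", np.min(probs))  # should not be negative (or exactly zero)
print("loss value:", loss.value())           # NaN here usually means log(<= 0)
```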

PDF returns NaN or negative values

Make sure that the PDF is defined correctly. Try to debug it in eager mode (graph mode disabled), as sketched below.
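A minimal sketch of switching to eager execution for debugging (using the run-mode API also mentioned further below):

```python
import zfit

# Run eagerly so breakpoints and prints show concrete numbers.
zfit.run.set_mode(graph=False)

# Now step through the PDF call and inspect intermediate values, e.g.
# probs = model.pdf(data)
# print(probs)
```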

During minimization

See below under Fitting and minimization.

Fitting and minimization

Constraints

There are "two types" of constraints. The most common one are the "auxiliary measurements" that we use often to incorporate knowledge about a parameter (actually an approximation of the likelihood the parameter was obtained with); they use very often "Gaussian" constraints. The second type are equalities and inequalities, with the simplest of them being limits on parameters. Apart of the trivial case, these are constraints that can also be incorporated into the likelihood: if an equality/inequality constraint is not true, the likelihood of this is 0. The advantage of this is that everything is well defined. The problem arises that minimizers don't like steps given by delta or heavyside distributions (that mathematically encapsulate the (in)equalities) and using a very steep wall instead will work better.

What about minimizer constraints?

Some minimizers support certain constraints explicitly. This can help the minimization and avoid the step function, but it suffers from a fundamental problem: the likelihood then no longer contains all of the knowledge. The minimum of $a + 2b + 1$ looks very different if we additionally impose $a = b$. If the latter is not part of the objective function, the minimization may still work, yet any further steps (uncertainties, likelihood profile, coverage, ...) will no longer be valid.

(A special case are equality constraints, which can often be expressed as functions of other parameters using a ComposedParameter, as sketched below.)
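For example, an equality b = 2a could be expressed like this (a sketch; the exact ComposedParameter signature may differ between zfit versions):

```python
import zfit

a = zfit.Parameter("a", 1.0, 0, 10)
# b is no longer a free parameter but fully determined by a.
b = zfit.ComposedParameter("b", lambda params: 2 * params["a"], params={"a": a})
```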

Therefore, zfit currently does not use this functionality and discourages its use; instead, a SimpleLoss with a custom, steep wall built on a case-by-case basis is preferable. V2 of zfit is planned to contain objects that carry more knowledge about constraints, which the minimizer can then take into account.
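A possible sketch of such a wall, added on top of an existing likelihood (a, b and nll are assumed to exist; the steepness 1e5 is arbitrary and the exact SimpleLoss calling convention may differ between zfit versions):

```python
import zfit
import zfit.z.numpy as znp

def penalty(params):
    a_val, b_val = params
    violation = znp.maximum(a_val - b_val, 0.0)  # > 0 only if the constraint a <= b is violated
    return 1e5 * violation ** 2

wall = zfit.loss.SimpleLoss(penalty, params=[a, b], errordef=0.5)
total_loss = nll + wall  # zfit losses can be added
```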

Negative probabilities

...that lead to NaNs in the loss. They can be caused by numerical instabilities, in which case it is best to force them to a small, minimal positive value.

Apart from that, "significant" negative values can occur, for example in polynomial fits. There are two possibilities:

  • If the fit has a valid minimum (i.e. no negative counts), it should work as long as the initial values are close enough. zfit (by default) pushes the loss back if it evaluates to NaN (as happens with negative values), so a few bad evaluations should not matter. It is also possible to (temporarily) suppress negative values in the model by setting them to a small minimal value (if the model is self-defined, see the sketch below) and find the minimum. Then remove this limitation and fit again to find the true minimum.
  • Maybe the fit actually has negative values at the minimum. If the minimizer fails, make it verbose (10 prints every step), set the parameters to the values at the failing point, then profile the parameter and inspect how it looks.
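The temporary suppression mentioned above could look like this in a self-defined PDF (a sketch; the floor value is arbitrary):

```python
import zfit.z.numpy as znp

def clip_negative(values, floor=1e-20):
    # Push tiny negative values (numerical noise) up to a small positive floor.
    return znp.maximum(values, floor)
```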

My fit does not converge

There can be several reasons for this. The first thing to try is to set the verbosity of the minimizer high, e.g. to 7 or even 10 (this can be done when creating the minimizer).

The output shows the parameter values, the loss and the gradients as used by the minimizer.
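For example:

```python
import zfit

# High verbosity prints parameter values, loss and gradients during the minimization.
minimizer = zfit.minimize.Minuit(verbosity=7)
# result = minimizer.minimize(loss)
```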

initial values

very likely

Most algorithms (Minuit, ...) are local optimizers. They search for the minimum in the vicinity of their starting values. It can be crucial not to start too far from the actual minimum. If in doubt, start very close to the expected minimum and check whether the fit converges, then gradually move the starting values away.
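For example (parameter names are made up):

```python
# Start close to where the minimum is expected, then gradually relax the starting point.
mu.set_value(0.95)
sigma.set_value(0.3)
# result = minimizer.minimize(loss)
```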

step size

likely, mostly with yields involved

Parameters have a step size, which should be of the order of the uncertainty on that parameter. This is crucial for the minimization to work on the correct order of magnitude. Since the uncertainty is only known after the fit, the step_size defaults to roughly (upper_limit - lower_limit) * 1e-4. Mostly for yields, this value has to be adjusted (increased). Usually, a good value is 0.1 for normal parameters and 1-10 for yields. If the minimization produces NaNs right at the beginning, it is likely that the step size was chosen too large, making the minimizer move into a region that produces NaNs.
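For example:

```python
import zfit

# Typical choices: ~0.1 for shape parameters, ~1-10 for yields.
mu = zfit.Parameter("mu", 0.5, -1, 1, step_size=0.1)
n_sig = zfit.Parameter("n_sig", 1000, 0, 100_000, step_size=1)
```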

gradient

somewhat likely with Minuit

The Minuit minimizer provides its own, internal gradient calculation. While this is usually slower, it can lead to more stable minimizations. To activate this, use minimizer = Minuit(use_minuit_grad=True).

unlikely, except if arbitrary Python code is used

TensorFlow, the underlying engine, provides automatic (~analytic) gradients. However, several things (e.g. using z.py_function, or a function without gradient support (unlikely)) can lead to a wrong gradient computation. Using numerical derivatives solves this problem: zfit.run.set_autograd_mode(False).

Shooting at birds with a cannon

While the reasons are usually more fine grained, one possibility to stabilize things further is to add a penalty to the loss whenever NaNs are encountered. This behavior is enabled by default through the default minimizer strategy; it can be tuned, however, to be more or less tolerant.
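As a sketch (assuming a PushbackStrategy class is exposed under zfit.minimize; the available strategy classes and their arguments may differ between zfit versions, so check the minimizer documentation of your version):

```python
import zfit

# Pass a (possibly tuned) strategy to the minimizer instead of the default one.
minimizer = zfit.minimize.Minuit(strategy=zfit.minimize.PushbackStrategy())
```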

Graph and gradient modes

There is a tutorial available about run modes. Consider going through that to understand the different modes.

Which mode should I run in?

Graph

In general, for debugging purposes, use graph=False. This allows stepping through the code and inspecting all the numbers. If, on the other hand, the problem is more advanced and seems to behave differently between graph=False and graph=True, consider debugging in both modes. For all other purposes, the default mode 'auto' is best. A snippet for switching modes explicitly follows the decision list below.

  • Are all operations written in TF?
    • yes: are there many calls with different arguments, e.g. calling the loss value once, then changing the norm_range?
      • yes: graph=False may improve the performance; alternatively, clean the graph cache regularly with zfit.run.clear_graph_caches()
      • no: use the default graph='auto'
    • no: are they wrapped within z.py_function?
      • yes: use the default
      • no: if they can't be wrapped (to be considered only as a last resort), set graph=False. Furthermore, since graph mode would break such an implementation, consider adding a zfit.run.assert_executing_eagerly().
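Switching the mode explicitly and clearing the graph caches looks roughly like this:

```python
import zfit

zfit.run.set_mode(graph=False)     # eager, Numpy-like execution
# zfit.run.set_mode(graph='auto')  # back to the default

# If many differently-configured calls blow up the cached graphs:
zfit.run.clear_graph_caches()
```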

Gradients

  • Is there any non-TensorFlow functionality that changes between calls to a loss/pdf based on Python/Numpy logic? For example, a shape from SciPy that is used wrapped with z.py_function or even in completely eager mode.
    • no: use the autograd (or possibly an internal minimizer gradient; in some cases this can be beneficial for stability)
    • yes: disable the autograd with zfit.run.set_mode(autograd=False) and possibly use an internal minimizer gradient (see the snippet below).
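A sketch combining both options:

```python
import zfit

# Disable the automatic gradient and let Minuit compute it internally.
zfit.run.set_mode(autograd=False)
minimizer = zfit.minimize.Minuit(use_minuit_grad=True)
```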

Custom PDFs and Funcs

Sampling

PDF works, but sampling gives an odd None error

Make sure that the shape is known. Before returning the value in the overridden function (e.g. _unnormalized_pdf), set its shape explicitly to the shape of the input. For example, if the data is unstacked as a, b = z.unstack_x(x), the shape can be set with probs.set_shape(a.shape), where probs is the return value of the overridden function.
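A minimal sketch of a self-defined two-dimensional PDF that sets the shape explicitly (class and parameter names are made up):

```python
import zfit
from zfit import z
import zfit.z.numpy as znp

class MyPDF2D(zfit.pdf.ZPDF):
    """Toy 2D PDF, only to illustrate setting the shape before returning."""

    _PARAMS = ["alpha"]  # fit parameters
    _N_OBS = 2           # number of observables

    def _unnormalized_pdf(self, x):
        a, b = z.unstack_x(x)                 # unstack the two observable columns
        alpha = self.params["alpha"]
        probs = znp.exp(-alpha * (a**2 + b**2))
        probs.set_shape(a.shape)              # make the shape explicit for sampling
        return probs
```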