Skip to content

Conversation

charleskorn
Copy link
Contributor

@charleskorn charleskorn commented May 26, 2025

Diff: prometheus/prometheus@f0471ff...14fc57e

New features

Performance improvements

Bug fixes

Refactoring and cleanup

Non-production changes (eg. tests, build tooling)

Changes not relevant to Mimir

KofClubs and others added 30 commits April 24, 2025 16:29
Signed-off-by: Zhang Zhanpeng <zhangzhanpeng.zzp@alibaba-inc.com>
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
The position range of nested aggregate expression was wrong, for the
expression "(sum(foo))" the position of "sum(foo)" should be 1-9, but
the parser could not decide the end of the expression on pos 9, instead
it read ahead to pos 10 and then emitted the aggregate. But we only
kept the last closing position (10) and wrote that into the aggregate.

The reason for this is that the parser cannot know from "(sum(foo)" alone
if the aggregate is finished. It could be finished as in "(sum(foo))" but
equally it could continue with group modifier as "(sum(foo) by (bar))".

The fix is to track ")" tokens in a stack in addition to the lastClosing
position, which is overloaded with other things like offset number tracking.


Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
This is failing after 26088d01b9518b3315805186dcee534b986bf79f and it seems that currently tested position is incorrect, fix it

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
This is a fix for documentation update done in prometheus/prometheus#16066, setting correct name for a configuration value.

Signed-off-by: Martin Danko <46035688+zepellin@users.noreply.github.com>
Copy the benchmark for native histograms with exponential buckets and
adopt to native histograms with custom buckets.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
The custom values are the "le" bucket boundaries of native histograms
with custom buckets. They are never modified. It is ok to not copy them
when iterating a chunk, just reference them.

If we will ever have a function that modifies the custom values, like
'trim' for example. That function will have to make a copy on write.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
promql: function selector sometimes misses a sample on dense samples
Signed-off-by: hardlydearly <799511800@qq.com>
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
…eries (#16404)

This fixes the regression introduced in prometheus/prometheus#15971 while preserving the performance improvements.

Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
Manager.ApplyConfig() uses multiple locks:
- Provider.mu
- Manager.targetsMtx

Manager.cleaner() uses the same locks but in the opposite order:
- First it locks Manager.targetsMtx
- The it locks Provider.mu

I've seen a few strange cases of Prometheus hanging up on shutdown and never compliting that shutdown.
From a few traces I was given it appears that while Prometheus is still running only discovery.Manager and notifier.Manager are running running.
From that trace it also seems like they are stuck on a lock from two functions:
- cleaner waits on a RLock()
- ApplyConfig waits on a Lock()

I cannot reproduce it but I suspect this is a race between locks. Imagine this scenario:
- Manager.ApplyConfig() is called
- Manager.ApplyConfig locks Provider.mu.Lock()
- at the same time cleaner() is called on the same Provider instance and it calls Manager.targetsMtx.Lock()
- Manager.ApplyConfig() now calls Manager.targetsMtx.Lock() but that lock is already held by cleaner() function so ApplyConfig() hangs there
- at the same time cleaner() now wants to lock Provider.mu.Rlock() but that lock is already held by Manager.ApplyConfig()
- we end up with both functions locking each other out without any way to break that lock

Re-order lock calls to try to avoid this scenario.
I tried writing a test case for it but couldn't hit this issue.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
Moving the debouncing of the search field to the parent component and then
memoizing the ScrapePoolsList component prevents a lot of superfluous
re-renders of the entire scrape pools list that previously got triggered
immediately when you typed in the search box or even just collapsed a pool.
(While the computation of what data to show was already memoized in the
ScrapePoolList component, the component itself still had to re-render a lot
with the same data.)

Discovered this problem + verified fix using react-scan.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
When opening the status pages menu while already viewing one of the
status pages, the whole page would be re-rendered because the menu target's
default action of following the current page's URL was not prevented. Also,
we don't need to use a NavLink component for the menu target when we are
not viewing a status page, because then the component won't need to be
highlighted anyways.

Discovered + fixed with the help of react-scan.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
…exists

Fixes: prometheus/prometheus#16334
Related to:
- prometheus/prometheus#15238
- prometheus/prometheus#16052

Currently, when the GOGC environment variable is set -- and no `runtime`
block is set in the Prometheus config file -- it is ignored and the
default value of 75% is always used.

However, if there is an empty runtime block (e.g. `runtime: {}`), _then_
the GOGC environment variable is checked.

This PR changes this behavior to consistently check and use the GOGC
environment variable when it is set (unless the `gogc` field is set in
the `runtime` block of the loaded config file, in which case it still
gives that precedence).

Co-authored-by: Adam Rambo <arambo@protonmail.com>
Signed-off-by: Will Hegedus <whegedus@akamai.com>
Signed-off-by: Will Hegedus <whegedus@akamai.com>
The new docs site will have syntax highlighting, so this adds language tags
to code boxes that are currently missing them. I didn't add `promql` as a
language yet since the highlighter doesn't support it yet, plus a lot of
the PromQL codeboxes in our docs aren't strictly valid PromQL, they are
more like multiple expressions listed in the same code box on multiple
lines. So I'm leaving that for sometime later.

In the HTTP API page, I moved the curl examples from the JSON codeboxes to
their own ones above the JSON output. I considered putting an "Output:"
text between the curl + JSON output, but I think the way it currently looks
without it is probably fine.

I also fixed a number of headings which were at the wrong level relative to
their nesting in the document.

I also removed `go` as a language from the Go template language examples,
because the Go template language isn't Go at all.

I also adjusted the indent on one codebox to be more reasonable (2 spaces
instead of 8).

And then finally, my editor made a bunch of whitespace changes
automatically, like removing trailing spaces.

Signed-off-by: Julius Volz <julius.volz@gmail.com>

Signed-off-by: Julius Volz <julius.volz@gmail.com>
fix(config): respect GOGC environment variable if no "runtime" block exists
Make sure the order of locks is always the same in all functions. In ApplyConfig() we have m.targetsMtx.Lock() after provider is locked, so replicate the same in allGroups().

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
Use `common-` prefix for `make proto` so downstream projects like
client_golang can implement their own `make proto`.

Signed-off-by: SuperQ <superq@gmail.com>
With golangci-lint v2, it now has "formatters" that can be configured.
Add `golangci-lint fmt` to the `make format` in Makefile.common.
* Enable goimports formatter.

Signed-off-by: SuperQ <superq@gmail.com>
…gets` option

Fixes prometheus/prometheus#16586

Signed-off-by: Julius Volz <julius.volz@gmail.com>
SD UI: Better total target count display when using `keep_dropped_targets` option
Addresses part of prometheus/prometheus#16515

For now, I'm adding very similar filtering to the /rules page as we have on
the /alerts page, with the difference being:

* The state filter filters by rule health (ok/warn/unknown) instead of
  alert state (firing/pending/inactive)
* We don't collect & show detailed stats on the different state counts as
  we do on the /alerts page

There is a lot of copied / very similar code between those two pages (and
also some others) around filtering and pagination, so maybe there is an
opportunity for more code sharing in the future here.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: marcoderama <marcoderamagit@gmail.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
* feature: type-and-unit-labels (extended MetricIdentity)

Experimental implementation of prometheus/proposals#39

Previous (unmerged) experiments:
* prometheus/prometheus@main...dashpole:prometheus:type_and_unit_labels
* prometheus/prometheus#16025

Signed-off-by: bwplotka <bwplotka@gmail.com>

feature: type-and-unit-labels (extended MetricIdentity)

Experimental implementation of prometheus/proposals#39

Previous (unmerged) experiments:
* prometheus/prometheus@main...dashpole:prometheus:type_and_unit_labels
* prometheus/prometheus#16025

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Fix compilation errors

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>

Lint

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>

Revert change made to protobuf 'Accept' header

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>

Fix compilation errors for 'dedupelabels' tag

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>

* Rectored into schema.Metadata

Signed-off-by: bwplotka <bwplotka@gmail.com>

* texparse: Added tests for PromParse

Signed-off-by: bwplotka <bwplotka@gmail.com>

* add OM tests.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* add proto tests

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Addressed comments.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* add schema label tests.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* addressed comments.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* fix tests.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* add promql tests.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* lint

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Addressed comments.

Signed-off-by: bwplotka <bwplotka@gmail.com>

---------

Signed-off-by: bwplotka <bwplotka@gmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Co-authored-by: Arthur Silva Sens <arthursens2005@gmail.com>
machine424 and others added 14 commits May 19, 2025 11:09
discovery: Try fixing potential deadlocks in discovery
Closes #15406

Signed-off-by: jub0bs <jcretel-infosec+github@protonmail.com>
docs: Fix `metric_name_escaping_scheme` config parameter
util/httputil: Always add Vary header in SetCORS (fixes #15406)
Add health & text filtering on the /rules page
perf(chunkenc): intern custom values for native histograms
…iteration performance (#16365)

* refactor(endpointslice): use cache.Indexer to index endpointslices by LabelServiceName so not have to iterate over all endpoint objects.

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

* check the type and error early and add 'TestEndpointSliceDiscoveryWithUnrelatedServiceUpdate' unit test to give a regression test

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

* make service indexer namespaced

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

* remove unneeded test func

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

* Apply suggestions from code review

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

---------

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
…it for chunks writing (#15365)

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Signed-off-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
…sed on learnings from GoGC addition

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Add a test for aggregation wrapped in ParenExpr
…signed managed identity (#16421)

* squash (#1)

* remote-write: allow empty azure client_id to support system assigned managed identity

* add blank line for tests

* remote-write: allow empty azure client_id to support system assigned managed identity

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* add blank line for tests

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

---------

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* treat empty client_id as system-assigned identity; this is a valid case

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* rename file 

Signed-off-by: bragi92 <kadubey@microsoft.com>

---------

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>
@CLAassistant
Copy link

CLAassistant commented May 26, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
4 out of 18 committers have signed the CLA.

✅ roidelapluie
✅ krajorama
✅ jkroepke
✅ charleskorn
❌ KofClubs
❌ prymitive
❌ zepellin
❌ NeerajGartia21
❌ juliusv
❌ machine424
❌ SuperQ
❌ hardlydearly
❌ bwplotka
❌ jub0bs
❌ bragi92
❌ wbh1
❌ marcoderama
❌ ryanwuer
You have signed the CLA already but the status is still pending? Let us recheck it.

@charleskorn charleskorn marked this pull request as ready for review May 26, 2025 01:23
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the changes affecting Mimir and I can't see anything wrong. LGTM

@charleskorn charleskorn merged commit a5ce434 into main May 26, 2025
12 of 13 checks passed
@charleskorn charleskorn deleted the charleskorn/upgrade-prometheus branch May 26, 2025 23:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.