Skip to content

Conversation

charleskorn
Copy link
Contributor

@charleskorn charleskorn commented Mar 5, 2025

This PR steps us closer to being in sync with the latest upstream changes.

Diff: prometheus/prometheus@cb3b17a...b74cebf

While the list of changes below is large, most of the changes themselves are quite small.

Behaviour changes

Bug fixes

Performance improvements

Non-production code changes (eg. tests)

Dependency updates

Changes not relevant to Mimir (eg. docs, examples)

prymitive and others added 30 commits January 9, 2025 17:05
This partially reverts ae3d392aa9c3a5c5f92f8116738c5b32c98b09a7.

ae3d392aa9c3a5c5f92f8116738c5b32c98b09a7 added a call to db.mtx.Lock() that lasts for the entire duration of db.reloadBlocks(),
previous db.mtx would be locked only during critical part of db.reloadBlocks().
The motivation was to protect against races:
prometheus/prometheus@9e0351e#r555699794
The 'reloads' being mentioned are (I think) reloadBlocks() calls, rather than db.reload() or other methods.
TestTombstoneCleanRetentionLimitsRace was added to catch this but I wasn't able to ever get any error out of it, even after disabling all calls to db.mtx in reloadBlocks() and CleanTombstones().
To make things more complicated CleanupTombstones() itself calls reloadBlocks(), so it seems that the real issue is that we might have concurrent calls to reloadBlocks().

The problem with this change is that db.reloadBlocks() can take a very long time, that's because it might need to load very large blocks from disk, which is slow.
While db.mtx is locked a large chunk of the db is locked, including queries, since db.mtx read lock is needed for db.Querier() call.
One of the issues this manifests itself as is a gap in all metrics and blocked queries just after a large block compaction happens.
When compaction merges multiple day-or-more blocks into a week-or-more block it create a single very big block.
After that block is written it needs to be loaded and that seems to be taking many seconds (30-45), during which mtx is held and everything is blocked.

Turns out that there is another lock that is more fine grained and aimed at this specific use case:

// cmtx ensures that compactions and deletions don't run simultaneously.
cmtx sync.Mutex

All calls to reloadBlocks() are wrapped inside cmtx lock. The only exception is db.reload() which this change fixes.
We can't add cmtx lock inside reloadBlocks() itself because it's called by a number of functions, some of which are already holding cmtx.

Looking at the code I think it is sufficient to hold cmtx and skip a reloadBlocks() wide mtx call.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
This test ensures that running db.reloadBlocks() and db.CleanTombstones() at the same time doesn't race.
The problem is that CleanTombstones() is a public method while reloadBlocks() is internal.
CleanTombstones() sets db.cmtx lock while reloadBlocks() is not protected by any locks at all, it expects the public method through which it was called to do it.
So having a race between these two is not unexpected and we shouldn't really be testing this.
db.cmtx ensures that no other function can be modifying the list of open blocks and so the scenario tested here cannot happen.
If it would happen it would be only because some other method doesn't aquire db.ctmx lock, something this test cannot detect.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
We don't hold db.mtx lock when trying to read db.blocks here so we need a read lock around this loop.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
Compact() is an uppercase function that deals with locks on its own, so we shouldn't have a lock around it.

Signed-off-by: Lukasz Mierzwa <lukasz@cloudflare.com>
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
Signed-off-by: Ørjan Ommundsen <orjan@sanity.io>
Signed-off-by: Ørjan Ommundsen <orjan@sanity.io>
…gram

If a rate (or increase) is calculated on native histograms, and there
is a counter reset between the 1st and 2nd histogram, we never have to
touch the 1st histogram, so it doesn't even matter if it has an
incompatible bucket layout. So we should not error out in that case.

This simply nulls out the 1st histogram in that case.

Signed-off-by: beorn7 <beorn@grafana.com>
…5853)

Add native histogram support to idelta and irate functions

---------

Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
…ograms (#15895)

PromQL: Updates annotation for bin op between incompatible histograms

---------

Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
* scrape: Add realistic data case for scrape loop append bench.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Update scrape/scrape_test.go

Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

---------

Signed-off-by: bwplotka <bwplotka@gmail.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.61.0 to 0.62.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Changelog](https://github.com/prometheus/common/blob/main/RELEASE.md)
- [Commits](prometheus/common@v0.61.0...v0.62.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* doc: clarify `rate` values are averaged

Signed-off-by: asymmetric <101816+asymmetric@users.noreply.github.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
Signed-off-by: Charles Korn <charles.korn@grafana.com>
Signed-off-by: Charles Korn <charles.korn@grafana.com>
Signed-off-by: Charles Korn <charles.korn@grafana.com>
This makes it consistent with label_join.

Signed-off-by: Joshua Hesketh <josh@nitrotech.org>
"invalid-label-name" is valid in utf-8 mode

Signed-off-by: Joshua Hesketh <josh@nitrotech.org>
promql: fix behaviour of range vector selectors with 0 length range
Fix duplicate output vector if delayed name removal is disabled

This error is emitted in cleanupMetricLabels, but is skipped if
enableDelayedNameRemoval is false.

This makes it consistent with label_replace

Signed-off-by: Joshua Hesketh <josh@nitrotech.org>

---------

Signed-off-by: Joshua Hesketh <josh@nitrotech.org>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
promql: fix rate calculation with a counter reset after the 1st sample
…unctions (#15964)

PromQL: Ignore histograms in scalar, sort and sort_desc functions

---------

Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
Around Mimir compactions we see logging in ShardedPostings do massive allocations and drive GC up to 50% of CPU.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
…5962)

Bumps [bufbuild/buf-setup-action](https://github.com/bufbuild/buf-setup-action) from 1.48.0 to 1.50.0.
- [Release notes](https://github.com/bufbuild/buf-setup-action/releases)
- [Commits](bufbuild/buf-setup-action@1115d0a...a47c93e)

---
updated-dependencies:
- dependency-name: bufbuild/buf-setup-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.5.0 to 4.6.0.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@6f51ac0...65c4c4a)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.0 to 3.28.8.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@48ab28a...dd74661)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/stale](https://github.com/actions/stale) from 9.0.0 to 9.1.0.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](actions/stale@28ca103...5bef64f)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#15958)

Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 6.1.1 to 6.2.0.
- [Release notes](https://github.com/golangci/golangci-lint-action/releases)
- [Commits](golangci/golangci-lint-action@971e284...ec5d184)

---
updated-dependencies:
- dependency-name: golangci/golangci-lint-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 6.1.1 to 6.2.0.
- [Release notes](https://github.com/golangci/golangci-lint-action/releases)
- [Commits](golangci/golangci-lint-action@971e284...ec5d184)

---
updated-dependencies:
- dependency-name: golangci/golangci-lint-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
dependabot bot and others added 16 commits February 7, 2025 19:14
…15956)

Bumps [actions/setup-go](https://github.com/actions/setup-go) from 5.2.0 to 5.3.0.
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](actions/setup-go@3041bf5...f111f33)

---
updated-dependencies:
- dependency-name: actions/setup-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ontrib/processor/deltatocumulativeprocessor (#15924)

Bumps [github.com/open-telemetry/opentelemetry-collector-contrib/processor/deltatocumulativeprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib) from 0.116.0 to 0.118.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-collector-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/CHANGELOG-API.md)
- [Commits](open-telemetry/opentelemetry-collector-contrib@v0.116.0...v0.118.0)

---
updated-dependencies:
- dependency-name: github.com/open-telemetry/opentelemetry-collector-contrib/processor/deltatocumulativeprocessor
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [github.com/docker/docker](https://github.com/docker/docker) from 27.4.1+incompatible to 27.5.1+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](moby/moby@v27.4.1...v27.5.1)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [github.com/hetznercloud/hcloud-go/v2](https://github.com/hetznercloud/hcloud-go) from 2.18.0 to 2.19.0.
- [Release notes](https://github.com/hetznercloud/hcloud-go/releases)
- [Changelog](https://github.com/hetznercloud/hcloud-go/blob/main/CHANGELOG.md)
- [Commits](hetznercloud/hcloud-go@v2.18.0...v2.19.0)

---
updated-dependencies:
- dependency-name: github.com/hetznercloud/hcloud-go/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…5922)

Bumps [github.com/prometheus/sigv4](https://github.com/prometheus/sigv4) from 0.1.1 to 0.1.2.
- [Release notes](https://github.com/prometheus/sigv4/releases)
- [Changelog](https://github.com/prometheus/sigv4/blob/main/RELEASE.md)
- [Commits](prometheus/sigv4@v0.1.1...v0.1.2)

---
updated-dependencies:
- dependency-name: github.com/prometheus/sigv4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…15918)

Bumps [github.com/linode/linodego](https://github.com/linode/linodego) from 1.46.0 to 1.47.0.
- [Release notes](https://github.com/linode/linodego/releases)
- [Commits](linode/linodego@v1.46.0...v1.47.0)

---
updated-dependencies:
- dependency-name: github.com/linode/linodego
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.218.0 to 0.219.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md)
- [Commits](googleapis/google-api-go-client@v0.218.0...v0.219.0)

---
updated-dependencies:
- dependency-name: google.golang.org/api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/setup-go](https://github.com/actions/setup-go) from 5.2.0 to 5.3.0.
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](actions/setup-go@3041bf5...f111f33)

---
updated-dependencies:
- dependency-name: actions/setup-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…0 (#15920)

Bumps [github.com/digitalocean/godo](https://github.com/digitalocean/godo) from 1.132.0 to 1.136.0.
- [Release notes](https://github.com/digitalocean/godo/releases)
- [Changelog](https://github.com/digitalocean/godo/blob/main/CHANGELOG.md)
- [Commits](digitalocean/godo@v1.132.0...v1.136.0)

---
updated-dependencies:
- dependency-name: github.com/digitalocean/godo
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This is also meant to document the actual implementation, but
see #13934 for the current state.

This also improves and streamlines some parts of the documentation
that are not strictly native histogram related, but are colocated with
them. In particular, the section about aggregation operators got
restructured quite a bit, including the removal of a quite verbose
example for `limit_ratio` (which was just too long an this location
and also a bit questionabl in its usefulness).

Signed-off-by: beorn7 <beorn@grafana.com>
docs: Update PromQL documentation to match the native histogram spec
mixin: replace use of 'cluster' with clusterLabel variable in dashboards
Move to 24h-based time formatting and unambiguous date formats. Also add
more details to the default formatting of each tick instead of only showing
e.g. minutes/seconds at rollover ticks for the shorter breakpoints.

Fixes prometheus/prometheus#15913

Signed-off-by: Julius Volz <julius.volz@gmail.com>
Also:
* split benchmark functions to make sure no one compares across parsers.
* testdata file have meaningful names reflecting the type representation
* promtestdata.txt now has all types, taken directly from long running Prometheus (https://demo.do.prometheus.io/)

Needed for prometheus/prometheus#15731

Signed-off-by: bwplotka <bwplotka@gmail.com>
@CLAassistant
Copy link

CLAassistant commented Mar 5, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
5 out of 12 committers have signed the CLA.

✅ beorn7
✅ jhesketh
✅ dimitarvdimitrov
✅ charleskorn
✅ bboreham
❌ prymitive
❌ obliadp
❌ NeerajGartia21
❌ bwplotka
❌ asymmetric
❌ juliusv
❌ metalmatze
You have signed the CLA already but the status is still pending? Let us recheck it.

@charleskorn charleskorn marked this pull request as ready for review March 6, 2025 00:02
@charleskorn charleskorn merged commit 9ba23e2 into main Mar 7, 2025
7 of 8 checks passed
@charleskorn charleskorn deleted the charleskorn/update-prometheus branch March 7, 2025 03:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.