Skip to content

Conversation

replay
Copy link
Contributor

@replay replay commented Dec 20, 2021

Our load tests have shown that there is a latency spike in the
remote write handler whenever the head chunks need to be written,
because chunkDiskMapper.WriteChunk() blocks until the chunks are written
to disk.

This adds a queue to the chunk disk mapper which makes the WriteChunk()
method non-blocking unless the queue is full. Reads can still be served
from the queue.

Our load tests have shown that there is a latency spike in the
remote write handler whenever the head chunks need to be written,
because chunkDiskMapper.WriteChunk() blocks until the chunks are written
to disk.

This adds a queue to the chunk disk mapper which makes the WriteChunk()
method non-blocking unless the queue is full. Reads can still be served
from the queue.

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
@replay replay marked this pull request as ready for review December 20, 2021 20:24
@replay replay requested a review from codesome as a code owner December 20, 2021 20:24
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
@replay replay force-pushed the chunk_write_queue branch from fa52843 to 3e9160c Compare January 4, 2022 14:17
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
@replay replay force-pushed the chunk_write_queue branch from 3f1544c to 210f9fa Compare January 4, 2022 17:11
replay added 3 commits January 4, 2022 17:21
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
replay added 4 commits January 5, 2022 12:49
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Copy link
Member

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am currently reviewing this, but there is one common comment for entire change: code comments should start with capital letters (except when referencing a variable that starts with lower case), and end with a period ..

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
@replay
Copy link
Contributor Author

replay commented Jan 5, 2022

code comments should start with capital letters (except when referencing a variable that starts with lower case), and end with a period ..

Thanks, I fixed it.

Copy link
Member

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good with some nits! Let's run prombench till tomorrow on this while I finish reviewing some unit test changes.

@codesome
Copy link
Member

codesome commented Jan 5, 2022

/prombench main

@prombot
Copy link
Contributor

prombot commented Jan 5, 2022

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-10051 and main

After successful deployment, the benchmarking metrics can be viewed at:

Other Commands:
To stop benchmark: /prombench cancel
To restart benchmark: /prombench restart main

replay added 2 commits January 5, 2022 22:18
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
@replay replay force-pushed the chunk_write_queue branch from cbefe75 to 544c2ca Compare January 5, 2022 22:24
@codesome
Copy link
Member

codesome commented Jan 6, 2022

/prombench cancel

All looks good

@prombot
Copy link
Contributor

prombot commented Jan 6, 2022

Benchmark cancel is in progress.

Copy link
Member

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments about changed test semantics, should be good to merge after fixing those

@@ -297,42 +326,39 @@ func TestChunkDiskMapper_Truncate_PreservesFileSequence(t *testing.T) {
// though files 4 and 6 are empty.
file2Maxt := hrw.mmappedChunkFiles[2].maxt
require.NoError(t, hrw.Truncate(file2Maxt+1))
// As 6 was empty, it should not create another file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to test this. We should add a chunk and see that 3,4,5,6 is present and no 7. And then truncate and add a chunk to create the file 7 (we need this truncate to make sure again that it preserves sequence of files).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this what you had in mind? 04e94b9

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly, we should be truncating file2Maxt+1 like before. Because the bug was the it was deleting empty files from later files.

replay and others added 3 commits January 6, 2022 12:58
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
@replay replay force-pushed the chunk_write_queue branch from f3802ad to 07ce3f9 Compare January 6, 2022 13:23
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
@replay replay force-pushed the chunk_write_queue branch from 07ce3f9 to 4d3d594 Compare January 6, 2022 13:24
replay added 3 commits January 6, 2022 13:32
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
@replay
Copy link
Contributor Author

replay commented Jan 6, 2022

I also replaced most occurrences of callbackWg in tsdb/chunks/chunk_write_queue_test.go with awaitCb (6bd1f8e). But there was one test where this wasn't possible because it's critical that we can wait for the callback after the call to addChunk, so I left the callbackWg there: TestChunkWriteQueue_WrappingAroundSizeLimit

@replay replay force-pushed the chunk_write_queue branch from 69db63b to 6bd1f8e Compare January 10, 2022 13:21
Copy link
Member

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@codesome codesome enabled auto-merge (squash) January 10, 2022 13:36
@codesome codesome merged commit 0df3489 into prometheus:main Jan 10, 2022
@replay replay deleted the chunk_write_queue branch January 10, 2022 13:45
suyashtava pushed a commit to suyashtava/prometheus that referenced this pull request Jan 18, 2022
* Write chunks via queue, predicting the refs

Our load tests have shown that there is a latency spike in the
remote write handler whenever the head chunks need to be written,
because chunkDiskMapper.WriteChunk() blocks until the chunks are written
to disk.

This adds a queue to the chunk disk mapper which makes the WriteChunk()
method non-blocking unless the queue is full. Reads can still be served
from the queue.

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* address PR feeddback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* initialize metrics without .Add(0)

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* change isRunningMtx to normal lock

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* do not re-initialize chunkrefmap

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* update metric outside of lock scope

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* add benchmark for adding job to chunk write queue

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* remove unnecessary "success" var

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* gofumpt -extra

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* avoid WithLabelValues call in addJob

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* format comments

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* addressing PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* rename cutExpectRef to cutAndExpectRef

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* use head.Init() instead of .initTime()

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* address PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* PR feedback

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* update test according to PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* replace callbackWg -> awaitCb

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* better test of truncation with empty files

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* replace callbackWg -> awaitCb

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: suyashtava <suyashtava@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants