Skip to content

Conversation

bwplotka
Copy link

@bwplotka bwplotka commented Dec 7, 2023

Changes:

  • Changed textparse Parser interface for consistency and robustness.
  • Changed CT interface to be more explicit and handle validation.
  • Simplified test, change scrapeManager to allow testability.
  • Added TODOs - we need appender tests and check how hard is to get agent to use it too (and if we need that)

Changes:

* Changed textparse Parser interface for consistency and robustness.
* Changed CT interface to be more explicit and handle validation.
* Simplified test, change scrapeManager to allow testability.
* Added TODOs.


Signed-off-by: bwplotka <bwplotka@gmail.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Copy link
Owner

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this and challenging many decisions I've made!

I liked how you changed many places, e.g. feature-flag and method names, to make a point that we're ingesting zeros.

One thing that I couldn't understand was the metadata test, what are we trying to accomplish over there? 🤔

Comment on lines +248 to 252
// CreatedTimestamp returns nil as it's not implemented yet.
// TODO(bwplotka): https://github.com/prometheus/prometheus/issues/12980
func (p *PromParser) CreatedTimestamp() *int64 {
return nil
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the impression we didn't want to implement CT for Prometheus text. The issue mentioned here also doesn't mention prometheus text parsing 🤔

func (p *ProtobufParser) CreatedTimestamp(ct *types.Timestamp) bool {
var foundCT *types.Timestamp
// CreatedTimestamp returns CT or nil if CT is not present or
// invalid (as timestamp e.g. negative value) on counters, summaries or histograms.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// invalid (as timestamp e.g. negative value) on counters, summaries or histograms.
// invalid (e.g. a timestamp with negative value) on counters, summaries or histograms.

Not sure if that's what you meant here, but it reads weird 🤔

Comment on lines +377 to +379
if err != nil {
// Errors means ct == nil or invalid timestamp, which we silently ignore.
return nil
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, ideally we would want to at least log client-side errors. Would it make sense to return an error as well?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also change the proto definition to return int64 like callum suggested and not worry with conversion errors

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return where? So by making int64 we don't magically remove error surface. We simply not check things. So why not doing the same here and simply not check things? (:

Also how do you propose changing field to int64 without braking changes? Do you propose adding another field (as we should in proto)? What would you do with other timestamp fields that have google Timestamp type? (:

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that adding a new message is the right way of doing backward/forward compatibility, but since the proto change was released only in Prometheus 2.48 and seeing that proto changes weren't announced in the changelog, I don't expect that replacing the field will break anyone (even though this is a breaking change)

tsdb/agent/db.go Outdated
func (a *appender) AppendCreatedTimestamp(ref storage.SeriesRef, l labels.Labels, t int64) (storage.SeriesRef, error) {
func (a *appender) AppendCTZeroSample(ref storage.SeriesRef, l labels.Labels, t int64, ct int64) (storage.SeriesRef, error) {
// TODO(bwplotka): Implement
panic("to implement")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

panic? 😱

Don't Prometheus server and agent share the same scrape codebase? Won't this cause agents to panic on every scrape? 🤔

Copy link
Author

@bwplotka bwplotka Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will, but only for this PR. I planned to do extra work in this PR, but now I think we should postpone, so fixed in next commit to this PR (pushing in a 10m)

// Disable regular scrapes.
ScrapeInterval: model.Duration(9999 * time.Minute),
ScrapeTimeout: model.Duration(5 * time.Second),
// Ensure proto is chosen.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point adding a comment here! Maybe we could extend by saying that we're choosing proto because at the time it is the only protocol that parses CT? We can remove this option once we add more protocols

ScrapeProtocols: []config.ScrapeProtocol{config.PrometheusProto},
},
ScrapeConfigs: []*config.ScrapeConfig{{JobName: "test"}},
}))

// Start fake HTTP target to scrape returning a single metric.
once := sync.Once{}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about once

scrape/scrape.go Outdated
Comment on lines 1572 to 1577
if ctMs := p.CreatedTimestamp(); sl.enableCTZeroIngestion && ctMs != nil {
ref, err = app.AppendCTZeroSample(ref, lset, t, *ctMs)
if err != nil && !errors.Is(err, storage.ErrOutOfOrderCT) { // OOO is a common case, ignoring completely for now.
// CT is an experimental feature. For now, we don't need to fail the
// scrape on errors updating the created timestamp, log debug.
level.Debug(sl.l).Log("msg", "Error when updating metadata in scrape loop", "series", string(met), "ct", *ctMs, "t", t, "err", err)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks so much simpler now!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the log message looks like a copy-paste mistake? We're not updating metadata yet, are we?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whooops, great catch!

Signed-off-by: bwplotka <bwplotka@gmail.com>
@ArthurSens ArthurSens merged commit 385c860 into ArthurSens:scrape-synthetic-zero Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants