remote_write client metrics: a partially failed batch is counted as all samples failed

### What did you do?

1. Use a remote_write pipeline to send samples and exemplars to a remote backend.
2. A batch of samples and exemplars is sent, for example 2000 samples and 1 exemplar.
3. The one exemplar is rejected, for example, [due to this issue](https://github.com/prometheus/prometheus/issues/13577).
4. remote_write receives an HTTP status 400, logs a "non-recoverable error" message, and drops the batch.
5. The [failure metrics are incremented](https://github.com/prometheus/prometheus/blob/5a218708f1f755a67b3d9b68549ccbbb6463d850/storage/remote/queue_manager.go#L1541-L1543) by the number of samples, exemplars, and histograms in the entire batch, even if all samples were ingested and only one exemplar failed.
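
To make the accounting difference concrete, here is a minimal sketch of the two approaches using the numbers from step 2. It is not the Prometheus code: `batch`, `failed`, `countWholeBatch`, and `countRejected` are hypothetical names, and the `accepted` argument assumes the receiver reports how many items of each type it ingested, which the HTTP 400 response described above does not do.

```go
package main

import "fmt"

// batch is a hypothetical stand-in for the contents of one remote_write request.
type batch struct {
	samples, exemplars, histograms int
}

// failed holds hypothetical failure counters, one per data type.
type failed struct {
	samples, exemplars, histograms int
}

// countWholeBatch mirrors the behaviour described in this issue: after a
// non-recoverable error, every item in the batch is counted as failed.
func countWholeBatch(f *failed, sent batch) {
	f.samples += sent.samples
	f.exemplars += sent.exemplars
	f.histograms += sent.histograms
}

// countRejected counts only the items that were not ingested.
// ASSUMPTION: accepted comes from the receiver; this is not something the
// plain HTTP 400 response in this issue provides.
func countRejected(f *failed, sent, accepted batch) {
	f.samples += sent.samples - accepted.samples
	f.exemplars += sent.exemplars - accepted.exemplars
	f.histograms += sent.histograms - accepted.histograms
}

func main() {
	sent := batch{samples: 2000, exemplars: 1}
	accepted := batch{samples: 2000} // the single exemplar was rejected

	var whole, partial failed
	countWholeBatch(&whole, sent)
	countRejected(&partial, sent, accepted)

	fmt.Printf("whole-batch accounting: %+v\n", whole)
	fmt.Printf("per-item accounting:    %+v\n", partial)
}
```

With whole-batch accounting, the 2000 ingested samples show up as failures; with per-item accounting, only the rejected exemplar does.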

### What did you expect to see?

1. The [failure metrics](https://github.com/prometheus/prometheus/blob/5a218708f1f755a67b3d9b68549ccbbb6463d850/storage/remote/queue_manager.go#L1541-L1543) would ideally reflect how many data points actually failed in the batch. Otherwise the success rate derived from them is inaccurate.
2. Exemplars and native histograms are experimental. Would it be desirable to retry the batch without them, so that the samples are not dropped?
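
The retry idea in item 2 could look roughly like the sketch below. All names (`batch`, `sendFunc`, `sendWithFallback`, `errNonRecoverable`, `rejectExemplars`) are hypothetical and do not exist in Prometheus. The sketch also assumes the receiver tolerates samples being sent twice, since in the scenario above the samples may already have been ingested on the first attempt.

```go
package main

import (
	"errors"
	"fmt"
)

// batch is a hypothetical stand-in for the contents of one remote_write request.
type batch struct {
	samples, exemplars, histograms int
}

// errNonRecoverable stands in for an HTTP 400 style rejection.
var errNonRecoverable = errors.New("non-recoverable error")

// sendFunc is a hypothetical transport that delivers one batch.
type sendFunc func(batch) error

// sendWithFallback sends the full batch. If that fails with a non-recoverable
// error and the batch carried exemplars or histograms, it retries once with
// the samples only. It returns what was dropped and the error that caused it.
func sendWithFallback(send sendFunc, b batch) (dropped batch, err error) {
	err = send(b)
	if err == nil {
		return batch{}, nil
	}
	hasExtras := b.exemplars > 0 || b.histograms > 0
	if !errors.Is(err, errNonRecoverable) || !hasExtras || b.samples == 0 {
		return b, err // nothing to strip, or not a case for the fallback
	}
	if retryErr := send(batch{samples: b.samples}); retryErr != nil {
		return b, retryErr // the samples-only retry failed as well
	}
	// Samples went through; only the experimental data types were dropped.
	return batch{exemplars: b.exemplars, histograms: b.histograms}, err
}

// rejectExemplars simulates a receiver that rejects any batch with exemplars.
func rejectExemplars(b batch) error {
	if b.exemplars > 0 {
		return errNonRecoverable
	}
	return nil
}

func main() {
	dropped, err := sendWithFallback(rejectExemplars, batch{samples: 2000, exemplars: 1})
	fmt.Printf("dropped: %+v, err: %v\n", dropped, err)
}
```

The returned `dropped` value is what the failure metrics would be incremented by, so a rejected exemplar no longer marks the 2000 samples as failed.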

### What did you see instead? Under which circumstances?

The [failure metrics](https://github.com/prometheus/prometheus/blob/5a218708f1f755a67b3d9b68549ccbbb6463d850/storage/remote/queue_manager.go#L1541-L1543) counted the entire batch as failed, even when the samples were ingested correctly.

### System information

_No response_

### Prometheus version

_No response_

### Prometheus configuration file

_No response_

### Alertmanager version

_No response_

### Alertmanager configuration file

_No response_

### Logs

```text
ts=2024-06-19T17:38:16.946175Z level=error msg="non-recoverable error" ... url=(...) count=5 exemplarCount=1 err="server returned HTTP status 400 Bad Request: send data to ingesters: failed pushing to ingester ...: user=...: err: out of order exemplar. timestamp=2024-06-19T17:35:15Z, series=a_test_total{...}, exemplar={...}"
```

remote_write client metrics: a partially failed batch is counted as all samples failed #14323