Skip to content

Implement custom RangeQuery interface to support Elasticsearch v9 #7358

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 25, 2025

Conversation

shuraih775
Copy link
Contributor

@shuraih775 shuraih775 commented Jul 21, 2025

Which problem is this PR solving?

  • Resolves [Feature]: Support Elasticsearch v9 #7274
  • Resolves compatibility issues with Elasticsearch v9 caused by the removal of deprecated fields in the RangeQuery structure from the olivere/elastic client.
  • Enables Jaeger to run with Elasticsearch v9+ by replacing the use of olivere/elastic/v7.RangeQuery with a custom implementation.

Description of the changes

  • Introduced a custom RangeQuery interface under internal/storage/elasticsearch/query.
  • Removed usage of deprecated fields such as include_lower, include_upper, from, and to.
  • Replaced all internal references to the legacy RangeQuery with the new implementation.
  • Ensured the new query format aligns with the expected request structure in Elasticsearch v9's _search API.

How was this change tested?

  • Added and ran unit tests for the custom RangeQuery implementation.
  • Successfully built and ran Jaeger components with Elasticsearch v9 Docker image (docker.elastic.co/elasticsearch/elasticsearch:9.0.3).

Checklist

Signed-off-by: Mohammed Shuraih <shuraih15@gmail.com>
Copy link

codecov bot commented Jul 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.48%. Comparing base (2319df2) to head (a08138a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7358      +/-   ##
==========================================
+ Coverage   96.46%   96.48%   +0.01%     
==========================================
  Files         376      377       +1     
  Lines       22957    22989      +32     
==========================================
+ Hits        22146    22181      +35     
+ Misses        613      611       -2     
+ Partials      198      197       -1     
Flag Coverage Δ
badger_v1 9.11% <0.00%> (-0.03%) ⬇️
badger_v2 1.72% <0.00%> (-0.01%) ⬇️
cassandra-4.x-v1-manual 11.83% <0.00%> (-0.04%) ⬇️
cassandra-4.x-v2-auto 1.71% <0.00%> (-0.01%) ⬇️
cassandra-4.x-v2-manual 1.71% <0.00%> (-0.01%) ⬇️
cassandra-5.x-v1-manual 11.83% <0.00%> (-0.04%) ⬇️
cassandra-5.x-v2-auto 1.71% <0.00%> (-0.01%) ⬇️
cassandra-5.x-v2-manual 1.71% <0.00%> (-0.01%) ⬇️
elasticsearch-6.x-v1 16.80% <58.33%> (+0.10%) ⬆️
elasticsearch-7.x-v1 16.84% <58.33%> (+0.10%) ⬆️
elasticsearch-8.x-v1 16.98% <58.33%> (+0.10%) ⬆️
elasticsearch-8.x-v2 1.72% <0.00%> (-0.01%) ⬇️
elasticsearch-9.x-v2 1.72% <0.00%> (?)
grpc_v1 10.35% <0.00%> (-0.03%) ⬇️
grpc_v2 1.72% <0.00%> (-0.01%) ⬇️
kafka-3.x-v1 9.27% <0.00%> (-0.03%) ⬇️
kafka-3.x-v2 1.72% <0.00%> (-0.01%) ⬇️
memory_v2 1.72% <0.00%> (-0.01%) ⬇️
opensearch-1.x-v1 16.89% <58.33%> (+0.10%) ⬆️
opensearch-2.x-v1 16.89% <58.33%> (+0.10%) ⬆️
opensearch-2.x-v2 1.72% <0.00%> (-0.01%) ⬇️
opensearch-3.x-v2 1.72% <0.00%> (-0.01%) ⬇️
query 1.72% <0.00%> (-0.01%) ⬇️
tailsampling-processor 0.47% <0.00%> (-0.01%) ⬇️
unittests 95.46% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

shuraih775 and others added 2 commits July 21, 2025 23:42
…anstore/reader_test.go

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Signed-off-by: shuraih775 <74284337+shuraih775@users.noreply.github.com>
Signed-off-by: Mohammed Shuraih <shuraih15@gmail.com>
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you think this solves the v9 support problem then please add v9 into integration tests, to validate this change

Signed-off-by: Mohammed Shuraih <shuraih15@gmail.com>
@@ -1134,8 +1132,8 @@ func TestSpanReader_buildStartTimeQuery(t *testing.T) {
expected := make(map[string]any)
json.Unmarshal([]byte(expectedStr), &expected)
// We need to do this because we cannot process a json into uint64.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know what this comment refers to? "to do this" - do what?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"To do this" refers to replacing the float64 values that json.Unmarshal produces with uint64, because the actual query output uses uint64 (e.g., microseconds for durations). This ensures the test compares values with matching types.

json.Unmarshal parses all numeric values as float64 by default when decoding into interface{}.

We convert to uint64 because the function under test internally uses model.TimeAsEpochMicroseconds, which returns a uint64:
Source — TimeAsEpochMicroseconds

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we unmarshal a string that contains two "expected" numbers but then override those numbers with other values. Why not just construct the expected map directly? What's the value of unmarshalling the string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original test used JSON unmarshalling to handle nested fields like include_lower/include_upper. In the new query, that's no longer needed, but I kept the style consistent with the existing tests. Happy to simplify it if preferred.

Comment on lines 13 to 22
name string
gt any
gte any
lt any
lte any
timeZone string
boost *float64
queryName string
format string
relation string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: why store these values as fields instead of adding them directly to the map right away? It would reduce the amount of code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This RangeQueryimplementation follows the elastic.Query interface format:

type Query interface {
    // Source returns the JSON-serializable query request.
    Source() (interface{}, error)
}

This design is intentional and critical for compatibility with the existing Jaeger codebase. Many parts of Jaeger — such as query builders and Elasticsearch readers — expect all queries to satisfy the elastic.Query interface.

If we were to remove the Source() method or replace this with a direct map[string]any-style implementation, it would break the entire pipeline wherever elastic.Query is used. Calls like query.Source() would fail at compile time.

Moreover, this struct-based implementation is modeled after the previously used olivere/elastic/v7 library. By keeping the method signatures (.Gte(), .Lte(), etc.) and Source() behavior consistent, we ensure minimal refactoring and preserve the existing query composition logic.

So while a direct map-based query might look simpler in isolation, it’s not viable in this context. Our approach ensures:

  • Full compatibility with the elastic.Query interface
  • Minimal changes to existing logic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not suggesting changing the interface, but changing implementation where

func (q *RangeQuery) Lte(val any) *RangeQuery {
	q.lte = val
	return q
}

becomes

func (q *RangeQuery) Lte(val any) *RangeQuery {
	q.params["lte"] = val
	return q
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification! You're right—refactoring to directly update params in the setters would simplify Source() and reduce redundancy. I’ll go ahead and refactor it that way.

Signed-off-by: Mohammed Shuraih <shuraih15@gmail.com>
@shuraih775 shuraih775 requested a review from yurishkuro July 24, 2025 15:40
@yurishkuro yurishkuro added the changelog:new-feature Change that should be called out as new feature in CHANGELOG label Jul 25, 2025
@yurishkuro yurishkuro enabled auto-merge July 25, 2025 01:23
@yurishkuro yurishkuro added this pull request to the merge queue Jul 25, 2025
auto-merge was automatically disabled July 25, 2025 01:37

Pull Request is not mergeable

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 25, 2025
@yurishkuro yurishkuro added this pull request to the merge queue Jul 25, 2025
Merged via the queue into jaegertracing:main with commit f0417ca Jul 25, 2025
64 of 65 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/storage changelog:new-feature Change that should be called out as new feature in CHANGELOG storage/elasticsearch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature]: Support Elasticsearch v9
2 participants