Conversation

@GTxx (Contributor) commented Jul 6, 2025

Related Issue

Issue: N/A

Description

  • What problem does this PR solve?
    A: This PR adds prompt caching support for Anthropic models in the SAP AI Core provider, specifically sonnet-3.7, sonnet-4, and opus-4.
  • Why were these changes introduced and what purpose do they serve?
    A: Prompt caching is a significant way to reduce API cost.
  • For larger changes, provide context about your approach and reasoning
    A: I mostly followed the Bedrock and Anthropic providers when building the prompt-cache support, since SAP AI Core exposes the Anthropic API in Bedrock's Converse API format (see the payload sketch below).
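
For reference, here is a minimal sketch of what a cache point looks like in a Converse-style payload. The shape follows Bedrock's Converse API (cachePoint content blocks of type "default"); the exact request shape SAP AI Core accepts is an assumption here, not the provider's actual code.

// Hypothetical Converse-style request with cache points; field names assume Bedrock's Converse format.
const payload = {
  system: [
    { text: "You are Cline, a coding assistant..." },
    { cachePoint: { type: "default" } }, // cache everything in the system prompt up to this block
  ],
  messages: [
    {
      role: "user",
      content: [
        { text: "Refactor the streaming handler." },
        { cachePoint: { type: "default" } }, // cache the conversation up to the latest user turn
      ],
    },
  ],
}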

Test Procedure

I ran the same prompt against the same code base with and without caching. These are the token usages:

Without prompt cache:

inputToken = 15601
outputTokens = 185
totalTokens = 15786

With prompt cache applied to the system prompt:

inputToken = 3753
outputTokens = 454
totalTokens = 16190

With prompt cache applied to both the system prompt and user-assistant messages:

inputToken = 4
outputTokens = 213
totalTokens = 16025

As it illustrates, with only the system prompt cache, 10,000 tokens are cached; and with both system prompt cached and user-assistant messages cached, the input token count drops to single digit, showing more tokens are cached.
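
As a rough cross-check, assuming totalTokens = inputTokens + outputTokens + cache read + cache write (see the caveat under Additional Notes), the cached share of each run works out to approximately:

no cache:                        15,601 + 185 = 15,786 total, nothing cached
system prompt cached:            16,190 - 3,753 - 454 ≈ 11,983 tokens read from / written to cache
system prompt + messages cached: 16,025 - 4 - 213 ≈ 15,808 tokens read from / written to cache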

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • [x] ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • ♻️ Refactor Changes
  • 💅 Cosmetic Changes
  • 📚 Documentation update
  • 🏃 Workflow Changes

Pre-flight Checklist

  • Changes are limited to a single feature, bugfix or chore (split larger changes into separate PRs)
  • Tests are passing (npm test) and code is formatted and linted (npm run format && npm run lint)
  • I have created a changeset using npm run changeset (required for user-facing changes)
  • I have reviewed contributor guidelines

Screenshots

Additional Notes

Caveat: In my testing with my own SAP AI Core account (prompt cache enabled), SAP AI Core does not return cache_read and cache_write counts.
Users will therefore see a very small input token count that does not reflect the real token usage, because

prompt tokens = input tokens + cache read tokens + cache write tokens

and most prompt tokens fall under cache_read or cache_write, leaving the reported input token count small.


Important

Adds prompt caching for SAP AI Core models sonnet-3.7, sonnet-4, and opus-4 to reduce costs.

  • Behavior:
    • Adds prompt caching for sonnet-3.7, sonnet-4, and opus-4 models in sapaicore.ts.
    • Caching is enabled by default to reduce costs.
    • Updates sapAiCoreModels in api.ts to reflect caching support and pricing.
  • Implementation:
    • Introduces applyCacheControlToMessages() in sapaicore.ts to manage cache points in messages (a rough sketch follows this list).
    • Modifies createMessage() in sapaicore.ts to include cache points in payloads.
  • Testing:
    • Demonstrates token usage reduction with caching enabled, showing significant savings in input tokens.
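
As a rough illustration of the approach (not the PR's actual code), a helper like applyCacheControlToMessages() could append a cachePoint block to the last two user messages, so the previous turn is read from cache while the newest turn is written to it. The types and field names below assume Bedrock's Converse format.

// Sketch only; the real applyCacheControlToMessages() in sapaicore.ts may differ.
type ContentBlock = { text?: string; cachePoint?: { type: "default" } }
type ConverseMessage = { role: "user" | "assistant"; content: ContentBlock[] }

function applyCacheControlToMessages(messages: ConverseMessage[]): ConverseMessage[] {
  // Indices of the last two user messages.
  const userIndices = messages
    .map((message, index) => (message.role === "user" ? index : -1))
    .filter((index) => index !== -1)
    .slice(-2)

  // Append a cache point to each of those messages; leave the rest untouched.
  return messages.map((message, index): ConverseMessage =>
    userIndices.includes(index)
      ? { ...message, content: [...message.content, { cachePoint: { type: "default" } }] }
      : message,
  )
}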

This description was created by Ellipsis for 855025c.

changeset-bot bot commented Jul 10, 2025

🦋 Changeset detected

Latest commit: 4dfdfbb

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package:
  • claude-dev (Minor)


@GTxx (Contributor, Author) commented Jul 23, 2025

Hi @saoudrizwan, @ocasta181, @NightTrek,

I use SAP AI Core + Sonnet daily, and caching is critical for improving API response time and saving cost.
Could you review this PR and provide feedback?

Thanks

@saoudrizwan (Contributor) commented Jul 26, 2025

Calling contributors to sapaicore.ts @tjandy98 @lizzzcai @schardosin

If you could please help test and review this PR, that would be greatly appreciated! 🙏

@lizzzcai (Contributor) commented Aug 6, 2025

Hi @tjandy98, can you help test whether this works locally? Thanks.

@ncryptedV1 ncryptedV1 mentioned this pull request Aug 6, 2025
@@ -361,11 +391,15 @@ export class SapAiCoreHandler implements ApiHandler {
if (data.metadata?.usage) {
const inputTokens = data.metadata.usage.inputTokens || 0
const outputTokens = data.metadata.usage.outputTokens || 0
const cacheReadInputTokens = data.metadata.usage.cacheReadInputTokens || 0
@tjandy98 (Contributor) commented Aug 9, 2025

From my testing, cache usage information is not returned when using converse-stream. Despite that, it is implicitly reflected in usage.inputTokens, which shows a low token count (due to the cache). This appears to be specific to AI Core.

@GTxx (Contributor, Author) replied:

Thanks for the review.

You are right: SAP AI Core doesn't return the cache usage information, neither cache read nor cache write.

Here is one example of a metadata payload I found in the streamed response from a Sonnet model on SAP AI Core:

MetadataEvent(usage=TokenUsage(input_tokens=10, output_tokens=145, total_tokens=23150, cache_read_input_tokens=None, cache_write_input_tokens=None), metrics=Metrics(latency_ms=5109))
  • input_tokens=10
  • output_tokens=145
  • total_tokens=23150

I believe the cache mechanism works, so some input tokens are written to the cache and some are read from it.

I think the best option right now is to calibrate the input token count as:

input_token = total_tokens - output_tokens

Contributor replied:

Yes, the cache mechanism does work.

Here is an example of a raw event:

{"metrics":{"latencyMs":17329},"p":"abcdefghijklmnopqrstuvwxyzAB","usage":{"cacheReadInputTokenCount":1490,"cacheReadInputTokens":1490,"cacheWriteInputTokenCount":0,"cacheWriteInputTokens":0,"inputTokens":4,"outputTokens":832,"totalTokens":2326}}

total_tokens = 2326
output_tokens = 832
input_tokens = 4
cache_read_input_tokens = 2326 - 832 - 4 = 1490

The calculation logic may be adjusted according to the example calculation above

@GTxx (Contributor, Author) replied:

Agree.

The equation should be:
totalTokens = cacheReadInputTokenCount + cacheWriteInputTokenCount + inputTokens + outputTokens

I will close this PR since another PR for the same purpose has been merged.

I am creating another PR to calibrate the inputToken value; otherwise the token count shown for the context window is wrong: it could report only a few tokens used when the conversation has already exceeded the context window limit.
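
For illustration, a minimal sketch of that calibration in TypeScript, assuming the Converse-style usage fields shown in the raw event above (the follow-up PR's actual code may differ):

// Sketch only: derive the prompt-side token count when SAP AI Core omits the cache counters.
interface ConverseUsage {
  inputTokens?: number
  outputTokens?: number
  totalTokens?: number
  cacheReadInputTokens?: number
  cacheWriteInputTokens?: number
}

function calibratedInputTokens(usage: ConverseUsage): number {
  const input = usage.inputTokens ?? 0
  const output = usage.outputTokens ?? 0
  const cacheRead = usage.cacheReadInputTokens ?? 0
  const cacheWrite = usage.cacheWriteInputTokens ?? 0

  // When cache counters are present, totalTokens = cacheRead + cacheWrite + input + output,
  // so the uncached input can be reported as-is.
  if (cacheRead > 0 || cacheWrite > 0) {
    return input
  }
  // Otherwise fall back to totalTokens - outputTokens so context-window accounting
  // covers everything on the prompt side (uncached input plus cached tokens).
  return Math.max((usage.totalTokens ?? 0) - output, input)
}

With the raw event above, totalTokens - outputTokens = 2,326 - 832 = 1,494, which matches inputTokens (4) + cacheReadInputTokens (1,490).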

@GTxx GTxx closed this Aug 9, 2025
dtrugman pushed a commit to dtrugman/cline that referenced this pull request Aug 24, 2025