susan-shu-c
Member

@susan-shu-c susan-shu-c commented Jun 11, 2025

Overview

This PR proposes changes on top of

based on discussion with @felixbarny @eyalkoren in this PR:

namely, updating gen_ai.request.model and gen_ai.agent.description from text to keyword.

The rationale:

  • gen_ai.request.model: doesn't require text analysis
  • gen_ai.agent.description: phrase queries may not be used often for this field, and a text mapping may be expensive in terms of storage and indexing - via @felixbarny

While it has a limit on how many characters it can store in doc_values, it falls back to storing the value in a stored field if ignore_above is set, which seems like a good compromise here.

I'm looking for feedback on whether this is covered, as it's part of the generated files and I'm not completely sure which changes are expected for ignore_above to work.
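
For illustration, the change described above might look roughly like this in the schema file. This is a sketch rather than the actual diff: the entries are abbreviated, and writing ignore_above explicitly is an assumption (1024 is the value the review discussion cites for most ECS keyword fields).

```yaml
# Abbreviated sketch of the two field definitions after this change.
# Other attributes (description, example, level, beta, otel) are omitted here.
- name: request.model
  type: keyword        # previously text; the value needs no text analysis

- name: agent.description
  type: keyword        # previously text; phrase queries are not expected to be common
  ignore_above: 1024   # assumption: written out explicitly for illustration
```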

Checklist

  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • For proposing substantial changes or additions to the schema, have you reviewed the RFC process?
  • If submitting code/script changes, have you verified all tests pass locally using make test?
  • If submitting schema/fields updates, have you generated new artifacts by running make and committed those changes?
  • Is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
  • Have you added an entry to the CHANGELOG.next.md?

@susan-shu-c susan-shu-c requested a review from a team as a code owner June 11, 2025 20:04

Documentation changes preview: https://docs-v3-preview.elastic.dev/elastic/ecs/pull/2489/reference/

@eyalkoren
Contributor

eyalkoren commented Jun 12, 2025

LGTM

While it has a limit on how many characters it can store in doc_values, it falls back to storing the value in a stored field if ignore_above is set, which seems like a good compromise here.

I'm looking for feedback on whether this is covered, as it's part of the generated files and I'm not completely sure which changes are expected for ignore_above to work.

I am not sure what you mean. Most (or all) keyword fields in ECS are defined with ignore_above: 1024.
In addition, this is the setting we use in ecs@mappings by default for all string fields that are not mapped otherwise.
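
As a rough sketch of what this means for the resulting Elasticsearch mapping, the two fields would end up looking something like the following. It is written as YAML here for readability and is not copied from the generated files; the nesting under gen_ai is inferred from the field names.

```yaml
# Sketch of the mapping properties for the two fields discussed in this PR.
gen_ai:
  properties:
    request:
      properties:
        model:
          type: keyword
          ignore_above: 1024
    agent:
      properties:
        description:
          type: keyword
          ignore_above: 1024
```

With ignore_above: 1024, string values longer than 1024 characters are not indexed for that field.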

@eyalkoren eyalkoren requested a review from felixbarny June 12, 2025 07:06
Member

@felixbarny felixbarny left a comment

Optional suggestion to disable indexing for the description field, otherwise LGTM.

@susan-shu-c
Member Author

Thanks @eyalkoren, this is clear enough to me! I just wanted to make sure I wasn't missing something.

Most (or all) keyword fields in ECS are defined with ignore_above: 1024.

susan-shu-c and others added 2 commits June 12, 2025 08:59
Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
@susan-shu-c
Member Author

cc @trisch-me @mjwolf regarding updating gen_ai.request.model and gen_ai.agent.description from text to keyword: the former doesn't require text analysis, and the latter is not often used for direct phrase queries; gen_ai.agent.name can be used instead.

This came up as part of:
Update ecs@mappings.json

@trisch-me trisch-me merged commit a1237a3 into main Jun 13, 2025
6 checks passed
@susan-shu-c
Member Author

Thank you, @trisch-me !

description: Free-form description of the GenAI agent provided by the application.
example: Helps with math problems; Generates fiction stories
index: false
doc_values: false
Member

It's curious that doc_values is now also false even though the schema only sets index: false.

Is it possible to explicitly set doc_values: true here to retain it in the mappings (and to keep ignore_above)?

ecs/schemas/gen_ai.yml

Lines 29 to 37 in a1237a3

- name: agent.description
  type: keyword
  index: false
  description: Free-form description of the GenAI agent provided by the application.
  example: Helps with math problems; Generates fiction stories
  level: extended
  beta: This field is beta and subject to change.
  otel:
    - relation: match

Member Author

Thanks for pointing this out! To summarize what we discussed on the Elasticsearch issue: we'll merge first and then continue to adjust in the upcoming weeks.

Contributor

@felixbarny It's possible to explicitly set doc_values: true, and it will be applied in the generated files.

I've created an ECS issue to update the generator: #2492.

Should ignore_above: 1024 be set as the default value for all keywords? I can add that to the issue to update the generator too.
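
The explicit doc_values: true setting discussed above could look roughly like this in the schema entry. This is only a sketch: writing ignore_above explicitly is an assumption, since whether it should instead come from a generator default is the open question raised in this thread.

```yaml
# Sketch only: the agent.description entry with doc_values set explicitly.
- name: agent.description
  type: keyword
  index: false         # the field is not indexed for search
  doc_values: true     # set explicitly so it is retained in the generated mappings
  ignore_above: 1024   # assumption: written explicitly; may instead be a generator default
  description: Free-form description of the GenAI agent provided by the application.
  example: Helps with math problems; Generates fiction stories
  level: extended
  beta: This field is beta and subject to change.
```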
