-
Notifications
You must be signed in to change notification settings - Fork 72
feat(ap): adding snowflake s3 integration role #4167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ap): adding snowflake s3 integration role #4167
Conversation
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
WalkthroughA new Snowflake integration configuration was added to the analytics platform settings, including two fields for AWS IAM integration. The infrastructure stack was updated to create an IAM policy and role for Snowflake access to a specific S3 bucket prefix, using values from the new configuration properties. Changes
Sequence Diagram(s)sequenceDiagram
participant Config as analyticsPlatform-config
participant Example as example-config
participant Stack as AnalyticsPlatformsNestedStack
participant IAM as AWS IAM
participant S3 as S3 Bucket
Config->>Example: Defines snowflake integrationUserArn & integrationExternalId
Example->>Stack: Passes snowflake config to stack
Stack->>IAM: Creates policy for S3 access (snowflake prefix)
Stack->>IAM: Creates role for Snowflake, using integrationUserArn & integrationExternalId
IAM->>S3: Grants Snowflake role permissions on S3 bucket (snowflake prefix)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
npm error code ERR_SSL_WRONG_VERSION_NUMBER 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
packages/infra/config/analytics-platform-config.ts
(1 hunks)packages/infra/config/example.ts
(1 hunks)packages/infra/lib/analytics-platform/analytics-platform-stack.ts
(3 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
`**/*.{js,jsx,ts,tsx}`: Don’t use null inside the app, only on code interacting ...
**/*.{js,jsx,ts,tsx}
: Don’t use null inside the app, only on code interacting with external interfaces/services, like DB and HTTP; convert to undefined before sending inwards into the code
Useconst
whenever possible
Useasync/await
instead of.then()
Naming: classes, enums: PascalCase
Naming: constants, variables, functions: camelCase
Naming: file names: kebab-case
Naming: Don’t use negative names, likenotEnabled
, preferisDisabled
If possible, use decomposing objects for function parameters
Prefer Nullish Coalesce (??) than the OR operator (||) when you want to provide a default value
Avoid creating arrow functions
Use truthy syntax instead ofin
- i.e.,if (data.link)
notif ('link' in data)
While handling errors, keep the stack trace around: if you create a new Error (e.g., MetriportError), make sure to pass the original error as the new one’s cause so the stack trace is available upstream.
max column length is 100 chars
multi-line comments use/** */
top-level comments go after the import (save pre-import to basic file header, like license)
move literals to constants declared after imports when possible
📄 Source: CodeRabbit Inference Engine (.cursorrules)
List of files the instruction was applied to:
packages/infra/config/analytics-platform-config.ts
packages/infra/config/example.ts
packages/infra/lib/analytics-platform/analytics-platform-stack.ts
`**/*.{ts,tsx}`: Use types whenever possible
**/*.{ts,tsx}
: Use types whenever possible
📄 Source: CodeRabbit Inference Engine (.cursorrules)
List of files the instruction was applied to:
packages/infra/config/analytics-platform-config.ts
packages/infra/config/example.ts
packages/infra/lib/analytics-platform/analytics-platform-stack.ts
`**/*.ts`: - Use the Onion Pattern to organize a package's code in layers - Try ...
**/*.ts
: - Use the Onion Pattern to organize a package's code in layers
- Try to use immutable code and avoid sharing state across different functions, objects, and systems
- Try to build code that's idempotent whenever possible
- Prefer functional programming style functions: small, deterministic, 1 input, 1 output
- Minimize coupling / dependencies
- Avoid modifying objects received as parameter
- Only add comments to code to explain why something was done, not how it works
- Naming
- classes, enums:
PascalCase
- constants, variables, functions:
camelCase
- file names:
kebab-case
- table and column names:
snake_case
- Use meaningful names, so whoever is reading the code understands what it means
- Don’t use negative names, like
notEnabled
, preferisDisabled
- For numeric values, if the type doesn’t convey the unit, add the unit to the name
- Typescript
- Use types
- Prefer
const
instead oflet
- Avoid
any
and casting fromany
to other types- Type predicates: only applicable to narrow down the type, not to force a complete type conversion
- Prefer deconstructing parameters for functions instead of multiple parameters that might be of
the same type- Don’t use
null
inside the app, only on code interacting with external interfaces/services,
like DB and HTTP; convert toundefined
before sending inwards into the code- Use
async/await
instead of.then()
- Use the strict equality operator
===
, don’t use abstract equality operator==
- When calling a Promise-returning function asynchronously (i.e., not awaiting), use
.catch()
to
handle errors (seeprocessAsyncError
andemptyFunction
depending on the case)- Date and Time
- Always use
buildDayjs()
to createdayjs
instances- Prefer
dayjs.duration(...)
to create duration consts and keep them asduration
- Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
- Avoid creating arrow functions
- Use truthy syntax instead of
in
- i.e.,if (data.link)
notif ('link' in data)
- Error handling
- Pass the original error as the new one’s
cause
so the stack trace is persisted- Error messages should have a static message - add dynamic data to MetriportError's
additionalInfo
prop- Avoid sending multiple events to Sentry for a single error
- Global constants and variables
- Move literals to constants declared after imports when possible (avoid magic numbers)
- Avoid shared, global objects
- Avoid using
console.log
andconsole.error
in packages other than utils, infra and shared,
and try to useout().log
instead- Avoid multi-line logs
- don't send objects as a second parameter to
console.log()
orout().log()
- don't create multi-line strings when using
JSON.stringify()
- Use
eslint
to enforce code style- Use
prettier
to format code- max column length is 100 chars
- multi-line comments use
/** */
- scripts: top-level comments go after the import
⚙️ Source: CodeRabbit Configuration File
List of files the instruction was applied to:
packages/infra/config/analytics-platform-config.ts
packages/infra/config/example.ts
packages/infra/lib/analytics-platform/analytics-platform-stack.ts
🧠 Learnings (3)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#3550
File: packages/api/src/external/commonwell/__tests__/patient.test.ts:5-5
Timestamp: 2025-04-01T20:57:29.282Z
Learning: PR #3574 is the follow-up PR that moves the domain feature flags from packages/api/src/aws/app-config.ts to packages/core/src/command/feature-flags/domain-ffs.ts.
Learnt from: leite08
PR: metriport/metriport#0
File: :0-0
Timestamp: 2025-03-05T18:43:30.389Z
Learning: For backmerge PRs at metriport/metriport, only verify two things: (1) that the source branch is `master` and destination branch is `develop`, and (2) that there's a link to at least one PR (usually a "patch" PR) in the description. No need for detailed review comments or updates to the PR description unless there's an issue with these criteria.
Learnt from: leite08
PR: metriport/metriport#3550
File: packages/api/src/external/commonwell/__tests__/patient.test.ts:5-5
Timestamp: 2025-04-01T20:57:29.282Z
Learning: PR #3574 is the follow-up PR that moves the domain feature flags from app-config.ts to the domain folder in packages/core.
packages/infra/config/example.ts (5)
Learnt from: leite08
PR: metriport/metriport#3550
File: packages/api/src/external/commonwell/__tests__/patient.test.ts:5-5
Timestamp: 2025-04-01T19:59:22.396Z
Learning: The file at `packages/api/src/aws/app-config.ts` contains domain feature flags (not AWS AppConfig specific functionality) and is planned to be moved to the domain folder in packages/core in a future PR.
Learnt from: lucasdellabella
PR: metriport/metriport#3948
File: packages/infra/config/example.ts:179-180
Timestamp: 2025-06-03T14:41:10.149Z
Learning: Example configuration files (like example.ts) use placeholder values and don't need to match real environment-specific constants or actual deployment values. They're meant to demonstrate structure and format.
Learnt from: leite08
PR: metriport/metriport#3550
File: packages/api/src/external/commonwell/__tests__/patient.test.ts:5-5
Timestamp: 2025-04-01T20:57:29.282Z
Learning: PR #3574 is the follow-up PR that moves the domain feature flags from packages/api/src/aws/app-config.ts to packages/core/src/command/feature-flags/domain-ffs.ts.
Learnt from: lucasdellabella
PR: metriport/metriport#3948
File: packages/infra/lib/hl7-notification-stack/constants.ts:6-14
Timestamp: 2025-06-03T14:43:16.951Z
Learning: The environment-specific IP constants (MLLP_SERVER_NLB_PROD_INTERNAL_IP_A/B and MLLP_SERVER_NLB_STAGING_INTERNAL_IP_A/B) in packages/infra/lib/hl7-notification-stack/constants.ts are used in symlinked configuration files that are not visible during codebase analysis, which is why they may appear unused despite being actively referenced.
Learnt from: leite08
PR: metriport/metriport#3469
File: packages/utils/src/s3/list-objects.ts:12-13
Timestamp: 2025-03-14T01:14:58.991Z
Learning: In utility scripts under `./packages/utils/src/s3/`, empty string values like `const bucketName = ``; const filePrefix = ``;` are intentional placeholders meant to be filled in by developers before running the script, as documented in the TSDoc comments. These should not be flagged as issues or suggested to be replaced with environment variables.
packages/infra/lib/analytics-platform/analytics-platform-stack.ts (2)
Learnt from: leite08
PR: metriport/metriport#3545
File: packages/infra/lib/api-stack.ts:383-391
Timestamp: 2025-03-28T23:03:04.409Z
Learning: When creating a nested stack in AWS CDK, don't include "NestedStack" in the ID parameter since the class name already has that suffix.
Learnt from: thomasyopes
PR: metriport/metriport#3871
File: packages/infra/lib/surescripts-stack.ts:189-189
Timestamp: 2025-05-21T20:57:09.547Z
Learning: When creating AWS S3 buckets with AWS CDK, if the bucketName property is not provided, CDK will generate a default name rather than failing. In the Metriport codebase, properties like bucket names should be made required in type definitions rather than adding runtime validation.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: check-pr / lint-build-test
- GitHub Check: Analyze (javascript)
🔇 Additional comments (5)
packages/infra/config/analytics-platform-config.ts (1)
6-9
: LGTM! Interface addition follows TypeScript best practices.The new
snowflake
configuration interface is well-structured with appropriate field names and types.packages/infra/config/example.ts (1)
243-246
: LGTM! Example configuration uses appropriate placeholder values.The placeholder values follow the expected format for Snowflake integration configuration.
packages/infra/lib/analytics-platform/analytics-platform-stack.ts (3)
3-3
: LGTM! IAM import added for Snowflake integration.The import is necessary for the new IAM role and policy creation.
18-18
: LGTM! Bucket variable assignment enables reuse in policy statements.Storing the bucket reference in a variable is good practice for referencing its ARN in the IAM policy.
29-55
: LGTM! IAM policy follows Snowflake S3 integration best practices.The policy statements correctly:
- Grant object-level permissions on the
snowflake/*
prefix- Restrict bucket-level operations with appropriate prefix conditions
- Use meaningful resource ARNs and conditions
The implementation follows AWS IAM best practices for scoped S3 access.
new iam.Role(this, "SnowflakeIntegrationRole", { | ||
assumedBy: new iam.AccountPrincipal( | ||
props.config.analyticsPlatform.snowflake.integrationUserArn | ||
), | ||
externalIds: [props.config.analyticsPlatform.snowflake.integrationExternalId], | ||
inlinePolicies: { | ||
SnowflakeAnalyticsPlatformS3Policy: s3Policy.document, | ||
}, | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Fix the AccountPrincipal usage with integrationUserArn.
AccountPrincipal
expects an AWS account ID (12-digit number), but integrationUserArn
contains an ARN. For Snowflake integration, you typically need to use the Snowflake AWS account ID as the principal.
Consider this approach:
-assumedBy: new iam.AccountPrincipal(
- props.config.analyticsPlatform.snowflake.integrationUserArn
-),
+assumedBy: new iam.AccountPrincipal("474142129181"), // Snowflake's AWS account ID
Or extract the account ID from the ARN if that's the intended usage:
-assumedBy: new iam.AccountPrincipal(
- props.config.analyticsPlatform.snowflake.integrationUserArn
-),
+assumedBy: new iam.AccountPrincipal(
+ props.config.analyticsPlatform.snowflake.integrationUserArn.split(':')[4]
+),
🌐 Web query:
What is the correct AWS account ID for Snowflake S3 integration in 2024?
💡 Result:
To integrate Snowflake with Amazon S3, Snowflake provisions a unique AWS Identity and Access Management (IAM) user for your account. This IAM user is consistent across all S3 storage integrations within your Snowflake account. To identify this IAM user, you can execute the following SQL command in Snowflake:
DESC INTEGRATION <integration_name>;
Replace <integration_name>
with the name of your storage integration. This command will return various properties, including STORAGE_AWS_IAM_USER_ARN
, which contains the ARN of the IAM user. The ARN will be in the format:
arn:aws:iam::<aws_account_id>:user/<user_name>
Here, <aws_account_id>
is the AWS account ID associated with your Snowflake account, and <user_name>
is the specific IAM user's name. This IAM user is used by Snowflake to access your S3 buckets securely. (docs.snowflake.com)
For more detailed information on configuring Snowflake storage integrations with Amazon S3, refer to the Snowflake documentation: (docs.snowflake.com)
Citations:
- 1: https://docs.snowflake.com/en/user-guide/data-load-s3-config-storage-integration?utm_source=openai
- 2: https://docs.snowflake.com/en/user-guide/data-load-s3-config-storage-integration?utm_source=openai
Use an ArnPrincipal for the Snowflake IAM user ARN
The AccountPrincipal
construct expects a 12-digit AWS account ID, not a full IAM user ARN. Since integrationUserArn
is the ARN of the Snowflake-provisioned IAM user, switch to ArnPrincipal
, or if you truly intend to grant the entire Snowflake account, extract the account ID from the ARN.
• File: packages/infra/lib/analytics-platform/analytics-platform-stack.ts
Lines: 56–64
Suggested change:
- new iam.Role(this, "SnowflakeIntegrationRole", {
- assumedBy: new iam.AccountPrincipal(
- props.config.analyticsPlatform.snowflake.integrationUserArn
- ),
+ new iam.Role(this, "SnowflakeIntegrationRole", {
+ assumedBy: new iam.ArnPrincipal(
+ props.config.analyticsPlatform.snowflake.integrationUserArn
+ ),
externalIds: [props.config.analyticsPlatform.snowflake.integrationExternalId],
inlinePolicies: {
SnowflakeAnalyticsPlatformS3Policy: s3Policy.document,
},
});
If your goal is to allow the entire Snowflake AWS account (rather than just the IAM user), you can instead extract the account ID:
- assumedBy: new iam.AccountPrincipal(
- props.config.analyticsPlatform.snowflake.integrationUserArn
- ),
+ assumedBy: new iam.AccountPrincipal(
+ props.config.analyticsPlatform.snowflake.integrationUserArn.split(':')[4]
+ ),
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
new iam.Role(this, "SnowflakeIntegrationRole", { | |
assumedBy: new iam.AccountPrincipal( | |
props.config.analyticsPlatform.snowflake.integrationUserArn | |
), | |
externalIds: [props.config.analyticsPlatform.snowflake.integrationExternalId], | |
inlinePolicies: { | |
SnowflakeAnalyticsPlatformS3Policy: s3Policy.document, | |
}, | |
}); | |
new iam.Role(this, "SnowflakeIntegrationRole", { | |
assumedBy: new iam.ArnPrincipal( | |
props.config.analyticsPlatform.snowflake.integrationUserArn | |
), | |
externalIds: [props.config.analyticsPlatform.snowflake.integrationExternalId], | |
inlinePolicies: { | |
SnowflakeAnalyticsPlatformS3Policy: s3Policy.document, | |
}, | |
}); |
🤖 Prompt for AI Agents
In packages/infra/lib/analytics-platform/analytics-platform-stack.ts around
lines 56 to 64, the code incorrectly uses iam.AccountPrincipal with a full IAM
user ARN, but AccountPrincipal expects a 12-digit AWS account ID. To fix this,
replace iam.AccountPrincipal with iam.ArnPrincipal using the full
integrationUserArn, or if the intention is to allow the entire Snowflake
account, extract the account ID from the ARN and pass it to iam.AccountPrincipal
accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, the ARN is actually an account ID temporarily. I will update it to an ARN in a second pass, and update this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomasyopes, understood! That clarifies the temporary implementation. The naming is indeed misleading for now, but it makes sense given the phased approach. When you update it to an actual ARN, you'll need to switch to ArnPrincipal
as the bot suggested.
✏️ Learnings added
Learnt from: thomasyopes
PR: metriport/metriport#4167
File: packages/infra/lib/analytics-platform/analytics-platform-stack.ts:56-64
Timestamp: 2025-07-09T20:41:08.494Z
Learning: In the analytics platform Snowflake integration (packages/infra/lib/analytics-platform/analytics-platform-stack.ts), the `integrationUserArn` field in the config temporarily contains an AWS account ID rather than an ARN. This is planned to be updated to an actual ARN in a second pass, at which point the IAM Role's assumedBy should be changed from AccountPrincipal to ArnPrincipal.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: leite08
PR: metriport/metriport#3545
File: packages/infra/lib/api-stack.ts:383-391
Timestamp: 2025-03-28T23:03:04.409Z
Learning: When creating a nested stack in AWS CDK, don't include "NestedStack" in the ID parameter since the class name already has that suffix.
Learnt from: thomasyopes
PR: metriport/metriport#3709
File: packages/infra/lib/ehr-nested-stack.ts:470-486
Timestamp: 2025-04-23T19:33:45.489Z
Learning: The `refreshEhrBundles` Lambda in the EhrNestedStack uses the API to refresh bundles, rather than accessing the S3 bucket directly. This is why it doesn't need the EHR_BUNDLE_BUCKET_NAME environment variable or direct S3 bucket permissions, unlike other similar Lambdas.
Learnt from: leite08
PR: metriport/metriport#3550
File: packages/api/src/external/commonwell/__tests__/patient.test.ts:5-5
Timestamp: 2025-04-01T20:57:29.282Z
Learning: PR #3574 is the follow-up PR that moves the domain feature flags from packages/api/src/aws/app-config.ts to packages/core/src/command/feature-flags/domain-ffs.ts.
Learnt from: leite08
PR: metriport/metriport#0
File: :0-0
Timestamp: 2025-03-05T18:43:30.389Z
Learning: For backmerge PRs at metriport/metriport, only verify two things: (1) that the source branch is `master` and destination branch is `develop`, and (2) that there's a link to at least one PR (usually a "patch" PR) in the description. No need for detailed review comments or updates to the PR description unless there's an issue with these criteria.
Learnt from: leite08
PR: metriport/metriport#3550
File: packages/api/src/external/commonwell/__tests__/patient.test.ts:5-5
Timestamp: 2025-04-01T20:57:29.282Z
Learning: PR #3574 is the follow-up PR that moves the domain feature flags from app-config.ts to the domain folder in packages/core.
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602
Issues:
Dependencies
Description
Testing
Release Plan
Summary by CodeRabbit
New Features
Documentation