
Conversation

thomasyopes (Contributor) commented Jul 9, 2025

Ref: ENG-602

Dependencies

Description

  • AWS Batch job running the FHIR-to-CSV transform

Testing

  • Local
    • Transform works against development Snowflake
  • Staging
    • Transform works against staging Snowflake
  • Sandbox
    • N/A
  • Production
    • Transform works against production Snowflake

Release Plan

  • Upstream dependencies are met/released
  • Merge this

Summary by CodeRabbit

  • New Features

    • Introduced reusable deployment workflows and integrated analytics platform deployment into staging and production pipelines.
    • Added API endpoints and CLI commands to initiate FHIR-to-CSV transformation jobs supporting direct, Lambda, SQS queue, and AWS Batch execution.
    • Released a Python-based FHIR-to-CSV transformation service with Docker Compose and Lambda deployment options.
    • Provided extensive configuration files for detailed mapping of FHIR resources to CSV outputs.
    • Enabled Snowflake integration with automated table setup, data loading, and job table management.
    • Added AWS Batch, ECS, SQS, and Lambda resources for scalable and asynchronous batch processing.
    • Enhanced API service and infrastructure with environment variables, permissions, and deployment workflows for analytics platform components.
  • Documentation

    • Added comprehensive README and configuration documentation for the FHIR-to-CSV data transformation package.
  • Chores

    • Integrated new deployment scripts, Dockerfiles, and reusable GitHub Actions workflows for analytics platform automation.
    • Added utility modules for Snowflake credentials, database naming conventions, environment enums, and API utilities.
    • Updated infrastructure stacks to support analytics batch jobs, queues, Lambda functions, and permissions.

Thomas Yopes added 4 commits July 9, 2025 05:39
Ref: ENG-602

Ref: #1040
Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602

Ref: #1040
Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
…triport/metriport into eng-602-analytics-platform-v0

Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602

Ref: #1040
Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
linear bot commented Jul 9, 2025

coderabbitai bot commented Jul 9, 2025

Walkthrough

This change introduces a comprehensive analytics platform for FHIR-to-CSV data transformation and loading into Snowflake, including new Python transformation code, configuration files, and deployment infrastructure. It adds AWS Batch, ECS, SQS, and Lambda resources, new API endpoints, deployment workflows, and supporting utilities for managing FHIR data processing jobs, integrating tightly with Snowflake and AWS services.

Changes

  • .github/workflows/_deploy-analytics-platform.yml, branch-to-staging.yml, deploy-production.yml, deploy-staging.yml
    Added a reusable GitHub Actions workflow for deploying the analytics platform, integrated it into the staging and production deployment workflows, and extended job selection logic for analytics-platform.
  • packages/api/src/routes/internal/analytics-platform/index.ts, packages/api/src/routes/internal/index.ts
    Added a new Express router with three POST endpoints for FHIR-to-CSV jobs, registered under /internal/analytics-platform.
  • packages/core/src/command/analytics-platform/config.ts
    New module to parse and validate Snowflake credentials from environment variables using a zod schema (see the sketch after this list).
  • packages/core/src/external/aws/batch.ts
    New AWS Batch integration module with a BatchUtils class for submitting jobs and managing the AWS Batch client.
  • packages/core/src/util/config.ts
    Removed the Snowflake credentials method; added static methods for the FHIR-to-CSV queue URL, Lambda name, and Batch job ARNs.
  • packages/data-transformation/fhir-to-csv/ (multiple new files: Dockerfile.dev, Dockerfile.lambda, main.py, requirements.txt, README.md, docker-compose.yml, .gitignore, src/...)
    New Python FHIR-to-CSV transformation package: includes Dockerfiles, the main transformation script, requirements, configs for resource parsing, utilities for file naming, database, environment, and API access, modules for parsing FHIR/NDJSON bundles, Snowflake setup, and resource-specific configuration files for detailed CSV extraction.
  • packages/infra/config/analytics-platform-config.ts, example.ts
    Added warehouse, role, and integrationName properties to analyticsPlatform's snowflake config.
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts, packages/infra/lib/analytics-platform/types.ts
    Major extension of the nested stack: adds AWS Batch, ECS, Lambda, SQS, ECR, S3, and Snowflake integration resources and exposes them via new class properties and asset types.
  • packages/infra/lib/api-stack.ts, packages/infra/lib/api-stack/api-service.ts
    Integrates the analytics platform stack into the API stack, adds asset references and environment variables, and grants permissions for ECS tasks to interact with analytics platform resources.
  • packages/scripts/deploy-analytics-platform.sh, push-image.sh
    New deployment script for the analytics platform; push-image.sh updated for an optional tag prefix and ECR login.
  • packages/core/src/command/analytics-platform/api/start-fhir-to-csv-transform.ts, packages/core/src/command/analytics-platform/fhir-to-csv/batch/fhir-to-csv.ts, packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts, packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-cloud.ts, packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts, packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-factory.ts, packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv.ts
    Added interfaces and implementations for FHIR-to-CSV job processing: direct, cloud (SQS), Lambda, and Batch job initiation, with request/handler types and factory logic.
  • packages/lambdas/src/analytics-platform/fhir-to-csv.ts, packages/lambdas/src/shared/analytics-platform.ts
    New Lambda handler for FHIR-to-CSV SQS jobs, with Zod schema validation and direct processing logic.
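
As a rough illustration of the new config module, here is a minimal sketch of zod-based credential parsing; the schema name, field set, and environment variable names are assumptions for illustration, not the actual implementation:

import { z } from "zod";

// Hypothetical schema; the real module's field names may differ.
const snowflakeCredsSchema = z.object({
  account: z.string().min(1),
  user: z.string().min(1),
  password: z.string().min(1),
  warehouse: z.string().min(1),
  role: z.string().min(1),
});

export type SnowflakeCreds = z.infer<typeof snowflakeCredsSchema>;

// Parse and validate credentials from the environment, failing fast on
// missing or empty values.
export function getSnowflakeCreds(): SnowflakeCreds {
  return snowflakeCredsSchema.parse({
    account: process.env.SNOWFLAKE_ACCOUNT,
    user: process.env.SNOWFLAKE_USER,
    password: process.env.SNOWFLAKE_PASSWORD,
    warehouse: process.env.SNOWFLAKE_WAREHOUSE,
    role: process.env.SNOWFLAKE_ROLE,
  });
}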

Sequence Diagram(s)

High-Level FHIR-to-CSV Batch Processing Flow

sequenceDiagram
    participant API_Client
    participant API_Service
    participant SQS_Queue
    participant Lambda_Transform
    participant AWS_Batch
    participant ECS_Container
    participant S3_Bucket
    participant Snowflake

    API_Client->>API_Service: POST /internal/analytics-platform/fhir-to-csv (job params)
    API_Service->>SQS_Queue: Enqueue FHIR-to-CSV job
    SQS_Queue->>Lambda_Transform: Trigger Lambda with job message
    Lambda_Transform->>AWS_Batch: (optionally) Submit Batch job
    AWS_Batch->>ECS_Container: Start FHIR-to-CSV container
    ECS_Container->>S3_Bucket: Read FHIR bundles, write CSV outputs
    ECS_Container->>Snowflake: Load CSV data into tables
    ECS_Container-->>API_Service: (optional) Callback/status update
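For the Batch submission step, the BatchUtils module presumably wraps something like the AWS SDK v3 SubmitJobCommand. A minimal sketch, assuming hypothetical env var names and parameter shape:

import { BatchClient, SubmitJobCommand } from "@aws-sdk/client-batch";

const batchClient = new BatchClient({});

// Submit a FHIR-to-CSV Batch job; queue and job definition ARNs would come
// from configuration (the env var names here are illustrative).
export async function submitFhirToCsvJob(
  cxId: string,
  jobId: string
): Promise<string | undefined> {
  const response = await batchClient.send(
    new SubmitJobCommand({
      jobName: `fhir-to-csv-${cxId}-${jobId}`,
      jobQueue: process.env.FHIR_TO_CSV_BATCH_JOB_QUEUE_ARN ?? "",
      jobDefinition: process.env.FHIR_TO_CSV_BATCH_JOB_DEFINITION_ARN ?? "",
      // Job parameters are handed to the container as environment variables.
      containerOverrides: {
        environment: [
          { name: "CX_ID", value: cxId },
          { name: "JOB_ID", value: jobId },
        ],
      },
    })
  );
  return response.jobId;
}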

Direct Lambda Processing (Dev/Small Jobs)

sequenceDiagram
    participant API_Client
    participant API_Service
    participant Lambda_Transform
    participant S3_Bucket
    participant Snowflake

    API_Client->>API_Service: POST /internal/analytics-platform/fhir-to-csv/transform
    API_Service->>Lambda_Transform: Invoke Lambda synchronously
    Lambda_Transform->>S3_Bucket: Read/write FHIR/CSV data
    Lambda_Transform->>Snowflake: Load CSV data
    Lambda_Transform-->>API_Service: Return result
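The synchronous path presumably invokes the transform Lambda directly via the SDK. A sketch, with the function-name env var and payload shape assumed:

import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda";

const lambdaClient = new LambdaClient({});

// Invoke the transform Lambda synchronously and return its raw response
// payload as a string (the function-name env var is illustrative).
export async function invokeFhirToCsvLambda(
  cxId: string,
  jobId: string
): Promise<string> {
  const result = await lambdaClient.send(
    new InvokeCommand({
      FunctionName: process.env.FHIR_TO_CSV_LAMBDA_NAME ?? "",
      InvocationType: "RequestResponse",
      Payload: Buffer.from(JSON.stringify({ cxId, jobId })),
    })
  );
  return Buffer.from(result.Payload ?? []).toString("utf8");
}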

Possibly related PRs

  • metriport/metriport#4168: Adds integrationUserArn and integrationExternalId to the Snowflake config, closely related to this PR's addition of warehouse, role, and integrationName in the same config object.
  • metriport/metriport#4031: Refactors Lambda handlers to use Zod schema validation for SQS events, directly related to this PR's Lambda handler for analytics-platform/fhir-to-csv which uses a similar validation pattern.
  • metriport/metriport#4167: Adds Snowflake S3 integration IAM role and policy, which this PR builds upon by expanding Snowflake integration and infrastructure for batch analytics.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

npm error code ERR_SSL_WRONG_VERSION_NUMBER
npm error errno ERR_SSL_WRONG_VERSION_NUMBER
npm error request to https://10.0.0.28:4873/punycode/-/punycode-2.3.1.tgz failed, reason: C00CF19FCB7F0000:error:0A00010B:SSL routines:ssl3_get_record:wrong version number:../deps/openssl/openssl/ssl/record/ssl3_record.c:354:
npm error
npm error A complete log of this run can be found in: /.npm/_logs/2025-07-19T22_14_33_598Z-debug-0.log


Thomas Yopes added 2 commits July 9, 2025 09:41
Ref: ENG-602

Ref: #1040
Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602

Ref: #1040
Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
@thomasyopes thomasyopes changed the base branch from develop to eng-602-analytics-platform-base-infra July 9, 2025 16:43
Ref: ENG-602

Ref: #1040
Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Base automatically changed from eng-602-analytics-platform-base-infra to develop July 9, 2025 18:10
Thomas Yopes added 4 commits July 9, 2025 13:38
…om:metriport/metriport into eng-602-analytics-platform-v0

Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602

Ref: #1040
Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
…om:metriport/metriport into eng-602-analytics-platform-v0

Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602

Ref: #1040
Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Comment on lines 11 to 19
environment:
  DBT_PROFILES_DIR: /usr/app/dbt/profiles/snowflake
  DBT_TARGET: ${DBT_TARGET}
  CX_ID: ${CX_ID}
  DBT_SNOWFLAKE_CI_ACCOUNT: ${DBT_SNOWFLAKE_ACCOUNT}
  DBT_SNOWFLAKE_CI_USER: ${DBT_SNOWFLAKE_CI_USER}
  DBT_SNOWFLAKE_CI_PASSWORD: ${DBT_SNOWFLAKE_CI_PASSWORD}
  DBT_SNOWFLAKE_CI_ROLE: ${DBT_SNOWFLAKE_CI_ROLE}
  DBT_SNOWFLAKE_CI_WAREHOUSE: ${DBT_SNOWFLAKE_CI_WAREHOUSE}
Contributor

How do the environment variables get exposed to the docker-compose run again? Does there need to be a sibling .env with the corresponding env variables?

Contributor Author

Locally, yes.

Contributor Author

Otherwise they're passed to docker run as -e values, I believe.
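
For a local docker-compose run, that sibling .env would look something like this (all values are placeholders; the variable names match the compose snippet above):

DBT_TARGET=development
CX_ID=<customer-id>
DBT_SNOWFLAKE_ACCOUNT=<account>
DBT_SNOWFLAKE_CI_USER=<user>
DBT_SNOWFLAKE_CI_PASSWORD=<password>
DBT_SNOWFLAKE_CI_ROLE=<role>
DBT_SNOWFLAKE_CI_WAREHOUSE=<warehouse>

Outside compose, the same values would be passed explicitly, e.g. docker run -e DBT_TARGET=development -e CX_ID=<customer-id> ... <image>.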

Comment on lines 1 to 34
default:
  outputs:
    development:
      account: "{{ env_var('DBT_SNOWFLAKE_CI_ACCOUNT') }}"
      role: "{{ env_var('DBT_SNOWFLAKE_CI_ROLE') }}"
      user: "{{ env_var('DBT_SNOWFLAKE_CI_USER') }}"
      password: "{{ env_var('DBT_SNOWFLAKE_CI_PASSWORD') }}"
      warehouse: "{{ env_var('DBT_SNOWFLAKE_CI_WAREHOUSE') }}"
      database: "DEVELOPMENT_METRICS_{{ env_var('CX_ID') }}"
      schema: public
      threads: 5
      type: snowflake
    staging:
      account: "{{ env_var('DBT_SNOWFLAKE_CI_ACCOUNT') }}"
      role: "{{ env_var('DBT_SNOWFLAKE_CI_ROLE') }}"
      user: "{{ env_var('DBT_SNOWFLAKE_CI_USER') }}"
      password: "{{ env_var('DBT_SNOWFLAKE_CI_PASSWORD') }}"
      warehouse: "{{ env_var('DBT_SNOWFLAKE_CI_WAREHOUSE') }}"
      database: "STAGING_METRICS_{{ env_var('CX_ID') }}"
      schema: public
      threads: 5
      type: snowflake
    production:
      account: "{{ env_var('DBT_SNOWFLAKE_CI_ACCOUNT') }}"
      role: "{{ env_var('DBT_SNOWFLAKE_CI_ROLE') }}"
      user: "{{ env_var('DBT_SNOWFLAKE_CI_USER') }}"
      password: "{{ env_var('DBT_SNOWFLAKE_CI_PASSWORD') }}"
      warehouse: "{{ env_var('DBT_SNOWFLAKE_CI_WAREHOUSE') }}"
      database: "PRODUCTION_METRICS_{{ env_var('CX_ID') }}"
      schema: public
      threads: 5
      type: snowflake
  target: development
lucasdellabella (Contributor) commented Jul 10, 2025

The env vars used per env probably need to be different, right? If not, we should just have one profile that is injected with different data per environment.

Contributor Author

Yea, once we include this we should likely get rid of the duplicate values 👍

Member

not applicable any longer

pushd ${PACKAGE_FOLDER}

# Build and push Docker images
docker buildx build \
--platform linux/amd64 \
--tag "$ECR_REPO_URI:latest" \
--tag "$ECR_REPO_URI:${TAG_PREFIX}latest" \
Contributor
How does this tag prefix work? I'd normally expect that we either have latest (backwards compatibility) or we support a tag, not some tag suffixed with -latest.

Contributor Author
It's basically allowing more than one image to live in the same ECR repo; otherwise ECR isn't super scalable. But maybe there's a way to do what I want without adjusting the tags.
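
Concretely, the prefix turns the single latest tag into one tag per image sharing the repo; an illustrative invocation (the TAG_PREFIX wiring is an assumption, and the image names come from the batch job config below):

# Two images coexisting in one ECR repo, distinguished by tag prefix:
TAG_PREFIX="fhir-to-csv-" ./push-image.sh      # pushes $ECR_REPO_URI:fhir-to-csv-latest
TAG_PREFIX="csv-to-metrics-" ./push-image.sh   # pushes $ECR_REPO_URI:csv-to-metrics-latest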

Comment on lines 26 to 35
fhirToCsvBatchJob: {
  imageName: "fhir-to-csv",
  memory: cdk.Size.mebibytes(1024),
  cpu: 512,
},
csvToMetricsBatchJob: {
  imageName: "csv-to-metrics",
  memory: cdk.Size.mebibytes(1024),
  cpu: 512,
},
Contributor
How'd you decide on the memory? That's 1 GB; I was expecting it to need to be a lot higher. Does the job work on data in chunks?

Contributor Author
Chosen at random at the moment. The transform cleans up after itself, but also may need more.

Comment on lines 45 to 46
with open(local_bundle_file_key, "wb") as f:
    s3_client.download_file(input_bucket, s3_bundle_file_key, local_bundle_file_key)
lucasdellabella (Contributor) commented Jul 10, 2025
I think we can just call download_file, it's not like we're writing to f, yeah?

Contributor Author
Fixed.
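
Presumably the fix is just dropping the open() wrapper, since download_file writes the file itself:

# download_file writes directly to the local path; no file handle is needed.
s3_client.download_file(input_bucket, s3_bundle_file_key, local_bundle_file_key)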

Thomas Yopes added 4 commits July 11, 2025 13:23
Ref: ENG-602

Ref: #1040
Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602

Ref: #1040
Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
…-analytics-platform-v0

Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602

Ref: #1040
Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
coderabbitai bot mentioned this pull request Aug 18, 2025