feat(analytics): metrics integration #4165
Conversation
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Walkthrough
This change introduces a comprehensive analytics platform for FHIR-to-CSV data transformation and loading into Snowflake, including new Python transformation code, configuration files, and deployment infrastructure. It adds AWS Batch, ECS, SQS, and Lambda resources, new API endpoints, deployment workflows, and supporting utilities for managing FHIR data processing jobs, integrating tightly with Snowflake and AWS services.
Sequence Diagram(s)
High-Level FHIR-to-CSV Batch Processing Flow
sequenceDiagram
participant API_Client
participant API_Service
participant SQS_Queue
participant Lambda_Transform
participant AWS_Batch
participant ECS_Container
participant S3_Bucket
participant Snowflake
API_Client->>API_Service: POST /internal/analytics-platform/fhir-to-csv (job params)
API_Service->>SQS_Queue: Enqueue FHIR-to-CSV job
SQS_Queue->>Lambda_Transform: Trigger Lambda with job message
Lambda_Transform->>AWS_Batch: (optionally) Submit Batch job
AWS_Batch->>ECS_Container: Start FHIR-to-CSV container
ECS_Container->>S3_Bucket: Read FHIR bundles, write CSV outputs
ECS_Container->>Snowflake: Load CSV data into tables
ECS_Container-->>API_Service: (optional) Callback/status update
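To make the Lambda-to-Batch hand-off in the flow above concrete, here is a minimal sketch of what that step could look like. It is illustrative only: the handler shape, queue/definition names, and message fields (cxId, jobId) are assumptions, not the PR's actual code; only the SQS event shape and boto3's batch.submit_job are standard AWS APIs.

```python
import json
import boto3

batch = boto3.client("batch")

def handler(event, context):
    """Hypothetical SQS-triggered Lambda: forwards each FHIR-to-CSV job to AWS Batch."""
    for record in event["Records"]:  # standard SQS event shape
        job = json.loads(record["body"])  # assumed fields: cxId, jobId
        batch.submit_job(
            jobName=f"fhir-to-csv-{job['jobId']}",  # assumed naming scheme
            jobQueue="analytics-platform-queue",    # placeholder queue name
            jobDefinition="fhir-to-csv",            # placeholder job definition
            containerOverrides={
                "environment": [
                    {"name": "CX_ID", "value": job["cxId"]},
                    {"name": "JOB_ID", "value": job["jobId"]},
                ]
            },
        )
```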
Direct Lambda Processing (Dev/Small Jobs)
sequenceDiagram
participant API_Client
participant API_Service
participant Lambda_Transform
participant S3_Bucket
participant Snowflake
API_Client->>API_Service: POST /internal/analytics-platform/fhir-to-csv/transform
API_Service->>Lambda_Transform: Invoke Lambda synchronously
Lambda_Transform->>S3_Bucket: Read/write FHIR/CSV data
Lambda_Transform->>Snowflake: Load CSV data
Lambda_Transform-->>API_Service: Return result
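For the synchronous path above, exercising the internal route could look roughly like the request below. The base URL and the body fields are placeholders for illustration, not the PR's actual contract; only the route path comes from the diagram.

```bash
# Hypothetical call to the synchronous transform endpoint; parameter names are assumptions.
curl -X POST "$API_URL/internal/analytics-platform/fhir-to-csv/transform" \
  -H "Content-Type: application/json" \
  -d '{"cxId": "<customer-id>", "patientId": "<patient-id>", "jobId": "<job-id>"}'
```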
environment:
  DBT_PROFILES_DIR: /usr/app/dbt/profiles/snowflake
  DBT_TARGET: ${DBT_TARGET}
  CX_ID: ${CX_ID}
  DBT_SNOWFLAKE_CI_ACCOUNT: ${DBT_SNOWFLAKE_ACCOUNT}
  DBT_SNOWFLAKE_CI_USER: ${DBT_SNOWFLAKE_CI_USER}
  DBT_SNOWFLAKE_CI_PASSWORD: ${DBT_SNOWFLAKE_CI_PASSWORD}
  DBT_SNOWFLAKE_CI_ROLE: ${DBT_SNOWFLAKE_CI_ROLE}
  DBT_SNOWFLAKE_CI_WAREHOUSE: ${DBT_SNOWFLAKE_CI_WAREHOUSE}
How do the environment variables get exposed to the docker-compose run again? Does there need to be a sibling .env with the corresponding env variables?
Locally, yes.
Otherwise they're passed to docker run as -e values, I believe.
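For reference, a minimal sketch of the two options discussed above. Docker Compose substitutes ${VAR} values from a sibling .env file next to docker-compose.yml; without Compose, the same values are passed explicitly with -e. Service name, image, and all values below are placeholders, not the repo's actual configuration.

```bash
# Sibling .env picked up by docker compose for ${VAR} substitution (placeholder values)
cat > .env <<'EOF'
DBT_TARGET=development
CX_ID=cx_example
DBT_SNOWFLAKE_ACCOUNT=xy12345.us-east-1
DBT_SNOWFLAKE_CI_USER=dbt_ci
DBT_SNOWFLAKE_CI_PASSWORD=change-me
DBT_SNOWFLAKE_CI_ROLE=dbt_ci_role
DBT_SNOWFLAKE_CI_WAREHOUSE=dbt_ci_wh
EOF
docker compose run dbt   # hypothetical service name

# Without Compose, pass the same variables directly to docker run
docker run \
  -e DBT_PROFILES_DIR=/usr/app/dbt/profiles/snowflake \
  -e DBT_TARGET=development \
  -e CX_ID=cx_example \
  <dbt-image>
```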
default:
  outputs:
    development:
      account: "{{ env_var('DBT_SNOWFLAKE_CI_ACCOUNT') }}"
      role: "{{ env_var('DBT_SNOWFLAKE_CI_ROLE') }}"
      user: "{{ env_var('DBT_SNOWFLAKE_CI_USER') }}"
      password: "{{ env_var('DBT_SNOWFLAKE_CI_PASSWORD') }}"
      warehouse: "{{ env_var('DBT_SNOWFLAKE_CI_WAREHOUSE') }}"
      database: "DEVELOPMENT_METRICS_{{ env_var('CX_ID') }}"
      schema: public
      threads: 5
      type: snowflake
    staging:
      account: "{{ env_var('DBT_SNOWFLAKE_CI_ACCOUNT') }}"
      role: "{{ env_var('DBT_SNOWFLAKE_CI_ROLE') }}"
      user: "{{ env_var('DBT_SNOWFLAKE_CI_USER') }}"
      password: "{{ env_var('DBT_SNOWFLAKE_CI_PASSWORD') }}"
      warehouse: "{{ env_var('DBT_SNOWFLAKE_CI_WAREHOUSE') }}"
      database: "STAGING_METRICS_{{ env_var('CX_ID') }}"
      schema: public
      threads: 5
      type: snowflake
    production:
      account: "{{ env_var('DBT_SNOWFLAKE_CI_ACCOUNT') }}"
      role: "{{ env_var('DBT_SNOWFLAKE_CI_ROLE') }}"
      user: "{{ env_var('DBT_SNOWFLAKE_CI_USER') }}"
      password: "{{ env_var('DBT_SNOWFLAKE_CI_PASSWORD') }}"
      warehouse: "{{ env_var('DBT_SNOWFLAKE_CI_WAREHOUSE') }}"
      database: "PRODUCTION_METRICS_{{ env_var('CX_ID') }}"
      schema: public
      threads: 5
      type: snowflake
  target: development
The env vars used per env probably need to be different, right? If not, we should just have a single profile that is injected with different data per environment.
Yea, once we include this we should likely get rid of the duplicate values 👍
not applicable any longer
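For context on the "duplicate values" point above (the thread was ultimately resolved as not applicable), a deduplicated profile could look something like this sketch: a single target whose database name is driven entirely by env vars, so the three per-environment blocks collapse into one. The DBT_ENVIRONMENT variable is an assumption introduced for illustration; this is not what the PR ships.

```yaml
default:
  target: default
  outputs:
    default:
      type: snowflake
      account: "{{ env_var('DBT_SNOWFLAKE_CI_ACCOUNT') }}"
      role: "{{ env_var('DBT_SNOWFLAKE_CI_ROLE') }}"
      user: "{{ env_var('DBT_SNOWFLAKE_CI_USER') }}"
      password: "{{ env_var('DBT_SNOWFLAKE_CI_PASSWORD') }}"
      warehouse: "{{ env_var('DBT_SNOWFLAKE_CI_WAREHOUSE') }}"
      # Hypothetical DBT_ENVIRONMENT would be DEVELOPMENT, STAGING, or PRODUCTION
      database: "{{ env_var('DBT_ENVIRONMENT') }}_METRICS_{{ env_var('CX_ID') }}"
      schema: public
      threads: 5
```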
pushd ${PACKAGE_FOLDER}

# Build and push Docker images
docker buildx build \
  --platform linux/amd64 \
  --tag "$ECR_REPO_URI:latest" \
  --tag "$ECR_REPO_URI:${TAG_PREFIX}latest" \
How does this tag prefix work? I'd normally expect that we either have latest (backwards compatibility) or we support a tag, not some tag suffixed with -latest.
It's basically allowing more than one image to live in the same ECR repo... otherwise ECR isn't super scalable, but maybe there's a way to do what I want without adjusting the tags.
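To make the prefix scheme concrete, a sketch of what the resulting tags could look like when two logical images share one repository. Repository URI, build contexts, and prefixes are placeholders, not the script's actual values.

```bash
# One ECR repository, two logical images distinguished by a tag prefix
docker buildx build --platform linux/amd64 \
  --tag "$ECR_REPO_URI:fhir-to-csv-latest" --push ./fhir-to-csv
docker buildx build --platform linux/amd64 \
  --tag "$ECR_REPO_URI:csv-to-metrics-latest" --push ./csv-to-metrics

# The single repo then holds both:
#   <ECR_REPO_URI>:fhir-to-csv-latest
#   <ECR_REPO_URI>:csv-to-metrics-latest
```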
fhirToCsvBatchJob: {
  imageName: "fhir-to-csv",
  memory: cdk.Size.mebibytes(1024),
  cpu: 512,
},
csvToMetricsBatchJob: {
  imageName: "csv-to-metrics",
  memory: cdk.Size.mebibytes(1024),
  cpu: 512,
},
How'd you decide on the memory? That's 1 GB - I was expecting it to need to be a lot higher. Does the job work on data in chunks?
Chosen at random at the moment. The transform cleans up after itself, but also may need more.
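On the chunking question, a rough sketch of the kind of bounded-memory loop being described: handle one bundle at a time and delete local artifacts before moving on. Function and variable names here are hypothetical (bundle_to_csvs in particular), not the PR's actual transform code.

```python
import os
import tempfile
import boto3

s3 = boto3.client("s3")

def transform_all(input_bucket: str, bundle_keys: list[str], output_bucket: str) -> None:
    """Hypothetical driver: processes one bundle at a time so peak memory stays bounded."""
    for key in bundle_keys:
        with tempfile.TemporaryDirectory() as tmp:
            local_bundle = os.path.join(tmp, os.path.basename(key))
            s3.download_file(input_bucket, key, local_bundle)
            local_csvs = bundle_to_csvs(local_bundle, tmp)  # hypothetical transform step
            for csv_path in local_csvs:
                s3.upload_file(csv_path, output_bucket, os.path.basename(csv_path))
        # TemporaryDirectory removes the local files here, before the next bundle starts
```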
with open(local_bundle_file_key, "wb") as f:
    s3_client.download_file(input_bucket, s3_bundle_file_key, local_bundle_file_key)
I think we can just call download_file; it's not like we're writing to f, yeah?
Fixed.
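For clarity, the simplified form being agreed on would look roughly like this: boto3's download_file writes the object straight to the target path, so no open() wrapper is needed. Bucket and key variable names are taken from the snippet above and assumed to be defined in the surrounding code.

```python
import boto3

s3_client = boto3.client("s3")

# download_file streams the object directly to the local path; no file handle required
s3_client.download_file(input_bucket, s3_bundle_file_key, local_bundle_file_key)
```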
Ref: ENG-602
Dependencies
Description
Testing
Release Plan
Summary by CodeRabbit
New Features
Documentation
Chores