Skip to content

Conversation

leite08
Copy link
Member

@leite08 leite08 commented Jul 25, 2025

Dependencies

none

Description

  • remove DB tx and fix some mappings
  • remove /fhir-to-csv/transform endpoint
  • remove cx level processing
  • update S3 and job table structure to contain pt ID

Testing

  • Local
    • Run 2 pts in parallel
    • Rebuild 2 pts in parallel
  • Branch to staging
    • Run all pts of a cx using the bulk-insert-patients.ts script
  • Staging
    • Run all pts of a cx using the bulk-insert-patients.ts script
  • Sandbox
    • none
  • Production
    • Rebuild the cx's whole pt roster

Release Plan

  • Merge this

Summary by CodeRabbit

  • New Features

    • Enhanced MedicationDispense and MedicationAdministration data extraction with expanded and reordered field mappings.
    • Added new identifier mapping for FamilyMemberHistory condition data.
    • Introduced patient status tracking during data transformation and Snowflake integration.
    • Added new file prefix formatting for output data organization.
  • Improvements

    • Streamlined data transformation to focus on single-patient processing.
    • Improved error handling for missing data and S3 bundle downloads.
    • Updated Snowflake table naming conventions to include patient IDs and enhanced logging.
    • Standardized job ID generation and usage across bulk patient insert operations.
    • Increased Lambda function timeout for improved processing reliability.
    • Refined Snowflake data loading with explicit stage management and patient status updates.
  • Chores

    • Removed unused dependencies from requirements.
    • Cleaned up and simplified internal API routes and related code.
  • Bug Fixes

    • Fixed issues with missing or inconsistent patient ID handling during transformation and database operations.

Copy link

linear bot commented Jul 25, 2025

@leite08 leite08 changed the title ENG-xxx ...WIP ENG-722 Data transformation on pt level with updates Jul 25, 2025
Copy link

coderabbitai bot commented Jul 25, 2025

Walkthrough

This change set removes the FHIR-to-CSV transform API endpoint and its client, refactors the FHIR-to-CSV data transformation pipeline to operate on a single patient per job, and updates configuration mappings for FamilyMemberHistory and Medication resources. It also reworks Snowflake integration to use patient-specific job tables and status tracking, adds logging, and updates utility functions and dependencies.

Changes

File(s) / Path(s) Change Summary
packages/api/src/routes/internal/analytics-platform/index.ts Removed the POST route handler and import for /internal/analytics-platform/fhir-to-csv/transform.
packages/core/src/command/analytics-platform/api/start-fhir-to-csv-transform.ts Deleted file: removed the function and type for initiating FHIR-to-CSV transformation via API.
packages/data-transformation/fhir-to-csv/main.py Refactored to process a single patient per job, removed api_url logic, improved error handling, updated S3 key structure, enforced patient_id requirement, adjusted Snowflake calls, added patient status tracking, and simplified table operations.
packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_FamilyMemberHistory_condition.ini Added familymemberhistory_id = id mapping in both [Struct] and [root_paths] sections.
packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_MedicationAdministration.ini Added mappings for subject_type, context_reference, context_type; reordered effectiveperiod_start in [Struct] and [root_paths].
packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_MedicationDispense.ini Added extensive new field mappings for MedicationDispense resource in [Struct] and [root_paths]; no deletions.
packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py Added patient_id to key functions, split and refactored table setup, added logging, removed explicit commit, removed rename_job_tables, added set_patient_status and stage management functions, updated table name logic to include patient_id.
packages/data-transformation/fhir-to-csv/src/utils/database.py Updated format_database_name to use "ANALYTICS2_", updated format_job_table_name to include patient_id, added format_patient_status_table_name, and changed format_stage_name to use cx_id, patient_id, and job_id.
packages/utils/src/snowflake/bulk-insert-patients.ts Changed jobId generation to one per batch (timestamp-based), removed uuidv4 import, updated logs to include jobId.
packages/data-transformation/fhir-to-csv/requirements.txt Removed numpy==2.3.1 dependency.
packages/infra/lib/analytics-platform/analytics-platform-stack.ts Increased Lambda function timeout for fhirToCsvTransform from 2 to 10 minutes; adjusted dependent timeout for fhirToCsv accordingly.
packages/data-transformation/fhir-to-csv/src/utils/file.py Added new function create_output_file_prefix to generate S3 key prefixes for output files based on parameters.
packages/data-transformation/fhir-to-csv/src/parseFhir/parseFhir.py Removed timestamp from log message format, keeping only log level and message.
packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts Enhanced log message in startFhirToCsvTransform to include patient ID and job ID for better traceability.
packages/data-transformation/fhir-to-csv/Dockerfile.lambda Added PYTHONUNBUFFERED=1 environment variable to Dockerfile to ensure unbuffered Python output.

Sequence Diagram(s)

sequenceDiagram
    participant SQS
    participant bulk-insert-patients.ts
    participant Lambda (main.py)
    participant S3
    participant Snowflake

    bulk-insert-patients.ts->>SQS: Send message (patientId, jobId, ...)
    SQS->>Lambda (main.py): Trigger with event (patientId, jobId, ...)
    Lambda (main.py)->>S3: Download bundle for patientId
    alt Bundle missing
        Lambda (main.py)->>Lambda (main.py): Raise error, abort
    else Bundle found
        Lambda (main.py)->>Lambda (main.py): Transform bundle to CSV(s)
        Lambda (main.py)->>S3: Upload CSV(s) (key includes patientId)
        Lambda (main.py)->>Snowflake: setup_database(cxId)
        Lambda (main.py)->>Snowflake: set_patient_status(cxId, patientId, "processing")
        Lambda (main.py)->>Snowflake: create_job_tables(cxId, patientId, jobId)
        Lambda (main.py)->>Snowflake: copy_into_job_table(..., patientId, jobId, ...)
        Lambda (main.py)->>Snowflake: append_job_tables(cxId, patientId, jobId, rebuild_flag)
        Lambda (main.py)->>Snowflake: set_patient_status(cxId, patientId, "completed")
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~35 minutes

Possibly related PRs

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

npm error code ERR_SSL_WRONG_VERSION_NUMBER
npm error errno ERR_SSL_WRONG_VERSION_NUMBER
npm error request to https://10.0.0.28:4873/punycode/-/punycode-2.3.1.tgz failed, reason: C00CAC46737F0000:error:0A00010B:SSL routines:ssl3_get_record:wrong version number:../deps/openssl/openssl/ssl/record/ssl3_record.c:354:
npm error
npm error A complete log of this run can be found in: /.npm/_logs/2025-07-25T22_48_38_454Z-debug-0.log


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e64efcf and 1f3b75b.

📒 Files selected for processing (2)
  • packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts (1 hunks)
  • packages/data-transformation/fhir-to-csv/Dockerfile.lambda (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • packages/data-transformation/fhir-to-csv/Dockerfile.lambda
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

**/*.{js,jsx,ts,tsx}: Don’t use null inside the app, only on code interacting with external interfaces/services, like DB and HTTP; convert to undefined before sending inwards into the code
Use const whenever possible
Use async/await instead of .then()
Naming: classes, enums: PascalCase
Naming: constants, variables, functions: camelCase
Naming: file names: kebab-case
Naming: Don’t use negative names, like notEnabled, prefer isDisabled
If possible, use decomposing objects for function parameters
Prefer Nullish Coalesce (??) than the OR operator (||) when you want to provide a default value
Avoid creating arrow functions
Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
While handling errors, keep the stack trace around: if you create a new Error (e.g., MetriportError), make sure to pass the original error as the new one’s cause so the stack trace is available upstream.
max column length is 100 chars
multi-line comments use /** */
top-level comments go after the import (save pre-import to basic file header, like license)
move literals to constants declared after imports when possible

Files:

  • packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts
**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

Use types whenever possible

Files:

  • packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts
**/*.ts

⚙️ CodeRabbit Configuration File

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • U...

Files:

  • packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts
🧠 Learnings (2)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.
Learnt from: thomasyopes
PR: metriport/metriport#3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) `ehr-sync-patient-cloud.ts` sends messages to an SQS queue, (2) the `ehr-sync-patient` Lambda consumes these messages, and (3) the Lambda uses the `syncPatient` function to make the API calls to process the patient data.
Learnt from: thomasyopes
PR: metriport/metriport#3882
File: packages/core/src/external/ehr/command/get-bundle-by-resource-type.ts:31-37
Timestamp: 2025-06-05T14:09:59.683Z
Learning: When EHR handlers in mapping objects are set to `undefined` and there are downstream PRs mentioned in the PR objectives, this typically indicates a staged rollout approach where the core structure is implemented first and specific handler implementations follow in subsequent PRs.
Learnt from: RamilGaripov
PR: metriport/metriport#3676
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-conversion/shared.ts:1-10
Timestamp: 2025-04-16T00:25:25.196Z
Learning: The circular dependency between shared.ts (importing getPatientIdsOrFail) and adt/utils.ts (using unpackPidFieldOrFail) will be addressed in a follow-up PR.
Learnt from: thomasyopes
PR: metriport/metriport#3466
File: packages/api/src/routes/ehr/elation/chart.ts:20-37
Timestamp: 2025-03-17T15:30:34.647Z
Learning: The request body parsing in packages/api/src/routes/ehr/elation/chart.ts is tracked with TODO referencing issue #2170 and will be handled in a separate PR, not in PR #3466.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the `/fhir-to-csv` and `/fhir-to-csv/transform` endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the `/fhir-to-csv/batch` endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.
Learnt from: leite08
PR: metriport/metriport#4193
File: packages/carequality-sdk/src/client/carequality-fhir.ts:198-208
Timestamp: 2025-07-16T00:54:56.156Z
Learning: User leite08 accepted the risk of parameter mutation in packages/carequality-sdk/src/client/carequality-fhir.ts when modifying org.active = false directly on input parameters, choosing to proceed despite the violation of the "Avoid modifying objects received as parameter" coding guideline.
Learnt from: leite08
PR: metriport/metriport#4215
File: packages/core/src/command/analytics-platform/config.ts:4-13
Timestamp: 2025-07-20T01:18:28.093Z
Learning: User leite08 prefers that code improvement suggestions (like consolidating duplicated schemas) be made on feature PRs rather than release PRs. Such improvements should be deferred to follow-up PRs during the release cycle.
Learnt from: leite08
PR: metriport/metriport#3648
File: packages/utils/src/shared/patient-create.ts:35-53
Timestamp: 2025-04-10T17:30:30.368Z
Learning: The function `mergePatients` in the patient-create.ts file is intentionally designed to take all properties from the first patient (p1) and only update address and contact fields, as the system supports duplicated patients specifically to allow multiple addresses across different records.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:38.906Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, S3 upload operations should be allowed to hard fail without try-catch blocks. The team prefers immediate failure over error handling and continuation for S3 upload operations in the FHIR-to-CSV transformation process.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_DocumentReference.ini:1-188
Timestamp: 2025-07-17T19:57:25.199Z
Learning: In packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/ directory, configuration files like config_DocumentReference.ini should be kept as-is with their hard-coded field mappings, as confirmed by user thomasyopes. The team prefers to maintain these static configuration files rather than implementing dynamic mapping generation.
Learnt from: RamilGaripov
PR: metriport/metriport#3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.
Learnt from: thomasyopes
PR: metriport/metriport#3970
File: packages/api/src/external/ehr/athenahealth/command/write-back/medication.ts:17-17
Timestamp: 2025-06-06T16:45:31.832Z
Learning: The writeMedicationToChart function in packages/api/src/external/ehr/athenahealth/command/write-back/medication.ts returns a response that is not currently used by any consumers, so changes to its return type are not breaking changes in practice.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
Learnt from: leite08
PR: metriport/metriport#3912
File: packages/core/src/command/consolidated/search/fhir-resource/search-consolidated.ts:74-83
Timestamp: 2025-06-04T00:14:38.041Z
Learning: User leite08 prefers incremental progress in refactoring scenarios, accepting some inconsistency as a stepping stone towards better architecture rather than requiring perfect consistency immediately.
packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts (16)

Learnt from: thomasyopes
PR: #4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the /fhir-to-csv and /fhir-to-csv/transform endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the /fhir-to-csv/batch endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.

Learnt from: leite08
PR: #3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Learnt from: keshavsaharia
PR: #3885
File: packages/core/src/external/sftp/surescripts/client.ts:1-134
Timestamp: 2025-05-28T19:23:20.179Z
Learning: In packages/core/src/external/sftp/surescripts/client.ts, the standalone getPatientLoadFileName function intentionally omits the transmission ID from the filename, which differs from the class method getPatientLoadFileName. This difference in filename generation is expected behavior.

Learnt from: thomasyopes
PR: #3427
File: packages/lambdas/src/ehr-sync-patient.ts:31-33
Timestamp: 2025-03-11T19:12:22.472Z
Learning: In this codebase, a specific logging pattern is used: console.log is used for early logging before context is available, then a prefixed logger (created with prefixedLog) is initialized after parsing data to include contextual information (like ehr, cxId, practiceId, patientId) in all subsequent log messages.

Learnt from: thomasyopes
PR: #3970
File: packages/api/src/external/ehr/athenahealth/command/write-back/medication.ts:17-17
Timestamp: 2025-06-06T16:45:31.832Z
Learning: The writeMedicationToChart function in packages/api/src/external/ehr/athenahealth/command/write-back/medication.ts returns a response that is not currently used by any consumers, so changes to its return type are not breaking changes in practice.

Learnt from: thomasyopes
PR: #4090
File: packages/core/src/command/conversion-fhir/conversion-fhir-direct.ts:18-25
Timestamp: 2025-06-25T01:56:08.732Z
Learning: In packages/core/src/command/conversion-fhir/conversion-fhir-direct.ts, the ConversionFhirDirect class is designed for local testing purposes and does not require comprehensive error handling for HTTP requests, as confirmed by the user thomasyopes.

Learnt from: RamilGaripov
PR: #3676
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-conversion/shared.ts:1-10
Timestamp: 2025-04-16T00:25:25.196Z
Learning: The circular dependency between shared.ts (importing getPatientIdsOrFail) and adt/utils.ts (using unpackPidFieldOrFail) will be addressed in a follow-up PR.

Learnt from: leite08
PR: #3944
File: packages/core/src/command/consolidated/search/fhir-resource/ingest-lexical.ts:53-53
Timestamp: 2025-06-01T02:28:19.913Z
Learning: In packages/core/src/command/consolidated/search/fhir-resource/ingest-lexical.ts, the processErrors function intentionally throws MetriportError to bubble errors up the call stack rather than handling them locally. This is by design - errors from ingestPatientConsolidated should propagate upward rather than being caught at immediate calling locations.

Learnt from: thomasyopes
PR: #4090
File: packages/core/src/command/conversion-fhir/conversion-fhir-cloud.ts:30-30
Timestamp: 2025-06-25T01:55:42.627Z
Learning: In packages/core/src/command/conversion-fhir/conversion-fhir-cloud.ts, the FHIR converter Lambda endpoint is controlled by the team and guaranteed to return valid JSON in the expected Bundle format, so JSON.parse() error handling and type validation are not necessary for this specific endpoint.

Learnt from: RamilGaripov
PR: #3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.

Learnt from: thomasyopes
PR: #4061
File: packages/api/src/external/ehr/shared/job/bundle/create-resource-diff-bundles/run-job.ts:34-47
Timestamp: 2025-06-19T22:44:49.393Z
Learning: In packages/api/src/external/ehr/shared/job/bundle/create-resource-diff-bundles/run-job.ts, the team prefers to keep refreshEhrBundles operations grouped using fire-and-forget pattern rather than awaiting all operations with Promise.allSettled(). This allows individual refresh operations to run independently without blocking the job runner.

Learnt from: RamilGaripov
PR: #3976
File: packages/api/src/command/medical/cohort/create-cohort.ts:12-12
Timestamp: 2025-06-18T21:06:10.803Z
Learning: The codebase uses template literals with variable interpolation in out() function calls as their standard logging pattern, e.g., out(\createCohort - cx: ${cxId}`)`. This is not a violation of their coding guidelines about avoiding multi-line logs.

Learnt from: thomasyopes
PR: #3882
File: packages/core/src/external/ehr/lambdas/get-bundle-by-resource-type/ehr-get-bundle-by-resource-type-cloud.ts:0-0
Timestamp: 2025-05-28T19:20:47.442Z
Learning: In packages/core/src/external/ehr/lambdas/get-bundle-by-resource-type/ehr-get-bundle-by-resource-type-cloud.ts, the EHR get bundle by resource type Lambda endpoint is guaranteed to return valid JSON, so JSON.parse() error handling is not necessary for this specific endpoint.

Learnt from: leite08
PR: #4034
File: packages/core/src/command/patient-import/patient-or-record-failed.ts:19-30
Timestamp: 2025-06-16T17:03:23.069Z
Learning: In patient import error handling functions like setPatientOrRecordFailed, the team prefers Promise.all over Promise.allSettled to ensure atomic all-or-nothing behavior - if either the job update or patient record update fails, the entire operation should fail rather than allowing partial success.

Learnt from: thomasyopes
PR: #3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) ehr-sync-patient-cloud.ts sends messages to an SQS queue, (2) the ehr-sync-patient Lambda consumes these messages, and (3) the Lambda uses the syncPatient function to make the API calls to process the patient data.

Learnt from: thomasyopes
PR: #3882
File: packages/core/src/external/ehr/command/get-bundle-by-resource-type/ehr-get-bundle-by-resource-type-cloud.ts:27-49
Timestamp: 2025-05-28T19:22:09.281Z
Learning: In packages/core/src/external/ehr/command/get-bundle-by-resource-type/ehr-get-bundle-by-resource-type-cloud.ts, the EHR get bundle by resource type Lambda endpoint is guaranteed to return valid JSON, so JSON.parse() error handling is not necessary for this specific endpoint.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: secrets_cdk_stack / deploy
  • GitHub Check: ihe_stack / deploy
  • GitHub Check: location_services_cdk_stack / deploy
  • GitHub Check: infra-api-lambdas / deploy
  • GitHub Check: infra-hl7-notification / deploy
  • GitHub Check: mllp-server / deploy
  • GitHub Check: api / deploy
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (1)
packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts (1)

20-20: LGTM! Enhanced logging improves observability.

The addition of patient ID and job ID to the log message provides better traceability for the FHIR to CSV transform process, especially valuable given the shift to patient-level processing mentioned in the PR objectives.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch eng-722-snowflake-per-pt

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.


output_bucket_and_file_keys_and_table_names = []
for file in local_output_files:
output_file_key = f"{dwh}/{transform_name}/{cx_id}/{job_id}/{file.replace('/', '_')}"
output_file_key = f"{dwh}/{transform_name}/{cx_id}/{patient_id}/{job_id}/{file.replace('/', '_')}"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

important, added the pt ID on the S3 path

else:
patient_ids = [patient_id] if patient_id is not None and patient_id != "" else get_patient_ids(api_url, cx_id)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't be used on the cx level, needs pt ID

raise ValueError("PATIENT_ID is not set")
if not job_id:
raise ValueError("JOB_ID is not set")
job_id = job_id.replace(":", "-").replace(".", "-")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ISO date to a format that can use used in the DB

@@ -5,6 +5,7 @@ writemode = append
anchor = condition

[Struct]
familymemberhistory_id = id
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allows to join by ID w/ the familymemberhistory table

@@ -38,7 +38,7 @@ def parse(input_path: str, outputs_folder: str) -> list[str]:
# Process each resource type separately
for resource_type, config_files in config_groups.items():
# First, filter the input file for just this resource type
filtered_input = f'{outputs_folder}/temp_{resource_type.lower()}.ndjson'
filtered_input = f'{outputs_folder}/temp_{patient_id}_{resource_type.lower()}.ndjson'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

important, so we can remove existing data based on the pt ID

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This temp file gets deleted as a heads up. I believe that the "filename" column is the derived from the path + filename, so as long as new file is along the patient ID path, it'll appear in the column.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the file name column is - normalized version of "output_file_path"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, reverted.

@leite08 leite08 marked this pull request as ready for review July 25, 2025 01:54
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py (1)

86-90: Potential data integrity issue with LIKE pattern matching

The DELETE query uses LIKE '%{patient_id}%' which could unintentionally match other patients' data if one patient_id is a substring of another (e.g., patient "123" would match "1234", "0123", etc.).

Consider using a more precise matching pattern or storing patient_id in a dedicated column for exact matching.

🧹 Nitpick comments (1)
packages/data-transformation/fhir-to-csv/main.py (1)

48-53: Improve exception handling with context

When re-raising exceptions, use from to preserve the exception chain for better debugging.

         except s3_client.exceptions.ClientError as e:
             if e.response['Error']['Code'] == '404':
                 logging.warning(f"Bundle {bundle_key} not found in input bucket {input_bucket}")
-                raise ValueError("Bundle not found") 
+                raise ValueError("Bundle not found") from e
             else:
                 raise e
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ccd202 and 1c401d5.

📒 Files selected for processing (10)
  • packages/api/src/routes/internal/analytics-platform/index.ts (0 hunks)
  • packages/core/src/command/analytics-platform/api/start-fhir-to-csv-transform.ts (0 hunks)
  • packages/data-transformation/fhir-to-csv/main.py (2 hunks)
  • packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_FamilyMemberHistory_condition.ini (2 hunks)
  • packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_MedicationAdministration.ini (4 hunks)
  • packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_MedicationDispense.ini (1 hunks)
  • packages/data-transformation/fhir-to-csv/src/parseNdjsonBundle/parseNdjsonBundle.py (2 hunks)
  • packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py (3 hunks)
  • packages/data-transformation/fhir-to-csv/src/utils/database.py (1 hunks)
  • packages/utils/src/snowflake/bulk-insert-patients.ts (2 hunks)
💤 Files with no reviewable changes (2)
  • packages/api/src/routes/internal/analytics-platform/index.ts
  • packages/core/src/command/analytics-platform/api/start-fhir-to-csv-transform.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

**/*.{js,jsx,ts,tsx}: Don’t use null inside the app, only on code interacting with external interfaces/services, like DB and HTTP; convert to undefined before sending inwards into the code
Use const whenever possible
Use async/await instead of .then()
Naming: classes, enums: PascalCase
Naming: constants, variables, functions: camelCase
Naming: file names: kebab-case
Naming: Don’t use negative names, like notEnabled, prefer isDisabled
If possible, use decomposing objects for function parameters
Prefer Nullish Coalesce (??) than the OR operator (||) when you want to provide a default value
Avoid creating arrow functions
Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
While handling errors, keep the stack trace around: if you create a new Error (e.g., MetriportError), make sure to pass the original error as the new one’s cause so the stack trace is available upstream.
max column length is 100 chars
multi-line comments use /** */
top-level comments go after the import (save pre-import to basic file header, like license)
move literals to constants declared after imports when possible

Files:

  • packages/utils/src/snowflake/bulk-insert-patients.ts
**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

Use types whenever possible

Files:

  • packages/utils/src/snowflake/bulk-insert-patients.ts
**/*.ts

⚙️ CodeRabbit Configuration File

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • U...

Files:

  • packages/utils/src/snowflake/bulk-insert-patients.ts
🧠 Learnings (9)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.
Learnt from: thomasyopes
PR: metriport/metriport#3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) `ehr-sync-patient-cloud.ts` sends messages to an SQS queue, (2) the `ehr-sync-patient` Lambda consumes these messages, and (3) the Lambda uses the `syncPatient` function to make the API calls to process the patient data.
Learnt from: thomasyopes
PR: metriport/metriport#3882
File: packages/core/src/external/ehr/command/get-bundle-by-resource-type.ts:31-37
Timestamp: 2025-06-05T14:09:59.683Z
Learning: When EHR handlers in mapping objects are set to `undefined` and there are downstream PRs mentioned in the PR objectives, this typically indicates a staged rollout approach where the core structure is implemented first and specific handler implementations follow in subsequent PRs.
Learnt from: RamilGaripov
PR: metriport/metriport#3676
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-conversion/shared.ts:1-10
Timestamp: 2025-04-16T00:25:25.196Z
Learning: The circular dependency between shared.ts (importing getPatientIdsOrFail) and adt/utils.ts (using unpackPidFieldOrFail) will be addressed in a follow-up PR.
Learnt from: thomasyopes
PR: metriport/metriport#3466
File: packages/api/src/routes/ehr/elation/chart.ts:20-37
Timestamp: 2025-03-17T15:30:34.647Z
Learning: The request body parsing in packages/api/src/routes/ehr/elation/chart.ts is tracked with TODO referencing issue #2170 and will be handled in a separate PR, not in PR #3466.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the `/fhir-to-csv` and `/fhir-to-csv/transform` endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the `/fhir-to-csv/batch` endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.
Learnt from: leite08
PR: metriport/metriport#4193
File: packages/carequality-sdk/src/client/carequality-fhir.ts:198-208
Timestamp: 2025-07-16T00:54:56.156Z
Learning: User leite08 accepted the risk of parameter mutation in packages/carequality-sdk/src/client/carequality-fhir.ts when modifying org.active = false directly on input parameters, choosing to proceed despite the violation of the "Avoid modifying objects received as parameter" coding guideline.
Learnt from: leite08
PR: metriport/metriport#4215
File: packages/core/src/command/analytics-platform/config.ts:4-13
Timestamp: 2025-07-20T01:18:28.093Z
Learning: User leite08 prefers that code improvement suggestions (like consolidating duplicated schemas) be made on feature PRs rather than release PRs. Such improvements should be deferred to follow-up PRs during the release cycle.
Learnt from: leite08
PR: metriport/metriport#3648
File: packages/utils/src/shared/patient-create.ts:35-53
Timestamp: 2025-04-10T17:30:30.368Z
Learning: The function `mergePatients` in the patient-create.ts file is intentionally designed to take all properties from the first patient (p1) and only update address and contact fields, as the system supports duplicated patients specifically to allow multiple addresses across different records.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_DocumentReference.ini:1-188
Timestamp: 2025-07-17T19:57:25.199Z
Learning: In packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/ directory, configuration files like config_DocumentReference.ini should be kept as-is with their hard-coded field mappings, as confirmed by user thomasyopes. The team prefers to maintain these static configuration files rather than implementing dynamic mapping generation.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:38.906Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, S3 upload operations should be allowed to hard fail without try-catch blocks. The team prefers immediate failure over error handling and continuation for S3 upload operations in the FHIR-to-CSV transformation process.
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file, which prevents duplicates in the mapping between patient imports and data pipeline requests.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.
Learnt from: keshavsaharia
PR: metriport/metriport#3885
File: packages/utils/src/surescripts/generate-patient-load-file.ts:92-94
Timestamp: 2025-05-30T01:11:35.300Z
Learning: In V0 of the Surescripts implementation (packages/utils/src/surescripts/generate-patient-load-file.ts), there is always only 1 facility ID, so the current facility_ids parsing logic with substring operations is sufficient and additional validation is not necessary.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
Learnt from: leite08
PR: metriport/metriport#3912
File: packages/core/src/command/consolidated/search/fhir-resource/search-consolidated.ts:74-83
Timestamp: 2025-06-04T00:14:38.041Z
Learning: User leite08 prefers incremental progress in refactoring scenarios, accepting some inconsistency as a stepping stone towards better architecture rather than requiring perfect consistency immediately.
packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_FamilyMemberHistory_condition.ini (1)

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_DocumentReference.ini:1-188
Timestamp: 2025-07-17T19:57:25.199Z
Learning: In packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/ directory, configuration files like config_DocumentReference.ini should be kept as-is with their hard-coded field mappings, as confirmed by user thomasyopes. The team prefers to maintain these static configuration files rather than implementing dynamic mapping generation.

packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_MedicationDispense.ini (5)

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_DocumentReference.ini:1-188
Timestamp: 2025-07-17T19:57:25.199Z
Learning: In packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/ directory, configuration files like config_DocumentReference.ini should be kept as-is with their hard-coded field mappings, as confirmed by user thomasyopes. The team prefers to maintain these static configuration files rather than implementing dynamic mapping generation.

Learnt from: keshavsaharia
PR: #4075
File: packages/core/src/external/surescripts/fhir/medication-request.ts:76-83
Timestamp: 2025-06-25T18:42:07.231Z
Learning: In packages/core/src/external/surescripts/fhir/medication-request.ts, the getDispenseNote and getDosageInstruction functions are intentionally identical because the dashboard requires both the note and dosageInstruction fields of MedicationRequest to be populated with detail.directions for proper note display functionality.

Learnt from: thomasyopes
PR: #3970
File: packages/api/src/external/ehr/athenahealth/command/write-back/medication.ts:17-17
Timestamp: 2025-06-06T16:45:31.832Z
Learning: The writeMedicationToChart function in packages/api/src/external/ehr/athenahealth/command/write-back/medication.ts returns a response that is not currently used by any consumers, so changes to its return type are not breaking changes in practice.

Learnt from: keshavsaharia
PR: #4045
File: packages/core/src/external/surescripts/fhir/coverage.ts:0-0
Timestamp: 2025-06-18T18:34:10.489Z
Learning: Coverage resources in Surescripts FHIR conversion are currently excluded from bundles to prevent skewing data lift metrics. The team plans to examine available insurance data thoroughly before including properly structured Coverage resources with mandatory FHIR R4 elements like beneficiary and payor references.

Learnt from: thomasyopes
PR: #3891
File: packages/api/src/routes/internal/ehr/patient.ts:142-142
Timestamp: 2025-06-20T15:35:00.546Z
Learning: In the EHR resource diff contribution system, there are no "invalid" resource types. The system is designed to gracefully handle any resource type string by attempting to find a corresponding bundle, and if no bundle exists for that resource type, it will simply exit early rather than throwing an error.

packages/utils/src/snowflake/bulk-insert-patients.ts (19)

Learnt from: keshavsaharia
PR: #3885
File: packages/core/src/external/sftp/surescripts/client.ts:1-134
Timestamp: 2025-05-28T19:23:20.179Z
Learning: In packages/core/src/external/sftp/surescripts/client.ts, the standalone getPatientLoadFileName function intentionally omits the transmission ID from the filename, which differs from the class method getPatientLoadFileName. This difference in filename generation is expected behavior.

Learnt from: leite08
PR: #3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Learnt from: RamilGaripov
PR: #3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.

Learnt from: thomasyopes
PR: #4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the /fhir-to-csv and /fhir-to-csv/transform endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the /fhir-to-csv/batch endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.

Learnt from: RamilGaripov
PR: #3676
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-conversion/shared.ts:1-10
Timestamp: 2025-04-16T00:25:25.196Z
Learning: The circular dependency between shared.ts (importing getPatientIdsOrFail) and adt/utils.ts (using unpackPidFieldOrFail) will be addressed in a follow-up PR.

Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.

Learnt from: thomasyopes
PR: #3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) ehr-sync-patient-cloud.ts sends messages to an SQS queue, (2) the ehr-sync-patient Lambda consumes these messages, and (3) the Lambda uses the syncPatient function to make the API calls to process the patient data.

Learnt from: leite08
PR: #4034
File: packages/core/src/command/patient-import/patient-or-record-failed.ts:19-30
Timestamp: 2025-06-16T17:03:23.069Z
Learning: In patient import error handling functions like setPatientOrRecordFailed, the team prefers Promise.all over Promise.allSettled to ensure atomic all-or-nothing behavior - if either the job update or patient record update fails, the entire operation should fail rather than allowing partial success.

Learnt from: keshavsaharia
PR: #3885
File: packages/utils/src/surescripts/generate-patient-load-file.ts:92-94
Timestamp: 2025-05-30T01:11:35.300Z
Learning: In V0 of the Surescripts implementation (packages/utils/src/surescripts/generate-patient-load-file.ts), there is always only 1 facility ID, so the current facility_ids parsing logic with substring operations is sufficient and additional validation is not necessary.

Learnt from: leite08
PR: #3953
File: packages/core/src/command/consolidated/search/fhir-resource/tests/search-consolidated-setup.ts:28-30
Timestamp: 2025-06-03T21:02:23.374Z
Learning: The makePatient() function returns PatientWithId type which requires the id field to be present, ensuring patient.id is never undefined.

Learnt from: lucasdellabella
PR: #4098
File: packages/api/src/routes/medical/dtos/tcm-encounter-dto.ts:10-22
Timestamp: 2025-06-27T01:50:14.227Z
Learning: In packages/api/src/routes/medical/dtos/tcm-encounter-dto.ts, the patient fields patientData.firstName, patientData.lastName, and patientData.dob are guaranteed to be non-nullable values, so defensive null/undefined checks are not needed when accessing these fields.

Learnt from: leite08
PR: #3814
File: packages/core/src/command/consolidated/search/fhir-resource/search-semantic.ts:36-36
Timestamp: 2025-05-19T13:53:09.828Z
Learning: In the getConsolidatedPatientData function from @metriport/core/command/consolidated/consolidated-get, only the patient parameter is required. Other parameters like requestId, resources, dateFrom, dateTo, fromDashboard, and forceDataFromFhir are all optional.

Learnt from: RamilGaripov
PR: #3976
File: packages/api/src/routes/medical/patient.ts:541-543
Timestamp: 2025-06-18T21:05:22.256Z
Learning: In packages/api/src/routes/medical/patient.ts, inline schema definitions like cohortIdSchema are acceptable and don't need to be moved to separate schema files when the user prefers to keep them inline.

Learnt from: thomasyopes
PR: #3383
File: packages/api/src/routes/ehr-internal/canvas/internal/patient.ts:10-24
Timestamp: 2025-03-05T21:42:32.929Z
Learning: The processPatientsFromAppointments endpoint in the Canvas API is intentionally designed to be asynchronous (fire-and-forget pattern). The endpoint returns a 200 OK response immediately after initiating the process, without waiting for it to complete. Error handling is handled through the .catch() method with processAsyncError rather than awaiting the promise.

Learnt from: thomasyopes
PR: #3788
File: packages/api/src/external/ehr/shared/command/create-resource-diff-bundles.ts:0-0
Timestamp: 2025-05-08T19:16:10.258Z
Learning: Avoid using Promise.all for parallel execution of functions that access the cxMapping table, as this may lead to database locking issues. Serial execution (using for loops with await) is preferred for such operations even if it might be slower for large datasets.

Learnt from: thomasyopes
PR: #4061
File: packages/api/src/external/ehr/shared/job/bundle/create-resource-diff-bundles/run-job.ts:34-47
Timestamp: 2025-06-19T22:44:49.393Z
Learning: In packages/api/src/external/ehr/shared/job/bundle/create-resource-diff-bundles/run-job.ts, the team prefers to keep refreshEhrBundles operations grouped using fire-and-forget pattern rather than awaiting all operations with Promise.allSettled(). This allows individual refresh operations to run independently without blocking the job runner.

Learnt from: keshavsaharia
PR: #4099
File: packages/utils/src/surescripts/find-largest.ts:84-84
Timestamp: 2025-06-26T06:26:17.643Z
Learning: In utility scripts for Surescripts analysis tools, when orgName parameter is undefined, prefer using date timestamps as fallbacks for directory naming rather than using cxId or other identifiers. This creates unique directories for each run and helps with organization.

Learnt from: thomasyopes
PR: #3427
File: packages/lambdas/src/ehr-sync-patient.ts:31-33
Timestamp: 2025-03-11T19:12:22.472Z
Learning: In this codebase, a specific logging pattern is used: console.log is used for early logging before context is available, then a prefixed logger (created with prefixedLog) is initialized after parsing data to include contextual information (like ehr, cxId, practiceId, patientId) in all subsequent log messages.

Learnt from: thomasyopes
PR: #3997
File: packages/api/src/command/job/patient/status/cancel.ts:30-34
Timestamp: 2025-06-12T23:25:08.139Z
Learning: cancelPatientJob is intentionally non-idempotent: if the current job status is already cancelled, the function must throw a BadRequestError rather than silently succeeding.

packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_MedicationAdministration.ini (6)

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_DocumentReference.ini:1-188
Timestamp: 2025-07-17T19:57:25.199Z
Learning: In packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/ directory, configuration files like config_DocumentReference.ini should be kept as-is with their hard-coded field mappings, as confirmed by user thomasyopes. The team prefers to maintain these static configuration files rather than implementing dynamic mapping generation.

Learnt from: thomasyopes
PR: #3970
File: packages/api/src/external/ehr/athenahealth/command/write-back/medication.ts:17-17
Timestamp: 2025-06-06T16:45:31.832Z
Learning: The writeMedicationToChart function in packages/api/src/external/ehr/athenahealth/command/write-back/medication.ts returns a response that is not currently used by any consumers, so changes to its return type are not breaking changes in practice.

Learnt from: keshavsaharia
PR: #3885
File: packages/core/src/external/surescripts/schema/load.ts:65-71
Timestamp: 2025-05-30T03:42:53.380Z
Learning: In Surescripts schema files like packages/core/src/external/surescripts/schema/load.ts, the key property in field definitions is only used for direct one-to-one field mappings. When multiple fields are derived from the same source property but with different transformations (like extracting date vs. time portions), they should not have key properties as they represent transformations rather than direct mappings.

Learnt from: thomasyopes
PR: #3936
File: packages/core/src/external/ehr/athenahealth/index.ts:472-482
Timestamp: 2025-06-06T15:37:44.571Z
Learning: In AthenaHealth medication creation, even if medication statements have the same start/stop dates, they should not be deduplicated because there may be relevant differences beyond dates (dosage, administration routes, prescriber info, etc.) that make each statement unique and worth creating separately.

Learnt from: thomasyopes
PR: #3981
File: packages/core/src/external/ehr/canvas/index.ts:1376-1379
Timestamp: 2025-06-10T20:14:27.998Z
Learning: Canvas EHR only supports the MedicationStatement statuses "active", "stopped", and "entered-in-error"; it does not accept "completed".

Learnt from: keshavsaharia
PR: #4075
File: packages/core/src/external/surescripts/fhir/medication-request.ts:76-83
Timestamp: 2025-06-25T18:42:07.231Z
Learning: In packages/core/src/external/surescripts/fhir/medication-request.ts, the getDispenseNote and getDosageInstruction functions are intentionally identical because the dashboard requires both the note and dosageInstruction fields of MedicationRequest to be populated with detail.directions for proper note display functionality.

packages/data-transformation/fhir-to-csv/src/parseNdjsonBundle/parseNdjsonBundle.py (7)

Learnt from: thomasyopes
PR: #4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the /fhir-to-csv and /fhir-to-csv/transform endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the /fhir-to-csv/batch endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.

Learnt from: leite08
PR: #3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/parseFhir.py:94-175
Timestamp: 2025-07-17T19:58:54.738Z
Learning: The user thomasyopes prefers to keep the parseFhir.py file in packages/data-transformation/fhir-to-csv/src/parseFhir/ as-is, declining code style improvements suggested by static analysis tools like Ruff.

Learnt from: RamilGaripov
PR: #3962
File: packages/api/src/models/medical/cohort.ts:0-0
Timestamp: 2025-06-17T23:14:46.017Z
Learning: In the Metriport codebase, cohort deletion is handled with application-level validation that prevents deletion of cohorts if there are any patients assigned to them. This is why the cohortId foreign key constraint in patient_cohort table does not have cascade delete behavior - it's intentionally asymmetric. The patientId has cascade delete to clean up cohort assignments when patients are deleted, but cohortId lacks cascade delete because cohorts with patients should not be deletable at all according to business rules.

Learnt from: thomasyopes
PR: #3771
File: packages/api/src/external/ehr/healthie/command/process-patients-from-appointments.ts:121-128
Timestamp: 2025-05-01T16:15:20.823Z
Learning: In the Healthie EHR system, patient IDs are globally unique across practices, making patientId alone sufficient for deduplication without needing to include practiceId.

Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_DocumentReference.ini:1-188
Timestamp: 2025-07-17T19:57:25.199Z
Learning: In packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/ directory, configuration files like config_DocumentReference.ini should be kept as-is with their hard-coded field mappings, as confirmed by user thomasyopes. The team prefers to maintain these static configuration files rather than implementing dynamic mapping generation.

packages/data-transformation/fhir-to-csv/main.py (18)

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:38.906Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, S3 upload operations should be allowed to hard fail without try-catch blocks. The team prefers immediate failure over error handling and continuation for S3 upload operations in the FHIR-to-CSV transformation process.

Learnt from: thomasyopes
PR: #4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the /fhir-to-csv and /fhir-to-csv/transform endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the /fhir-to-csv/batch endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.

Learnt from: leite08
PR: #3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.

Learnt from: leite08
PR: #3859
File: packages/utils/src/consolidated/check-patient-consolidated.ts:28-32
Timestamp: 2025-05-26T16:24:10.245Z
Learning: For utility scripts in the packages/utils directory, developers are expected to update the main constants at the top of the file (like bundleFilePath, cxId, patientId) rather than implementing command-line argument parsing or environment variable reading. This is the preferred pattern for configuring utility scripts in this codebase.

Learnt from: keshavsaharia
PR: #3885
File: packages/core/src/external/sftp/surescripts/client.ts:1-134
Timestamp: 2025-05-28T19:23:20.179Z
Learning: In packages/core/src/external/sftp/surescripts/client.ts, the standalone getPatientLoadFileName function intentionally omits the transmission ID from the filename, which differs from the class method getPatientLoadFileName. This difference in filename generation is expected behavior.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/parseFhir.py:94-175
Timestamp: 2025-07-17T19:58:54.738Z
Learning: The user thomasyopes prefers to keep the parseFhir.py file in packages/data-transformation/fhir-to-csv/src/parseFhir/ as-is, declining code style improvements suggested by static analysis tools like Ruff.

Learnt from: leite08
PR: #3412
File: packages/api/.env.test:12-12
Timestamp: 2025-04-24T01:57:35.097Z
Learning: The PATIENT_IMPORT_BUCKET_NAME environment variable is specific to the API package and doesn't need to be added to other packages' environment configurations.

Learnt from: lucasdellabella
PR: #3855
File: packages/core/src/command/hl7-notification/hl7-notification-webhook-sender-direct.ts:88-104
Timestamp: 2025-05-16T02:10:01.792Z
Learning: When logging file paths or other strings that might contain patient identifiers like patientId or cxId, these should be masked or redacted (e.g., using regex replacement) to avoid logging Protected Health Information (PHI) to CloudWatch or error tracking systems like Sentry, ensuring HIPAA compliance.

Learnt from: leite08
PR: #3612
File: packages/core/src/command/conversion-result/send-multiple-conversion-results.ts:48-62
Timestamp: 2025-04-05T01:03:51.989Z
Learning: Patient IDs in this codebase are UUIDs and don't contain PII, so they're safe to include in logs and error contexts.

Learnt from: RamilGaripov
PR: #3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.

Learnt from: thomasyopes
PR: #3883
File: packages/api/src/external/ehr/shared/job/bundle/create-resource-diff-bundles/start-jobs-across-ehrs.ts:38-46
Timestamp: 2025-05-28T19:42:17.301Z
Learning: Athena patient IDs necessarily contain a "." character - this is a required format for all valid Athena patient IDs.

Learnt from: thomasyopes
PR: #3973
File: packages/core/src/external/ehr/athenahealth/shared.ts:8-16
Timestamp: 2025-06-06T18:17:23.577Z
Learning: For internal functions in the Metriport codebase where inputs are controlled by the development team, input validation for required parameters like cxId and practiceId is not needed and should be avoided to keep functions simple.

Learnt from: RamilGaripov
PR: #3675
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-converter.ts:58-59
Timestamp: 2025-04-23T08:43:00.803Z
Learning: In the Metriport codebase, cxId is expected to be a UUID format (containing only alphanumeric characters and hyphens), which doesn't require URL encoding when used in query parameters.

Learnt from: leite08
PR: #3869
File: packages/core/src/external/fhir/export/string/resources/medication-request.ts:93-94
Timestamp: 2025-05-21T16:49:07.877Z
Learning: Raw ISO date strings in FHIR resources (like authoredOn in MedicationRequest) should be preserved as-is without additional formatting when converting resources to string representations.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:54-58
Timestamp: 2025-07-18T09:54:01.790Z
Learning: The user thomasyopes prefers to keep the setupSnowflake.py file in packages/data-transformation/fhir-to-csv/src/setupSnowflake/ as-is, declining suggestions for additional error handling around configuration file parsing.

packages/data-transformation/fhir-to-csv/src/utils/database.py (9)

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.

Learnt from: RamilGaripov
PR: #3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.

Learnt from: thomasyopes
PR: #4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the /fhir-to-csv and /fhir-to-csv/transform endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the /fhir-to-csv/batch endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.

Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.

Learnt from: leite08
PR: #3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.

Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file, which prevents duplicates in the mapping between patient imports and data pipeline requests.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/parseFhir.py:94-175
Timestamp: 2025-07-17T19:58:54.738Z
Learning: The user thomasyopes prefers to keep the parseFhir.py file in packages/data-transformation/fhir-to-csv/src/parseFhir/ as-is, declining code style improvements suggested by static analysis tools like Ruff.

packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py (10)

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.

Learnt from: thomasyopes
PR: #4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the /fhir-to-csv and /fhir-to-csv/transform endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the /fhir-to-csv/batch endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.

Learnt from: RamilGaripov
PR: #3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.

Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:54-58
Timestamp: 2025-07-18T09:54:01.790Z
Learning: The user thomasyopes prefers to keep the setupSnowflake.py file in packages/data-transformation/fhir-to-csv/src/setupSnowflake/ as-is, declining suggestions for additional error handling around configuration file parsing.

Learnt from: leite08
PR: #3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Learnt from: keshavsaharia
PR: #3885
File: packages/core/src/external/sftp/surescripts/client.ts:1-134
Timestamp: 2025-05-28T19:23:20.179Z
Learning: In packages/core/src/external/sftp/surescripts/client.ts, the standalone getPatientLoadFileName function intentionally omits the transmission ID from the filename, which differs from the class method getPatientLoadFileName. This difference in filename generation is expected behavior.

Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file, which prevents duplicates in the mapping between patient imports and data pipeline requests.

🧬 Code Graph Analysis (2)
packages/utils/src/snowflake/bulk-insert-patients.ts (2)
packages/utils/src/patient/get-ids.ts (1)
  • getAllPatientIds (22-29)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
packages/data-transformation/fhir-to-csv/src/utils/database.py (1)
packages/data-transformation/fhir-to-csv/src/utils/file.py (1)
  • strip_config_file_name (6-7)
🪛 Ruff (0.12.2)
packages/data-transformation/fhir-to-csv/main.py

31-31: Local variable patient_ids_and_bundle_keys is assigned to but never used

Remove assignment to unused variable patient_ids_and_bundle_keys

(F841)


51-51: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (16)
packages/utils/src/snowflake/bulk-insert-patients.ts (3)

58-58: Good enhancement for traceability.

Adding jobId to the log messages improves observability and debugging capabilities for bulk operations.

Also applies to: 93-93


65-65: Downstream handling of shared jobId is verified

All downstream Lambdas parse and use both jobId and patientId, so they correctly distinguish records when multiple patients share the same jobId. Specifically:

  • packages/lambdas/src/analytics-platform/fhir-to-csv.ts
  • packages/lambdas/src/patient-monitoring/discharge-requery.ts
  • packages/lambdas/src/sqs-to-converter.ts
  • packages/lambdas/src/patient-import-query.ts

No further changes required.


50-50: jobId format verified as compatible with downstream processing

All references to jobId across SQS, Lambda consumers, Python transformations, and Snowflake table naming handle the timestamp format correctly:

  • The Python fhir-to-csv code replaces only : and . (none remain after our .replace(/[:.]/g, "-")) and then maps hyphens to underscores for table names.
  • S3 key paths and HTTP route parameters accept hyphens and the “T” separator without issue.

No changes required.

packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_FamilyMemberHistory_condition.ini (1)

8-8: LGTM: Consistent identifier mapping added.

The addition of familymemberhistory_id = id mapping in both sections supports better data identification and joining capabilities, aligning with the patient-level processing refactor.

Also applies to: 49-49

packages/data-transformation/fhir-to-csv/src/utils/database.py (2)

3-5: LGTM: Temporary database naming change.

The "ANALYTICS2_" prefix with TODO comment indicates a planned migration strategy. This approach allows for safe testing of the new patient-level processing without affecting existing data.


10-11: LGTM: Function signature updated for patient-level processing.

The addition of patient_id parameter and its incorporation into the table naming pattern job_{job_id}_pt_{patient_id}_{table_name} properly supports the shift to patient-specific data processing and storage segregation.

packages/data-transformation/fhir-to-csv/src/parseNdjsonBundle/parseNdjsonBundle.py (2)

27-27: LGTM: Function signature updated for patient-level processing.

The addition of patient_id parameter aligns with the broader refactoring to handle single-patient, single-bundle processing per job.


41-41: LGTM: Patient-specific temporary file naming.

Including patient_id in temporary file names (temp_{patient_id}_{resource_type.lower()}.ndjson) provides better isolation and supports the data cleanup capabilities mentioned in previous reviews.

packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_MedicationAdministration.ini (2)

17-19: LGTM: New subject and context mappings added.

The addition of subject_type, context_reference, and context_type fields expands the data extraction capabilities for MedicationAdministration resources, with consistent mapping across both configuration sections.

Also applies to: 89-91


28-28: LGTM: Field reordering maintained consistency.

The repositioning of effectiveperiod_start maintains the logical grouping of related fields without affecting parsing functionality.

Also applies to: 100-100

packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_MedicationDispense.ini (2)

7-104: LGTM: Comprehensive MedicationDispense field mappings added.

The extensive expansion of field mappings covers a wide range of FHIR MedicationDispense resource fields including metadata, identifiers, status information, performers, dosage instructions, substitution details, and event history. All mappings are consistently applied across both configuration sections and follow proper naming conventions.


109-206: LGTM: Consistent root path mappings.

The [root_paths] section properly mirrors all the structural mappings, ensuring consistent data extraction across the parsing pipeline.

packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py (3)

43-61: Good addition of patient-level traceability

The inclusion of patient_id in the function signature and job table naming is well-implemented. The added logging will help with debugging and monitoring patient-specific transformations.


63-74: Improved connection management and parameter consistency

Good refactoring:

  • Using context manager for automatic connection cleanup
  • Consistent parameter ordering (cx_id, patient_id, job_id)
  • Simplified transaction handling by removing explicit autocommit

94-124: Well-structured S3 to Snowflake copy operation

Good improvements:

  • Clear logging of S3 URLs and operations
  • Proper stage cleanup in the normal flow
  • Exception handling allows partial success (as per team's design decision)
packages/data-transformation/fhir-to-csv/main.py (1)

78-120: Well-implemented single-patient processing logic

The handler properly enforces single-patient processing with:

  • Mandatory patient_id validation
  • Consistent parameter ordering in Snowflake function calls
  • Clear rebuild logic based on input_bundle presence
  • Proper job_id sanitization for compatibility

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
packages/data-transformation/fhir-to-csv/main.py (2)

58-60: Fix inconsistent return type.

The function should return an empty list instead of None for consistency with the return type annotation.

         if entries is None or len(entries) < 1:
             logging.warning(f"Bundle {bundle_key} has no entries")
-            return
+            return []

69-69: S3 path structure correctly updated with patient ID.

The S3 key structure now includes patient_id as intended, supporting the patient-level data organization.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c401d5 and 3fa149e.

📒 Files selected for processing (4)
  • packages/data-transformation/fhir-to-csv/main.py (3 hunks)
  • packages/data-transformation/fhir-to-csv/requirements.txt (0 hunks)
  • packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py (4 hunks)
  • packages/data-transformation/fhir-to-csv/src/utils/database.py (1 hunks)
💤 Files with no reviewable changes (1)
  • packages/data-transformation/fhir-to-csv/requirements.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/data-transformation/fhir-to-csv/src/utils/database.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.
Learnt from: thomasyopes
PR: metriport/metriport#3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) `ehr-sync-patient-cloud.ts` sends messages to an SQS queue, (2) the `ehr-sync-patient` Lambda consumes these messages, and (3) the Lambda uses the `syncPatient` function to make the API calls to process the patient data.
Learnt from: thomasyopes
PR: metriport/metriport#3882
File: packages/core/src/external/ehr/command/get-bundle-by-resource-type.ts:31-37
Timestamp: 2025-06-05T14:09:59.683Z
Learning: When EHR handlers in mapping objects are set to `undefined` and there are downstream PRs mentioned in the PR objectives, this typically indicates a staged rollout approach where the core structure is implemented first and specific handler implementations follow in subsequent PRs.
Learnt from: thomasyopes
PR: metriport/metriport#3466
File: packages/api/src/routes/ehr/elation/chart.ts:20-37
Timestamp: 2025-03-17T15:30:34.647Z
Learning: The request body parsing in packages/api/src/routes/ehr/elation/chart.ts is tracked with TODO referencing issue #2170 and will be handled in a separate PR, not in PR #3466.
Learnt from: RamilGaripov
PR: metriport/metriport#3676
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-conversion/shared.ts:1-10
Timestamp: 2025-04-16T00:25:25.196Z
Learning: The circular dependency between shared.ts (importing getPatientIdsOrFail) and adt/utils.ts (using unpackPidFieldOrFail) will be addressed in a follow-up PR.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the `/fhir-to-csv` and `/fhir-to-csv/transform` endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the `/fhir-to-csv/batch` endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.
Learnt from: leite08
PR: metriport/metriport#4193
File: packages/carequality-sdk/src/client/carequality-fhir.ts:198-208
Timestamp: 2025-07-16T00:54:56.156Z
Learning: User leite08 accepted the risk of parameter mutation in packages/carequality-sdk/src/client/carequality-fhir.ts when modifying org.active = false directly on input parameters, choosing to proceed despite the violation of the "Avoid modifying objects received as parameter" coding guideline.
Learnt from: leite08
PR: metriport/metriport#4215
File: packages/core/src/command/analytics-platform/config.ts:4-13
Timestamp: 2025-07-20T01:18:28.093Z
Learning: User leite08 prefers that code improvement suggestions (like consolidating duplicated schemas) be made on feature PRs rather than release PRs. Such improvements should be deferred to follow-up PRs during the release cycle.
Learnt from: leite08
PR: metriport/metriport#3648
File: packages/utils/src/shared/patient-create.ts:35-53
Timestamp: 2025-04-10T17:30:30.368Z
Learning: The function `mergePatients` in the patient-create.ts file is intentionally designed to take all properties from the first patient (p1) and only update address and contact fields, as the system supports duplicated patients specifically to allow multiple addresses across different records.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_DocumentReference.ini:1-188
Timestamp: 2025-07-17T19:57:25.199Z
Learning: In packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/ directory, configuration files like config_DocumentReference.ini should be kept as-is with their hard-coded field mappings, as confirmed by user thomasyopes. The team prefers to maintain these static configuration files rather than implementing dynamic mapping generation.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:38.906Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, S3 upload operations should be allowed to hard fail without try-catch blocks. The team prefers immediate failure over error handling and continuation for S3 upload operations in the FHIR-to-CSV transformation process.
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file, which prevents duplicates in the mapping between patient imports and data pipeline requests.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:54-58
Timestamp: 2025-07-18T09:54:01.790Z
Learning: The user thomasyopes prefers to keep the setupSnowflake.py file in packages/data-transformation/fhir-to-csv/src/setupSnowflake/ as-is, declining suggestions for additional error handling around configuration file parsing.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
Learnt from: leite08
PR: metriport/metriport#3912
File: packages/core/src/command/consolidated/search/fhir-resource/search-consolidated.ts:74-83
Timestamp: 2025-06-04T00:14:38.041Z
Learning: User leite08 prefers incremental progress in refactoring scenarios, accepting some inconsistency as a stepping stone towards better architecture rather than requiring perfect consistency immediately.
packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py (11)

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.

Learnt from: RamilGaripov
PR: #3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.

Learnt from: thomasyopes
PR: #4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the /fhir-to-csv and /fhir-to-csv/transform endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the /fhir-to-csv/batch endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.

Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:54-58
Timestamp: 2025-07-18T09:54:01.790Z
Learning: The user thomasyopes prefers to keep the setupSnowflake.py file in packages/data-transformation/fhir-to-csv/src/setupSnowflake/ as-is, declining suggestions for additional error handling around configuration file parsing.

Learnt from: leite08
PR: #3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file, which prevents duplicates in the mapping between patient imports and data pipeline requests.

Learnt from: keshavsaharia
PR: #3885
File: packages/core/src/external/sftp/surescripts/client.ts:1-134
Timestamp: 2025-05-28T19:23:20.179Z
Learning: In packages/core/src/external/sftp/surescripts/client.ts, the standalone getPatientLoadFileName function intentionally omits the transmission ID from the filename, which differs from the class method getPatientLoadFileName. This difference in filename generation is expected behavior.

Learnt from: thomasyopes
PR: #3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) ehr-sync-patient-cloud.ts sends messages to an SQS queue, (2) the ehr-sync-patient Lambda consumes these messages, and (3) the Lambda uses the syncPatient function to make the API calls to process the patient data.

packages/data-transformation/fhir-to-csv/main.py (24)

Learnt from: thomasyopes
PR: #4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the /fhir-to-csv and /fhir-to-csv/transform endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the /fhir-to-csv/batch endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:38.906Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, S3 upload operations should be allowed to hard fail without try-catch blocks. The team prefers immediate failure over error handling and continuation for S3 upload operations in the FHIR-to-CSV transformation process.

Learnt from: leite08
PR: #3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.

Learnt from: keshavsaharia
PR: #3885
File: packages/core/src/external/sftp/surescripts/client.ts:1-134
Timestamp: 2025-05-28T19:23:20.179Z
Learning: In packages/core/src/external/sftp/surescripts/client.ts, the standalone getPatientLoadFileName function intentionally omits the transmission ID from the filename, which differs from the class method getPatientLoadFileName. This difference in filename generation is expected behavior.

Learnt from: leite08
PR: #3412
File: packages/api/.env.test:12-12
Timestamp: 2025-04-24T01:57:35.097Z
Learning: The PATIENT_IMPORT_BUCKET_NAME environment variable is specific to the API package and doesn't need to be added to other packages' environment configurations.

Learnt from: leite08
PR: #3814
File: packages/core/src/command/consolidated/search/fhir-resource/search-semantic.ts:36-36
Timestamp: 2025-05-19T13:53:09.828Z
Learning: In the getConsolidatedPatientData function from @metriport/core/command/consolidated/consolidated-get, only the patient parameter is required. Other parameters like requestId, resources, dateFrom, dateTo, fromDashboard, and forceDataFromFhir are all optional.

Learnt from: thomasyopes
PR: #3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) ehr-sync-patient-cloud.ts sends messages to an SQS queue, (2) the ehr-sync-patient Lambda consumes these messages, and (3) the Lambda uses the syncPatient function to make the API calls to process the patient data.

Learnt from: lucasdellabella
PR: #3855
File: packages/core/src/command/hl7-notification/hl7-notification-webhook-sender-direct.ts:88-104
Timestamp: 2025-05-16T02:10:01.792Z
Learning: When logging file paths or other strings that might contain patient identifiers like patientId or cxId, these should be masked or redacted (e.g., using regex replacement) to avoid logging Protected Health Information (PHI) to CloudWatch or error tracking systems like Sentry, ensuring HIPAA compliance.

Learnt from: leite08
PR: #3612
File: packages/core/src/command/conversion-result/send-multiple-conversion-results.ts:48-62
Timestamp: 2025-04-05T01:03:51.989Z
Learning: Patient IDs in this codebase are UUIDs and don't contain PII, so they're safe to include in logs and error contexts.

Learnt from: RamilGaripov
PR: #3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.

Learnt from: thomasyopes
PR: #3883
File: packages/api/src/external/ehr/shared/job/bundle/create-resource-diff-bundles/start-jobs-across-ehrs.ts:38-46
Timestamp: 2025-05-28T19:42:17.301Z
Learning: Athena patient IDs necessarily contain a "." character - this is a required format for all valid Athena patient IDs.

Learnt from: thomasyopes
PR: #3973
File: packages/core/src/external/ehr/athenahealth/shared.ts:8-16
Timestamp: 2025-06-06T18:17:23.577Z
Learning: For internal functions in the Metriport codebase where inputs are controlled by the development team, input validation for required parameters like cxId and practiceId is not needed and should be avoided to keep functions simple.

Learnt from: RamilGaripov
PR: #3675
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-converter.ts:58-59
Timestamp: 2025-04-23T08:43:00.803Z
Learning: In the Metriport codebase, cxId is expected to be a UUID format (containing only alphanumeric characters and hyphens), which doesn't require URL encoding when used in query parameters.

Learnt from: leite08
PR: #3869
File: packages/core/src/external/fhir/export/string/resources/medication-request.ts:93-94
Timestamp: 2025-05-21T16:49:07.877Z
Learning: Raw ISO date strings in FHIR resources (like authoredOn in MedicationRequest) should be preserved as-is without additional formatting when converting resources to string representations.

Learnt from: leite08
PR: #4194
File: packages/api/src/command/medical/patient/get-patient-facilities.ts:36-44
Timestamp: 2025-07-16T01:38:14.080Z
Learning: In packages/api/src/command/medical/patient/get-patient-facilities.ts, the patient.facilityIds field is strongly typed as array of strings (non-nullable), so null/undefined checks are not needed when accessing this property.

Learnt from: leite08
PR: #4194
File: packages/api/src/command/medical/patient/get-patient-facilities.ts:36-44
Timestamp: 2025-07-16T01:38:14.080Z
Learning: In packages/api/src/command/medical/patient/get-patient-facilities.ts, the patient.facilityIds field is strongly typed as array of strings (non-nullable), so null/undefined checks are not needed when accessing this property.

Learnt from: RamilGaripov
PR: #3676
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-conversion/shared.ts:1-10
Timestamp: 2025-04-16T00:25:25.196Z
Learning: The circular dependency between shared.ts (importing getPatientIdsOrFail) and adt/utils.ts (using unpackPidFieldOrFail) will be addressed in a follow-up PR.

Learnt from: leite08
PR: #4095
File: packages/commonwell-cert-runner/src/single-commands/patient-create.ts:35-36
Timestamp: 2025-07-16T18:51:08.416Z
Learning: In packages/commonwell-cert-runner and the CommonWell SDK, PatientCollection.Patients is strongly typed as an array (z.array(patientCollectionItemSchema)), so resp.Patients will always exist and be an array. The getMetriportPatientIdOrFail utility function is designed to handle undefined/null input with proper error messages, making additional bounds checking unnecessary.

Learnt from: keshavsaharia
PR: #3885
File: packages/utils/src/surescripts/generate-patient-load-file.ts:92-94
Timestamp: 2025-05-30T01:11:35.300Z
Learning: In V0 of the Surescripts implementation (packages/utils/src/surescripts/generate-patient-load-file.ts), there is always only 1 facility ID, so the current facility_ids parsing logic with substring operations is sufficient and additional validation is not necessary.

Learnt from: lucasdellabella
PR: #3608
File: packages/core/src/external/ehr/resource-diff/steps/start/ehr-start-resource-diff-local.ts:0-0
Timestamp: 2025-04-21T03:47:54.332Z
Learning: In the EHR resource diff implementation, bundles should always have an entry property, and code should be allowed to throw if entry is missing rather than using null-coalescing operators, as this represents an unexpected state that should be caught and addressed.

Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:54-58
Timestamp: 2025-07-18T09:54:01.790Z
Learning: The user thomasyopes prefers to keep the setupSnowflake.py file in packages/data-transformation/fhir-to-csv/src/setupSnowflake/ as-is, declining suggestions for additional error handling around configuration file parsing.

🧬 Code Graph Analysis (1)
packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py (1)
packages/data-transformation/fhir-to-csv/src/utils/database.py (6)
  • format_patient_status_table_name (13-14)
  • format_table_name_from_config_file_name (7-8)
  • format_job_table_name (10-11)
  • get_data_type (20-26)
  • format_database_name (4-5)
  • format_stage_name (17-18)
🪛 Ruff (0.12.2)
packages/data-transformation/fhir-to-csv/main.py

52-52: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (14)
packages/data-transformation/fhir-to-csv/main.py (7)

8-8: LGTM! New imports align with patient status tracking.

The addition of setup_database and set_patient_status imports correctly supports the new patient status tracking functionality mentioned in the PR objectives.

Also applies to: 12-12


25-32: LGTM! Function signature correctly updated for single-patient processing.

The addition of patient_id as a required parameter aligns with the PR's shift from cohort-level to patient-level processing.


33-37: LGTM! Bundle key creation logic simplified appropriately.

The simplified logic for creating a single bundle key aligns well with the single-patient processing approach.


89-93: LGTM! Parameter validation correctly enforces single-patient processing.

The mandatory patient_id validation and job_id sanitization properly support the refactored single-patient processing approach.


117-118: LGTM! Patient status tracking properly implemented.

The addition of patient status tracking with "processing" and "completed" states provides good visibility into the transformation pipeline state and aligns with the PR objectives.

Also applies to: 127-128


120-125: LGTM! Snowflake function calls correctly updated with patient_id.

All Snowflake operations now properly include the patient_id parameter, and the rebuild_patient flag logic appropriately handles different processing scenarios.


102-102: LGTM! Logging improvements enhance traceability.

The updated log messages with patient_id context provide better debugging and monitoring capabilities for the single-patient processing flow.

Also applies to: 113-113, 116-116, 128-128

packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py (7)

10-10: LGTM! Import supports new patient status tracking.

The addition of format_patient_status_table_name import correctly supports the new patient status tracking functionality.


44-52: LGTM! Function correctly updated for patient-centric table naming.

The addition of patient_id parameter and logging enhances the function's alignment with single-patient processing and provides better traceability.


64-69: LGTM! Well-implemented database setup function.

The new setup_database function properly ensures database existence before table operations and uses appropriate connection management with context managers.


70-79: LGTM! Function correctly updated with patient_id parameter.

The create_job_tables function properly incorporates the patient_id parameter and maintains good logging and connection management practices.


81-97: LGTM! Function correctly implements patient-centric table operations.

The append_job_tables function properly incorporates patient_id and rebuild_patient parameters with appropriate logging. The SQL injection risk in the DELETE query was previously assessed and accepted by the team based on controlled inputs.


99-129: LGTM! Function correctly updated with enhanced logging and patient_id support.

The copy_into_job_table function properly incorporates the patient_id parameter and maintains the team's preferred error handling approach for partial success scenarios in batch processing.


130-140: LGTM! Well-implemented patient status tracking function.

The new set_patient_status function properly implements patient status tracking with appropriate table management, upsert logic, and consistent naming conventions. The direct SQL interpolation follows the team's accepted approach for controlled inputs.

@leite08 leite08 force-pushed the eng-722-snowflake-per-pt branch from 026bf0a to 12eb691 Compare July 25, 2025 02:38
with open(local_bundle_key, "rb") as f:
bundle = json.load(f)
entries = bundle["entry"]
if entries is None or len(entries) < 1:
Copy link
Contributor

@thomasyopes thomasyopes Jul 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we want to check if entries is an array too before checking length?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

damn, there's a lot of diff ways to check for arrays in Python :/

I'll leave as is for now, it seems low impact given we always generate FHIR bundles, where entry is array

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we flag putting these changes back into the fhir inferno repo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, added it as an item here

leite08 added 5 commits July 25, 2025 18:39
Ref eng-722

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-722

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-722

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
- hybrid tables
- single DB conn per pt
- load config files once before DB conn

Ref eng-722

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-722

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
leite08 added 2 commits July 25, 2025 19:42
Ref eng-722

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-722

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
@leite08 leite08 added this pull request to the merge queue Jul 26, 2025
Merged via the queue into develop with commit 6be5aed Jul 26, 2025
30 checks passed
@leite08 leite08 deleted the eng-722-snowflake-per-pt branch July 26, 2025 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants