-
Notifications
You must be signed in to change notification settings - Fork 72
ENG-722 Data transformation on pt level with updates #4245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis change set removes the FHIR-to-CSV transform API endpoint and its client, refactors the FHIR-to-CSV data transformation pipeline to operate on a single patient per job, and updates configuration mappings for FamilyMemberHistory and Medication resources. It also reworks Snowflake integration to use patient-specific job tables and status tracking, adds logging, and updates utility functions and dependencies. Changes
Sequence Diagram(s)sequenceDiagram
participant SQS
participant bulk-insert-patients.ts
participant Lambda (main.py)
participant S3
participant Snowflake
bulk-insert-patients.ts->>SQS: Send message (patientId, jobId, ...)
SQS->>Lambda (main.py): Trigger with event (patientId, jobId, ...)
Lambda (main.py)->>S3: Download bundle for patientId
alt Bundle missing
Lambda (main.py)->>Lambda (main.py): Raise error, abort
else Bundle found
Lambda (main.py)->>Lambda (main.py): Transform bundle to CSV(s)
Lambda (main.py)->>S3: Upload CSV(s) (key includes patientId)
Lambda (main.py)->>Snowflake: setup_database(cxId)
Lambda (main.py)->>Snowflake: set_patient_status(cxId, patientId, "processing")
Lambda (main.py)->>Snowflake: create_job_tables(cxId, patientId, jobId)
Lambda (main.py)->>Snowflake: copy_into_job_table(..., patientId, jobId, ...)
Lambda (main.py)->>Snowflake: append_job_tables(cxId, patientId, jobId, rebuild_flag)
Lambda (main.py)->>Snowflake: set_patient_status(cxId, patientId, "completed")
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~35 minutes Possibly related PRs
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
npm error code ERR_SSL_WRONG_VERSION_NUMBER 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🧰 Additional context used📓 Path-based instructions (3)**/*.{js,jsx,ts,tsx}📄 CodeRabbit Inference Engine (.cursorrules)
Files:
**/*.{ts,tsx}📄 CodeRabbit Inference Engine (.cursorrules)
Files:
**/*.ts⚙️ CodeRabbit Configuration File
Files:
🧠 Learnings (2)📓 Common learnings
packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts (16)Learnt from: thomasyopes Learnt from: leite08 Learnt from: keshavsaharia Learnt from: thomasyopes Learnt from: thomasyopes Learnt from: thomasyopes Learnt from: RamilGaripov Learnt from: leite08 Learnt from: thomasyopes Learnt from: RamilGaripov Learnt from: thomasyopes Learnt from: RamilGaripov Learnt from: thomasyopes Learnt from: leite08 Learnt from: thomasyopes Learnt from: thomasyopes ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
🔇 Additional comments (1)
✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
||
output_bucket_and_file_keys_and_table_names = [] | ||
for file in local_output_files: | ||
output_file_key = f"{dwh}/{transform_name}/{cx_id}/{job_id}/{file.replace('/', '_')}" | ||
output_file_key = f"{dwh}/{transform_name}/{cx_id}/{patient_id}/{job_id}/{file.replace('/', '_')}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
important, added the pt ID on the S3 path
else: | ||
patient_ids = [patient_id] if patient_id is not None and patient_id != "" else get_patient_ids(api_url, cx_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't be used on the cx level, needs pt ID
raise ValueError("PATIENT_ID is not set") | ||
if not job_id: | ||
raise ValueError("JOB_ID is not set") | ||
job_id = job_id.replace(":", "-").replace(".", "-") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ISO date to a format that can use used in the DB
@@ -5,6 +5,7 @@ writemode = append | |||
anchor = condition | |||
|
|||
[Struct] | |||
familymemberhistory_id = id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allows to join by ID w/ the familymemberhistory
table
@@ -38,7 +38,7 @@ def parse(input_path: str, outputs_folder: str) -> list[str]: | |||
# Process each resource type separately | |||
for resource_type, config_files in config_groups.items(): | |||
# First, filter the input file for just this resource type | |||
filtered_input = f'{outputs_folder}/temp_{resource_type.lower()}.ndjson' | |||
filtered_input = f'{outputs_folder}/temp_{patient_id}_{resource_type.lower()}.ndjson' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
important, so we can remove existing data based on the pt ID
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This temp file gets deleted as a heads up. I believe that the "filename" column is the derived from the path + filename, so as long as new file is along the patient ID path, it'll appear in the column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the file name column is - normalized version of "output_file_path"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py (1)
86-90
: Potential data integrity issue with LIKE pattern matchingThe DELETE query uses
LIKE '%{patient_id}%'
which could unintentionally match other patients' data if one patient_id is a substring of another (e.g., patient "123" would match "1234", "0123", etc.).Consider using a more precise matching pattern or storing patient_id in a dedicated column for exact matching.
🧹 Nitpick comments (1)
packages/data-transformation/fhir-to-csv/main.py (1)
48-53
: Improve exception handling with contextWhen re-raising exceptions, use
from
to preserve the exception chain for better debugging.except s3_client.exceptions.ClientError as e: if e.response['Error']['Code'] == '404': logging.warning(f"Bundle {bundle_key} not found in input bucket {input_bucket}") - raise ValueError("Bundle not found") + raise ValueError("Bundle not found") from e else: raise e
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
packages/api/src/routes/internal/analytics-platform/index.ts
(0 hunks)packages/core/src/command/analytics-platform/api/start-fhir-to-csv-transform.ts
(0 hunks)packages/data-transformation/fhir-to-csv/main.py
(2 hunks)packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_FamilyMemberHistory_condition.ini
(2 hunks)packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_MedicationAdministration.ini
(4 hunks)packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_MedicationDispense.ini
(1 hunks)packages/data-transformation/fhir-to-csv/src/parseNdjsonBundle/parseNdjsonBundle.py
(2 hunks)packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py
(3 hunks)packages/data-transformation/fhir-to-csv/src/utils/database.py
(1 hunks)packages/utils/src/snowflake/bulk-insert-patients.ts
(2 hunks)
💤 Files with no reviewable changes (2)
- packages/api/src/routes/internal/analytics-platform/index.ts
- packages/core/src/command/analytics-platform/api/start-fhir-to-csv-transform.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx}
📄 CodeRabbit Inference Engine (.cursorrules)
**/*.{js,jsx,ts,tsx}
: Don’t use null inside the app, only on code interacting with external interfaces/services, like DB and HTTP; convert to undefined before sending inwards into the code
Useconst
whenever possible
Useasync/await
instead of.then()
Naming: classes, enums: PascalCase
Naming: constants, variables, functions: camelCase
Naming: file names: kebab-case
Naming: Don’t use negative names, likenotEnabled
, preferisDisabled
If possible, use decomposing objects for function parameters
Prefer Nullish Coalesce (??) than the OR operator (||) when you want to provide a default value
Avoid creating arrow functions
Use truthy syntax instead ofin
- i.e.,if (data.link)
notif ('link' in data)
While handling errors, keep the stack trace around: if you create a new Error (e.g., MetriportError), make sure to pass the original error as the new one’s cause so the stack trace is available upstream.
max column length is 100 chars
multi-line comments use/** */
top-level comments go after the import (save pre-import to basic file header, like license)
move literals to constants declared after imports when possible
Files:
packages/utils/src/snowflake/bulk-insert-patients.ts
**/*.{ts,tsx}
📄 CodeRabbit Inference Engine (.cursorrules)
Use types whenever possible
Files:
packages/utils/src/snowflake/bulk-insert-patients.ts
**/*.ts
⚙️ CodeRabbit Configuration File
**/*.ts
: - Use the Onion Pattern to organize a package's code in layers
- Try to use immutable code and avoid sharing state across different functions, objects, and systems
- Try to build code that's idempotent whenever possible
- Prefer functional programming style functions: small, deterministic, 1 input, 1 output
- Minimize coupling / dependencies
- Avoid modifying objects received as parameter
- Only add comments to code to explain why something was done, not how it works
- Naming
- classes, enums:
PascalCase
- constants, variables, functions:
camelCase
- file names:
kebab-case
- table and column names:
snake_case
- Use meaningful names, so whoever is reading the code understands what it means
- Don’t use negative names, like
notEnabled
, preferisDisabled
- For numeric values, if the type doesn’t convey the unit, add the unit to the name
- Typescript
- Use types
- Prefer
const
instead oflet
- Avoid
any
and casting fromany
to other types- Type predicates: only applicable to narrow down the type, not to force a complete type conversion
- Prefer deconstructing parameters for functions instead of multiple parameters that might be of
the same type- Don’t use
null
inside the app, only on code interacting with external interfaces/services,
like DB and HTTP; convert toundefined
before sending inwards into the code- Use
async/await
instead of.then()
- Use the strict equality operator
===
, don’t use abstract equality operator==
- When calling a Promise-returning function asynchronously (i.e., not awaiting), use
.catch()
to
handle errors (seeprocessAsyncError
andemptyFunction
depending on the case)- Date and Time
- Always use
buildDayjs()
to createdayjs
instances- Prefer
dayjs.duration(...)
to create duration consts and keep them asduration
- Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
- Avoid creating arrow functions
- U...
Files:
packages/utils/src/snowflake/bulk-insert-patients.ts
🧠 Learnings (9)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.
Learnt from: thomasyopes
PR: metriport/metriport#3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) `ehr-sync-patient-cloud.ts` sends messages to an SQS queue, (2) the `ehr-sync-patient` Lambda consumes these messages, and (3) the Lambda uses the `syncPatient` function to make the API calls to process the patient data.
Learnt from: thomasyopes
PR: metriport/metriport#3882
File: packages/core/src/external/ehr/command/get-bundle-by-resource-type.ts:31-37
Timestamp: 2025-06-05T14:09:59.683Z
Learning: When EHR handlers in mapping objects are set to `undefined` and there are downstream PRs mentioned in the PR objectives, this typically indicates a staged rollout approach where the core structure is implemented first and specific handler implementations follow in subsequent PRs.
Learnt from: RamilGaripov
PR: metriport/metriport#3676
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-conversion/shared.ts:1-10
Timestamp: 2025-04-16T00:25:25.196Z
Learning: The circular dependency between shared.ts (importing getPatientIdsOrFail) and adt/utils.ts (using unpackPidFieldOrFail) will be addressed in a follow-up PR.
Learnt from: thomasyopes
PR: metriport/metriport#3466
File: packages/api/src/routes/ehr/elation/chart.ts:20-37
Timestamp: 2025-03-17T15:30:34.647Z
Learning: The request body parsing in packages/api/src/routes/ehr/elation/chart.ts is tracked with TODO referencing issue #2170 and will be handled in a separate PR, not in PR #3466.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the `/fhir-to-csv` and `/fhir-to-csv/transform` endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the `/fhir-to-csv/batch` endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.
Learnt from: leite08
PR: metriport/metriport#4193
File: packages/carequality-sdk/src/client/carequality-fhir.ts:198-208
Timestamp: 2025-07-16T00:54:56.156Z
Learning: User leite08 accepted the risk of parameter mutation in packages/carequality-sdk/src/client/carequality-fhir.ts when modifying org.active = false directly on input parameters, choosing to proceed despite the violation of the "Avoid modifying objects received as parameter" coding guideline.
Learnt from: leite08
PR: metriport/metriport#4215
File: packages/core/src/command/analytics-platform/config.ts:4-13
Timestamp: 2025-07-20T01:18:28.093Z
Learning: User leite08 prefers that code improvement suggestions (like consolidating duplicated schemas) be made on feature PRs rather than release PRs. Such improvements should be deferred to follow-up PRs during the release cycle.
Learnt from: leite08
PR: metriport/metriport#3648
File: packages/utils/src/shared/patient-create.ts:35-53
Timestamp: 2025-04-10T17:30:30.368Z
Learning: The function `mergePatients` in the patient-create.ts file is intentionally designed to take all properties from the first patient (p1) and only update address and contact fields, as the system supports duplicated patients specifically to allow multiple addresses across different records.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_DocumentReference.ini:1-188
Timestamp: 2025-07-17T19:57:25.199Z
Learning: In packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/ directory, configuration files like config_DocumentReference.ini should be kept as-is with their hard-coded field mappings, as confirmed by user thomasyopes. The team prefers to maintain these static configuration files rather than implementing dynamic mapping generation.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:38.906Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, S3 upload operations should be allowed to hard fail without try-catch blocks. The team prefers immediate failure over error handling and continuation for S3 upload operations in the FHIR-to-CSV transformation process.
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file, which prevents duplicates in the mapping between patient imports and data pipeline requests.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.
Learnt from: keshavsaharia
PR: metriport/metriport#3885
File: packages/utils/src/surescripts/generate-patient-load-file.ts:92-94
Timestamp: 2025-05-30T01:11:35.300Z
Learning: In V0 of the Surescripts implementation (packages/utils/src/surescripts/generate-patient-load-file.ts), there is always only 1 facility ID, so the current facility_ids parsing logic with substring operations is sufficient and additional validation is not necessary.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
Learnt from: leite08
PR: metriport/metriport#3912
File: packages/core/src/command/consolidated/search/fhir-resource/search-consolidated.ts:74-83
Timestamp: 2025-06-04T00:14:38.041Z
Learning: User leite08 prefers incremental progress in refactoring scenarios, accepting some inconsistency as a stepping stone towards better architecture rather than requiring perfect consistency immediately.
packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_FamilyMemberHistory_condition.ini (1)
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_DocumentReference.ini:1-188
Timestamp: 2025-07-17T19:57:25.199Z
Learning: In packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/ directory, configuration files like config_DocumentReference.ini should be kept as-is with their hard-coded field mappings, as confirmed by user thomasyopes. The team prefers to maintain these static configuration files rather than implementing dynamic mapping generation.
packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_MedicationDispense.ini (5)
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_DocumentReference.ini:1-188
Timestamp: 2025-07-17T19:57:25.199Z
Learning: In packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/ directory, configuration files like config_DocumentReference.ini should be kept as-is with their hard-coded field mappings, as confirmed by user thomasyopes. The team prefers to maintain these static configuration files rather than implementing dynamic mapping generation.
Learnt from: keshavsaharia
PR: #4075
File: packages/core/src/external/surescripts/fhir/medication-request.ts:76-83
Timestamp: 2025-06-25T18:42:07.231Z
Learning: In packages/core/src/external/surescripts/fhir/medication-request.ts, the getDispenseNote and getDosageInstruction functions are intentionally identical because the dashboard requires both the note and dosageInstruction fields of MedicationRequest to be populated with detail.directions for proper note display functionality.
Learnt from: thomasyopes
PR: #3970
File: packages/api/src/external/ehr/athenahealth/command/write-back/medication.ts:17-17
Timestamp: 2025-06-06T16:45:31.832Z
Learning: The writeMedicationToChart function in packages/api/src/external/ehr/athenahealth/command/write-back/medication.ts returns a response that is not currently used by any consumers, so changes to its return type are not breaking changes in practice.
Learnt from: keshavsaharia
PR: #4045
File: packages/core/src/external/surescripts/fhir/coverage.ts:0-0
Timestamp: 2025-06-18T18:34:10.489Z
Learning: Coverage resources in Surescripts FHIR conversion are currently excluded from bundles to prevent skewing data lift metrics. The team plans to examine available insurance data thoroughly before including properly structured Coverage resources with mandatory FHIR R4 elements like beneficiary and payor references.
Learnt from: thomasyopes
PR: #3891
File: packages/api/src/routes/internal/ehr/patient.ts:142-142
Timestamp: 2025-06-20T15:35:00.546Z
Learning: In the EHR resource diff contribution system, there are no "invalid" resource types. The system is designed to gracefully handle any resource type string by attempting to find a corresponding bundle, and if no bundle exists for that resource type, it will simply exit early rather than throwing an error.
packages/utils/src/snowflake/bulk-insert-patients.ts (19)
Learnt from: keshavsaharia
PR: #3885
File: packages/core/src/external/sftp/surescripts/client.ts:1-134
Timestamp: 2025-05-28T19:23:20.179Z
Learning: In packages/core/src/external/sftp/surescripts/client.ts, the standalone getPatientLoadFileName function intentionally omits the transmission ID from the filename, which differs from the class method getPatientLoadFileName. This difference in filename generation is expected behavior.
Learnt from: leite08
PR: #3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.
Learnt from: RamilGaripov
PR: #3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.
Learnt from: thomasyopes
PR: #4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the /fhir-to-csv
and /fhir-to-csv/transform
endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the /fhir-to-csv/batch
endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.
Learnt from: RamilGaripov
PR: #3676
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-conversion/shared.ts:1-10
Timestamp: 2025-04-16T00:25:25.196Z
Learning: The circular dependency between shared.ts (importing getPatientIdsOrFail) and adt/utils.ts (using unpackPidFieldOrFail) will be addressed in a follow-up PR.
Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.
Learnt from: thomasyopes
PR: #3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) ehr-sync-patient-cloud.ts
sends messages to an SQS queue, (2) the ehr-sync-patient
Lambda consumes these messages, and (3) the Lambda uses the syncPatient
function to make the API calls to process the patient data.
Learnt from: leite08
PR: #4034
File: packages/core/src/command/patient-import/patient-or-record-failed.ts:19-30
Timestamp: 2025-06-16T17:03:23.069Z
Learning: In patient import error handling functions like setPatientOrRecordFailed, the team prefers Promise.all over Promise.allSettled to ensure atomic all-or-nothing behavior - if either the job update or patient record update fails, the entire operation should fail rather than allowing partial success.
Learnt from: keshavsaharia
PR: #3885
File: packages/utils/src/surescripts/generate-patient-load-file.ts:92-94
Timestamp: 2025-05-30T01:11:35.300Z
Learning: In V0 of the Surescripts implementation (packages/utils/src/surescripts/generate-patient-load-file.ts), there is always only 1 facility ID, so the current facility_ids parsing logic with substring operations is sufficient and additional validation is not necessary.
Learnt from: leite08
PR: #3953
File: packages/core/src/command/consolidated/search/fhir-resource/tests/search-consolidated-setup.ts:28-30
Timestamp: 2025-06-03T21:02:23.374Z
Learning: The makePatient()
function returns PatientWithId
type which requires the id
field to be present, ensuring patient.id
is never undefined.
Learnt from: lucasdellabella
PR: #4098
File: packages/api/src/routes/medical/dtos/tcm-encounter-dto.ts:10-22
Timestamp: 2025-06-27T01:50:14.227Z
Learning: In packages/api/src/routes/medical/dtos/tcm-encounter-dto.ts, the patient fields patientData.firstName, patientData.lastName, and patientData.dob are guaranteed to be non-nullable values, so defensive null/undefined checks are not needed when accessing these fields.
Learnt from: leite08
PR: #3814
File: packages/core/src/command/consolidated/search/fhir-resource/search-semantic.ts:36-36
Timestamp: 2025-05-19T13:53:09.828Z
Learning: In the getConsolidatedPatientData
function from @metriport/core/command/consolidated/consolidated-get
, only the patient
parameter is required. Other parameters like requestId
, resources
, dateFrom
, dateTo
, fromDashboard
, and forceDataFromFhir
are all optional.
Learnt from: RamilGaripov
PR: #3976
File: packages/api/src/routes/medical/patient.ts:541-543
Timestamp: 2025-06-18T21:05:22.256Z
Learning: In packages/api/src/routes/medical/patient.ts, inline schema definitions like cohortIdSchema are acceptable and don't need to be moved to separate schema files when the user prefers to keep them inline.
Learnt from: thomasyopes
PR: #3383
File: packages/api/src/routes/ehr-internal/canvas/internal/patient.ts:10-24
Timestamp: 2025-03-05T21:42:32.929Z
Learning: The processPatientsFromAppointments endpoint in the Canvas API is intentionally designed to be asynchronous (fire-and-forget pattern). The endpoint returns a 200 OK response immediately after initiating the process, without waiting for it to complete. Error handling is handled through the .catch() method with processAsyncError rather than awaiting the promise.
Learnt from: thomasyopes
PR: #3788
File: packages/api/src/external/ehr/shared/command/create-resource-diff-bundles.ts:0-0
Timestamp: 2025-05-08T19:16:10.258Z
Learning: Avoid using Promise.all for parallel execution of functions that access the cxMapping table, as this may lead to database locking issues. Serial execution (using for loops with await) is preferred for such operations even if it might be slower for large datasets.
Learnt from: thomasyopes
PR: #4061
File: packages/api/src/external/ehr/shared/job/bundle/create-resource-diff-bundles/run-job.ts:34-47
Timestamp: 2025-06-19T22:44:49.393Z
Learning: In packages/api/src/external/ehr/shared/job/bundle/create-resource-diff-bundles/run-job.ts, the team prefers to keep refreshEhrBundles operations grouped using fire-and-forget pattern rather than awaiting all operations with Promise.allSettled(). This allows individual refresh operations to run independently without blocking the job runner.
Learnt from: keshavsaharia
PR: #4099
File: packages/utils/src/surescripts/find-largest.ts:84-84
Timestamp: 2025-06-26T06:26:17.643Z
Learning: In utility scripts for Surescripts analysis tools, when orgName parameter is undefined, prefer using date timestamps as fallbacks for directory naming rather than using cxId or other identifiers. This creates unique directories for each run and helps with organization.
Learnt from: thomasyopes
PR: #3427
File: packages/lambdas/src/ehr-sync-patient.ts:31-33
Timestamp: 2025-03-11T19:12:22.472Z
Learning: In this codebase, a specific logging pattern is used: console.log
is used for early logging before context is available, then a prefixed logger (created with prefixedLog
) is initialized after parsing data to include contextual information (like ehr, cxId, practiceId, patientId) in all subsequent log messages.
Learnt from: thomasyopes
PR: #3997
File: packages/api/src/command/job/patient/status/cancel.ts:30-34
Timestamp: 2025-06-12T23:25:08.139Z
Learning: cancelPatientJob
is intentionally non-idempotent: if the current job status is already cancelled
, the function must throw a BadRequestError
rather than silently succeeding.
packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_MedicationAdministration.ini (6)
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_DocumentReference.ini:1-188
Timestamp: 2025-07-17T19:57:25.199Z
Learning: In packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/ directory, configuration files like config_DocumentReference.ini should be kept as-is with their hard-coded field mappings, as confirmed by user thomasyopes. The team prefers to maintain these static configuration files rather than implementing dynamic mapping generation.
Learnt from: thomasyopes
PR: #3970
File: packages/api/src/external/ehr/athenahealth/command/write-back/medication.ts:17-17
Timestamp: 2025-06-06T16:45:31.832Z
Learning: The writeMedicationToChart function in packages/api/src/external/ehr/athenahealth/command/write-back/medication.ts returns a response that is not currently used by any consumers, so changes to its return type are not breaking changes in practice.
Learnt from: keshavsaharia
PR: #3885
File: packages/core/src/external/surescripts/schema/load.ts:65-71
Timestamp: 2025-05-30T03:42:53.380Z
Learning: In Surescripts schema files like packages/core/src/external/surescripts/schema/load.ts
, the key
property in field definitions is only used for direct one-to-one field mappings. When multiple fields are derived from the same source property but with different transformations (like extracting date vs. time portions), they should not have key
properties as they represent transformations rather than direct mappings.
Learnt from: thomasyopes
PR: #3936
File: packages/core/src/external/ehr/athenahealth/index.ts:472-482
Timestamp: 2025-06-06T15:37:44.571Z
Learning: In AthenaHealth medication creation, even if medication statements have the same start/stop dates, they should not be deduplicated because there may be relevant differences beyond dates (dosage, administration routes, prescriber info, etc.) that make each statement unique and worth creating separately.
Learnt from: thomasyopes
PR: #3981
File: packages/core/src/external/ehr/canvas/index.ts:1376-1379
Timestamp: 2025-06-10T20:14:27.998Z
Learning: Canvas EHR only supports the MedicationStatement statuses "active", "stopped", and "entered-in-error"; it does not accept "completed".
Learnt from: keshavsaharia
PR: #4075
File: packages/core/src/external/surescripts/fhir/medication-request.ts:76-83
Timestamp: 2025-06-25T18:42:07.231Z
Learning: In packages/core/src/external/surescripts/fhir/medication-request.ts, the getDispenseNote and getDosageInstruction functions are intentionally identical because the dashboard requires both the note and dosageInstruction fields of MedicationRequest to be populated with detail.directions for proper note display functionality.
packages/data-transformation/fhir-to-csv/src/parseNdjsonBundle/parseNdjsonBundle.py (7)
Learnt from: thomasyopes
PR: #4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the /fhir-to-csv
and /fhir-to-csv/transform
endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the /fhir-to-csv/batch
endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.
Learnt from: leite08
PR: #3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/parseFhir.py:94-175
Timestamp: 2025-07-17T19:58:54.738Z
Learning: The user thomasyopes prefers to keep the parseFhir.py file in packages/data-transformation/fhir-to-csv/src/parseFhir/ as-is, declining code style improvements suggested by static analysis tools like Ruff.
Learnt from: RamilGaripov
PR: #3962
File: packages/api/src/models/medical/cohort.ts:0-0
Timestamp: 2025-06-17T23:14:46.017Z
Learning: In the Metriport codebase, cohort deletion is handled with application-level validation that prevents deletion of cohorts if there are any patients assigned to them. This is why the cohortId foreign key constraint in patient_cohort table does not have cascade delete behavior - it's intentionally asymmetric. The patientId has cascade delete to clean up cohort assignments when patients are deleted, but cohortId lacks cascade delete because cohorts with patients should not be deletable at all according to business rules.
Learnt from: thomasyopes
PR: #3771
File: packages/api/src/external/ehr/healthie/command/process-patients-from-appointments.ts:121-128
Timestamp: 2025-05-01T16:15:20.823Z
Learning: In the Healthie EHR system, patient IDs are globally unique across practices, making patientId
alone sufficient for deduplication without needing to include practiceId
.
Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_DocumentReference.ini:1-188
Timestamp: 2025-07-17T19:57:25.199Z
Learning: In packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/ directory, configuration files like config_DocumentReference.ini should be kept as-is with their hard-coded field mappings, as confirmed by user thomasyopes. The team prefers to maintain these static configuration files rather than implementing dynamic mapping generation.
packages/data-transformation/fhir-to-csv/main.py (18)
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:38.906Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, S3 upload operations should be allowed to hard fail without try-catch blocks. The team prefers immediate failure over error handling and continuation for S3 upload operations in the FHIR-to-CSV transformation process.
Learnt from: thomasyopes
PR: #4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the /fhir-to-csv
and /fhir-to-csv/transform
endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the /fhir-to-csv/batch
endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.
Learnt from: leite08
PR: #3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.
Learnt from: leite08
PR: #3859
File: packages/utils/src/consolidated/check-patient-consolidated.ts:28-32
Timestamp: 2025-05-26T16:24:10.245Z
Learning: For utility scripts in the packages/utils directory, developers are expected to update the main constants at the top of the file (like bundleFilePath, cxId, patientId) rather than implementing command-line argument parsing or environment variable reading. This is the preferred pattern for configuring utility scripts in this codebase.
Learnt from: keshavsaharia
PR: #3885
File: packages/core/src/external/sftp/surescripts/client.ts:1-134
Timestamp: 2025-05-28T19:23:20.179Z
Learning: In packages/core/src/external/sftp/surescripts/client.ts, the standalone getPatientLoadFileName function intentionally omits the transmission ID from the filename, which differs from the class method getPatientLoadFileName. This difference in filename generation is expected behavior.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/parseFhir.py:94-175
Timestamp: 2025-07-17T19:58:54.738Z
Learning: The user thomasyopes prefers to keep the parseFhir.py file in packages/data-transformation/fhir-to-csv/src/parseFhir/ as-is, declining code style improvements suggested by static analysis tools like Ruff.
Learnt from: leite08
PR: #3412
File: packages/api/.env.test:12-12
Timestamp: 2025-04-24T01:57:35.097Z
Learning: The PATIENT_IMPORT_BUCKET_NAME environment variable is specific to the API package and doesn't need to be added to other packages' environment configurations.
Learnt from: lucasdellabella
PR: #3855
File: packages/core/src/command/hl7-notification/hl7-notification-webhook-sender-direct.ts:88-104
Timestamp: 2025-05-16T02:10:01.792Z
Learning: When logging file paths or other strings that might contain patient identifiers like patientId or cxId, these should be masked or redacted (e.g., using regex replacement) to avoid logging Protected Health Information (PHI) to CloudWatch or error tracking systems like Sentry, ensuring HIPAA compliance.
Learnt from: leite08
PR: #3612
File: packages/core/src/command/conversion-result/send-multiple-conversion-results.ts:48-62
Timestamp: 2025-04-05T01:03:51.989Z
Learning: Patient IDs in this codebase are UUIDs and don't contain PII, so they're safe to include in logs and error contexts.
Learnt from: RamilGaripov
PR: #3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.
Learnt from: thomasyopes
PR: #3883
File: packages/api/src/external/ehr/shared/job/bundle/create-resource-diff-bundles/start-jobs-across-ehrs.ts:38-46
Timestamp: 2025-05-28T19:42:17.301Z
Learning: Athena patient IDs necessarily contain a "." character - this is a required format for all valid Athena patient IDs.
Learnt from: thomasyopes
PR: #3973
File: packages/core/src/external/ehr/athenahealth/shared.ts:8-16
Timestamp: 2025-06-06T18:17:23.577Z
Learning: For internal functions in the Metriport codebase where inputs are controlled by the development team, input validation for required parameters like cxId and practiceId is not needed and should be avoided to keep functions simple.
Learnt from: RamilGaripov
PR: #3675
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-converter.ts:58-59
Timestamp: 2025-04-23T08:43:00.803Z
Learning: In the Metriport codebase, cxId is expected to be a UUID format (containing only alphanumeric characters and hyphens), which doesn't require URL encoding when used in query parameters.
Learnt from: leite08
PR: #3869
File: packages/core/src/external/fhir/export/string/resources/medication-request.ts:93-94
Timestamp: 2025-05-21T16:49:07.877Z
Learning: Raw ISO date strings in FHIR resources (like authoredOn in MedicationRequest) should be preserved as-is without additional formatting when converting resources to string representations.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:54-58
Timestamp: 2025-07-18T09:54:01.790Z
Learning: The user thomasyopes prefers to keep the setupSnowflake.py file in packages/data-transformation/fhir-to-csv/src/setupSnowflake/ as-is, declining suggestions for additional error handling around configuration file parsing.
packages/data-transformation/fhir-to-csv/src/utils/database.py (9)
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.
Learnt from: RamilGaripov
PR: #3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.
Learnt from: thomasyopes
PR: #4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the /fhir-to-csv
and /fhir-to-csv/transform
endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the /fhir-to-csv/batch
endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.
Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.
Learnt from: leite08
PR: #3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.
Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file, which prevents duplicates in the mapping between patient imports and data pipeline requests.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/parseFhir.py:94-175
Timestamp: 2025-07-17T19:58:54.738Z
Learning: The user thomasyopes prefers to keep the parseFhir.py file in packages/data-transformation/fhir-to-csv/src/parseFhir/ as-is, declining code style improvements suggested by static analysis tools like Ruff.
packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py (10)
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.
Learnt from: thomasyopes
PR: #4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the /fhir-to-csv
and /fhir-to-csv/transform
endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the /fhir-to-csv/batch
endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.
Learnt from: RamilGaripov
PR: #3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.
Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:54-58
Timestamp: 2025-07-18T09:54:01.790Z
Learning: The user thomasyopes prefers to keep the setupSnowflake.py file in packages/data-transformation/fhir-to-csv/src/setupSnowflake/ as-is, declining suggestions for additional error handling around configuration file parsing.
Learnt from: leite08
PR: #3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.
Learnt from: keshavsaharia
PR: #3885
File: packages/core/src/external/sftp/surescripts/client.ts:1-134
Timestamp: 2025-05-28T19:23:20.179Z
Learning: In packages/core/src/external/sftp/surescripts/client.ts, the standalone getPatientLoadFileName function intentionally omits the transmission ID from the filename, which differs from the class method getPatientLoadFileName. This difference in filename generation is expected behavior.
Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file, which prevents duplicates in the mapping between patient imports and data pipeline requests.
🧬 Code Graph Analysis (2)
packages/utils/src/snowflake/bulk-insert-patients.ts (2)
packages/utils/src/patient/get-ids.ts (1)
getAllPatientIds
(22-29)packages/utils/src/shared/duration.ts (1)
elapsedTimeAsStr
(8-13)
packages/data-transformation/fhir-to-csv/src/utils/database.py (1)
packages/data-transformation/fhir-to-csv/src/utils/file.py (1)
strip_config_file_name
(6-7)
🪛 Ruff (0.12.2)
packages/data-transformation/fhir-to-csv/main.py
31-31: Local variable patient_ids_and_bundle_keys
is assigned to but never used
Remove assignment to unused variable patient_ids_and_bundle_keys
(F841)
51-51: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: check-pr / lint-build-test
- GitHub Check: check-pr / lint-build-test
- GitHub Check: check-pr / lint-build-test
- GitHub Check: Analyze (javascript)
🔇 Additional comments (16)
packages/utils/src/snowflake/bulk-insert-patients.ts (3)
58-58
: Good enhancement for traceability.Adding jobId to the log messages improves observability and debugging capabilities for bulk operations.
Also applies to: 93-93
65-65
: Downstream handling of shared jobId is verifiedAll downstream Lambdas parse and use both jobId and patientId, so they correctly distinguish records when multiple patients share the same jobId. Specifically:
- packages/lambdas/src/analytics-platform/fhir-to-csv.ts
- packages/lambdas/src/patient-monitoring/discharge-requery.ts
- packages/lambdas/src/sqs-to-converter.ts
- packages/lambdas/src/patient-import-query.ts
No further changes required.
50-50
: jobId format verified as compatible with downstream processingAll references to
jobId
across SQS, Lambda consumers, Python transformations, and Snowflake table naming handle the timestamp format correctly:
- The Python fhir-to-csv code replaces only
:
and.
(none remain after our.replace(/[:.]/g, "-")
) and then maps hyphens to underscores for table names.- S3 key paths and HTTP route parameters accept hyphens and the “T” separator without issue.
No changes required.
packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_FamilyMemberHistory_condition.ini (1)
8-8
: LGTM: Consistent identifier mapping added.The addition of
familymemberhistory_id = id
mapping in both sections supports better data identification and joining capabilities, aligning with the patient-level processing refactor.Also applies to: 49-49
packages/data-transformation/fhir-to-csv/src/utils/database.py (2)
3-5
: LGTM: Temporary database naming change.The "ANALYTICS2_" prefix with TODO comment indicates a planned migration strategy. This approach allows for safe testing of the new patient-level processing without affecting existing data.
10-11
: LGTM: Function signature updated for patient-level processing.The addition of
patient_id
parameter and its incorporation into the table naming patternjob_{job_id}_pt_{patient_id}_{table_name}
properly supports the shift to patient-specific data processing and storage segregation.packages/data-transformation/fhir-to-csv/src/parseNdjsonBundle/parseNdjsonBundle.py (2)
27-27
: LGTM: Function signature updated for patient-level processing.The addition of
patient_id
parameter aligns with the broader refactoring to handle single-patient, single-bundle processing per job.
41-41
: LGTM: Patient-specific temporary file naming.Including
patient_id
in temporary file names (temp_{patient_id}_{resource_type.lower()}.ndjson
) provides better isolation and supports the data cleanup capabilities mentioned in previous reviews.packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_MedicationAdministration.ini (2)
17-19
: LGTM: New subject and context mappings added.The addition of
subject_type
,context_reference
, andcontext_type
fields expands the data extraction capabilities for MedicationAdministration resources, with consistent mapping across both configuration sections.Also applies to: 89-91
28-28
: LGTM: Field reordering maintained consistency.The repositioning of
effectiveperiod_start
maintains the logical grouping of related fields without affecting parsing functionality.Also applies to: 100-100
packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_MedicationDispense.ini (2)
7-104
: LGTM: Comprehensive MedicationDispense field mappings added.The extensive expansion of field mappings covers a wide range of FHIR MedicationDispense resource fields including metadata, identifiers, status information, performers, dosage instructions, substitution details, and event history. All mappings are consistently applied across both configuration sections and follow proper naming conventions.
109-206
: LGTM: Consistent root path mappings.The
[root_paths]
section properly mirrors all the structural mappings, ensuring consistent data extraction across the parsing pipeline.packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py (3)
43-61
: Good addition of patient-level traceabilityThe inclusion of
patient_id
in the function signature and job table naming is well-implemented. The added logging will help with debugging and monitoring patient-specific transformations.
63-74
: Improved connection management and parameter consistencyGood refactoring:
- Using context manager for automatic connection cleanup
- Consistent parameter ordering (cx_id, patient_id, job_id)
- Simplified transaction handling by removing explicit autocommit
94-124
: Well-structured S3 to Snowflake copy operationGood improvements:
- Clear logging of S3 URLs and operations
- Proper stage cleanup in the normal flow
- Exception handling allows partial success (as per team's design decision)
packages/data-transformation/fhir-to-csv/main.py (1)
78-120
: Well-implemented single-patient processing logicThe handler properly enforces single-patient processing with:
- Mandatory patient_id validation
- Consistent parameter ordering in Snowflake function calls
- Clear rebuild logic based on input_bundle presence
- Proper job_id sanitization for compatibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
packages/data-transformation/fhir-to-csv/main.py (2)
58-60
: Fix inconsistent return type.The function should return an empty list instead of
None
for consistency with the return type annotation.if entries is None or len(entries) < 1: logging.warning(f"Bundle {bundle_key} has no entries") - return + return []
69-69
: S3 path structure correctly updated with patient ID.The S3 key structure now includes patient_id as intended, supporting the patient-level data organization.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
packages/data-transformation/fhir-to-csv/main.py
(3 hunks)packages/data-transformation/fhir-to-csv/requirements.txt
(0 hunks)packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py
(4 hunks)packages/data-transformation/fhir-to-csv/src/utils/database.py
(1 hunks)
💤 Files with no reviewable changes (1)
- packages/data-transformation/fhir-to-csv/requirements.txt
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/data-transformation/fhir-to-csv/src/utils/database.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.
Learnt from: thomasyopes
PR: metriport/metriport#3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) `ehr-sync-patient-cloud.ts` sends messages to an SQS queue, (2) the `ehr-sync-patient` Lambda consumes these messages, and (3) the Lambda uses the `syncPatient` function to make the API calls to process the patient data.
Learnt from: thomasyopes
PR: metriport/metriport#3882
File: packages/core/src/external/ehr/command/get-bundle-by-resource-type.ts:31-37
Timestamp: 2025-06-05T14:09:59.683Z
Learning: When EHR handlers in mapping objects are set to `undefined` and there are downstream PRs mentioned in the PR objectives, this typically indicates a staged rollout approach where the core structure is implemented first and specific handler implementations follow in subsequent PRs.
Learnt from: thomasyopes
PR: metriport/metriport#3466
File: packages/api/src/routes/ehr/elation/chart.ts:20-37
Timestamp: 2025-03-17T15:30:34.647Z
Learning: The request body parsing in packages/api/src/routes/ehr/elation/chart.ts is tracked with TODO referencing issue #2170 and will be handled in a separate PR, not in PR #3466.
Learnt from: RamilGaripov
PR: metriport/metriport#3676
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-conversion/shared.ts:1-10
Timestamp: 2025-04-16T00:25:25.196Z
Learning: The circular dependency between shared.ts (importing getPatientIdsOrFail) and adt/utils.ts (using unpackPidFieldOrFail) will be addressed in a follow-up PR.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the `/fhir-to-csv` and `/fhir-to-csv/transform` endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the `/fhir-to-csv/batch` endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.
Learnt from: leite08
PR: metriport/metriport#4193
File: packages/carequality-sdk/src/client/carequality-fhir.ts:198-208
Timestamp: 2025-07-16T00:54:56.156Z
Learning: User leite08 accepted the risk of parameter mutation in packages/carequality-sdk/src/client/carequality-fhir.ts when modifying org.active = false directly on input parameters, choosing to proceed despite the violation of the "Avoid modifying objects received as parameter" coding guideline.
Learnt from: leite08
PR: metriport/metriport#4215
File: packages/core/src/command/analytics-platform/config.ts:4-13
Timestamp: 2025-07-20T01:18:28.093Z
Learning: User leite08 prefers that code improvement suggestions (like consolidating duplicated schemas) be made on feature PRs rather than release PRs. Such improvements should be deferred to follow-up PRs during the release cycle.
Learnt from: leite08
PR: metriport/metriport#3648
File: packages/utils/src/shared/patient-create.ts:35-53
Timestamp: 2025-04-10T17:30:30.368Z
Learning: The function `mergePatients` in the patient-create.ts file is intentionally designed to take all properties from the first patient (p1) and only update address and contact fields, as the system supports duplicated patients specifically to allow multiple addresses across different records.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_DocumentReference.ini:1-188
Timestamp: 2025-07-17T19:57:25.199Z
Learning: In packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/ directory, configuration files like config_DocumentReference.ini should be kept as-is with their hard-coded field mappings, as confirmed by user thomasyopes. The team prefers to maintain these static configuration files rather than implementing dynamic mapping generation.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:38.906Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, S3 upload operations should be allowed to hard fail without try-catch blocks. The team prefers immediate failure over error handling and continuation for S3 upload operations in the FHIR-to-CSV transformation process.
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file, which prevents duplicates in the mapping between patient imports and data pipeline requests.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:54-58
Timestamp: 2025-07-18T09:54:01.790Z
Learning: The user thomasyopes prefers to keep the setupSnowflake.py file in packages/data-transformation/fhir-to-csv/src/setupSnowflake/ as-is, declining suggestions for additional error handling around configuration file parsing.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
Learnt from: leite08
PR: metriport/metriport#3912
File: packages/core/src/command/consolidated/search/fhir-resource/search-consolidated.ts:74-83
Timestamp: 2025-06-04T00:14:38.041Z
Learning: User leite08 prefers incremental progress in refactoring scenarios, accepting some inconsistency as a stepping stone towards better architecture rather than requiring perfect consistency immediately.
packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py (11)
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.
Learnt from: RamilGaripov
PR: #3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.
Learnt from: thomasyopes
PR: #4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the /fhir-to-csv
and /fhir-to-csv/transform
endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the /fhir-to-csv/batch
endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.
Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:54-58
Timestamp: 2025-07-18T09:54:01.790Z
Learning: The user thomasyopes prefers to keep the setupSnowflake.py file in packages/data-transformation/fhir-to-csv/src/setupSnowflake/ as-is, declining suggestions for additional error handling around configuration file parsing.
Learnt from: leite08
PR: #3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.
Learnt from: leite08
PR: #3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.248Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file, which prevents duplicates in the mapping between patient imports and data pipeline requests.
Learnt from: keshavsaharia
PR: #3885
File: packages/core/src/external/sftp/surescripts/client.ts:1-134
Timestamp: 2025-05-28T19:23:20.179Z
Learning: In packages/core/src/external/sftp/surescripts/client.ts, the standalone getPatientLoadFileName function intentionally omits the transmission ID from the filename, which differs from the class method getPatientLoadFileName. This difference in filename generation is expected behavior.
Learnt from: thomasyopes
PR: #3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) ehr-sync-patient-cloud.ts
sends messages to an SQS queue, (2) the ehr-sync-patient
Lambda consumes these messages, and (3) the Lambda uses the syncPatient
function to make the API calls to process the patient data.
packages/data-transformation/fhir-to-csv/main.py (24)
Learnt from: thomasyopes
PR: #4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the /fhir-to-csv
and /fhir-to-csv/transform
endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the /fhir-to-csv/batch
endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:38.906Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, S3 upload operations should be allowed to hard fail without try-catch blocks. The team prefers immediate failure over error handling and continuation for S3 upload operations in the FHIR-to-CSV transformation process.
Learnt from: leite08
PR: #3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.
Learnt from: keshavsaharia
PR: #3885
File: packages/core/src/external/sftp/surescripts/client.ts:1-134
Timestamp: 2025-05-28T19:23:20.179Z
Learning: In packages/core/src/external/sftp/surescripts/client.ts, the standalone getPatientLoadFileName function intentionally omits the transmission ID from the filename, which differs from the class method getPatientLoadFileName. This difference in filename generation is expected behavior.
Learnt from: leite08
PR: #3412
File: packages/api/.env.test:12-12
Timestamp: 2025-04-24T01:57:35.097Z
Learning: The PATIENT_IMPORT_BUCKET_NAME environment variable is specific to the API package and doesn't need to be added to other packages' environment configurations.
Learnt from: leite08
PR: #3814
File: packages/core/src/command/consolidated/search/fhir-resource/search-semantic.ts:36-36
Timestamp: 2025-05-19T13:53:09.828Z
Learning: In the getConsolidatedPatientData
function from @metriport/core/command/consolidated/consolidated-get
, only the patient
parameter is required. Other parameters like requestId
, resources
, dateFrom
, dateTo
, fromDashboard
, and forceDataFromFhir
are all optional.
Learnt from: thomasyopes
PR: #3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) ehr-sync-patient-cloud.ts
sends messages to an SQS queue, (2) the ehr-sync-patient
Lambda consumes these messages, and (3) the Lambda uses the syncPatient
function to make the API calls to process the patient data.
Learnt from: lucasdellabella
PR: #3855
File: packages/core/src/command/hl7-notification/hl7-notification-webhook-sender-direct.ts:88-104
Timestamp: 2025-05-16T02:10:01.792Z
Learning: When logging file paths or other strings that might contain patient identifiers like patientId or cxId, these should be masked or redacted (e.g., using regex replacement) to avoid logging Protected Health Information (PHI) to CloudWatch or error tracking systems like Sentry, ensuring HIPAA compliance.
Learnt from: leite08
PR: #3612
File: packages/core/src/command/conversion-result/send-multiple-conversion-results.ts:48-62
Timestamp: 2025-04-05T01:03:51.989Z
Learning: Patient IDs in this codebase are UUIDs and don't contain PII, so they're safe to include in logs and error contexts.
Learnt from: RamilGaripov
PR: #3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.
Learnt from: thomasyopes
PR: #3883
File: packages/api/src/external/ehr/shared/job/bundle/create-resource-diff-bundles/start-jobs-across-ehrs.ts:38-46
Timestamp: 2025-05-28T19:42:17.301Z
Learning: Athena patient IDs necessarily contain a "." character - this is a required format for all valid Athena patient IDs.
Learnt from: thomasyopes
PR: #3973
File: packages/core/src/external/ehr/athenahealth/shared.ts:8-16
Timestamp: 2025-06-06T18:17:23.577Z
Learning: For internal functions in the Metriport codebase where inputs are controlled by the development team, input validation for required parameters like cxId and practiceId is not needed and should be avoided to keep functions simple.
Learnt from: RamilGaripov
PR: #3675
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-converter.ts:58-59
Timestamp: 2025-04-23T08:43:00.803Z
Learning: In the Metriport codebase, cxId is expected to be a UUID format (containing only alphanumeric characters and hyphens), which doesn't require URL encoding when used in query parameters.
Learnt from: leite08
PR: #3869
File: packages/core/src/external/fhir/export/string/resources/medication-request.ts:93-94
Timestamp: 2025-05-21T16:49:07.877Z
Learning: Raw ISO date strings in FHIR resources (like authoredOn in MedicationRequest) should be preserved as-is without additional formatting when converting resources to string representations.
Learnt from: leite08
PR: #4194
File: packages/api/src/command/medical/patient/get-patient-facilities.ts:36-44
Timestamp: 2025-07-16T01:38:14.080Z
Learning: In packages/api/src/command/medical/patient/get-patient-facilities.ts, the patient.facilityIds field is strongly typed as array of strings (non-nullable), so null/undefined checks are not needed when accessing this property.
Learnt from: leite08
PR: #4194
File: packages/api/src/command/medical/patient/get-patient-facilities.ts:36-44
Timestamp: 2025-07-16T01:38:14.080Z
Learning: In packages/api/src/command/medical/patient/get-patient-facilities.ts, the patient.facilityIds field is strongly typed as array of strings (non-nullable), so null/undefined checks are not needed when accessing this property.
Learnt from: RamilGaripov
PR: #3676
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-conversion/shared.ts:1-10
Timestamp: 2025-04-16T00:25:25.196Z
Learning: The circular dependency between shared.ts (importing getPatientIdsOrFail) and adt/utils.ts (using unpackPidFieldOrFail) will be addressed in a follow-up PR.
Learnt from: leite08
PR: #4095
File: packages/commonwell-cert-runner/src/single-commands/patient-create.ts:35-36
Timestamp: 2025-07-16T18:51:08.416Z
Learning: In packages/commonwell-cert-runner and the CommonWell SDK, PatientCollection.Patients is strongly typed as an array (z.array(patientCollectionItemSchema)), so resp.Patients will always exist and be an array. The getMetriportPatientIdOrFail utility function is designed to handle undefined/null input with proper error messages, making additional bounds checking unnecessary.
Learnt from: keshavsaharia
PR: #3885
File: packages/utils/src/surescripts/generate-patient-load-file.ts:92-94
Timestamp: 2025-05-30T01:11:35.300Z
Learning: In V0 of the Surescripts implementation (packages/utils/src/surescripts/generate-patient-load-file.ts), there is always only 1 facility ID, so the current facility_ids parsing logic with substring operations is sufficient and additional validation is not necessary.
Learnt from: lucasdellabella
PR: #3608
File: packages/core/src/external/ehr/resource-diff/steps/start/ehr-start-resource-diff-local.ts:0-0
Timestamp: 2025-04-21T03:47:54.332Z
Learning: In the EHR resource diff implementation, bundles should always have an entry
property, and code should be allowed to throw if entry
is missing rather than using null-coalescing operators, as this represents an unexpected state that should be caught and addressed.
Learnt from: thomasyopes
PR: #4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:54-58
Timestamp: 2025-07-18T09:54:01.790Z
Learning: The user thomasyopes prefers to keep the setupSnowflake.py file in packages/data-transformation/fhir-to-csv/src/setupSnowflake/ as-is, declining suggestions for additional error handling around configuration file parsing.
🧬 Code Graph Analysis (1)
packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py (1)
packages/data-transformation/fhir-to-csv/src/utils/database.py (6)
format_patient_status_table_name
(13-14)format_table_name_from_config_file_name
(7-8)format_job_table_name
(10-11)get_data_type
(20-26)format_database_name
(4-5)format_stage_name
(17-18)
🪛 Ruff (0.12.2)
packages/data-transformation/fhir-to-csv/main.py
52-52: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: check-pr / lint-build-test
- GitHub Check: check-pr / lint-build-test
- GitHub Check: check-pr / lint-build-test
- GitHub Check: Analyze (javascript)
🔇 Additional comments (14)
packages/data-transformation/fhir-to-csv/main.py (7)
8-8
: LGTM! New imports align with patient status tracking.The addition of
setup_database
andset_patient_status
imports correctly supports the new patient status tracking functionality mentioned in the PR objectives.Also applies to: 12-12
25-32
: LGTM! Function signature correctly updated for single-patient processing.The addition of
patient_id
as a required parameter aligns with the PR's shift from cohort-level to patient-level processing.
33-37
: LGTM! Bundle key creation logic simplified appropriately.The simplified logic for creating a single bundle key aligns well with the single-patient processing approach.
89-93
: LGTM! Parameter validation correctly enforces single-patient processing.The mandatory
patient_id
validation andjob_id
sanitization properly support the refactored single-patient processing approach.
117-118
: LGTM! Patient status tracking properly implemented.The addition of patient status tracking with "processing" and "completed" states provides good visibility into the transformation pipeline state and aligns with the PR objectives.
Also applies to: 127-128
120-125
: LGTM! Snowflake function calls correctly updated with patient_id.All Snowflake operations now properly include the
patient_id
parameter, and therebuild_patient
flag logic appropriately handles different processing scenarios.
102-102
: LGTM! Logging improvements enhance traceability.The updated log messages with
patient_id
context provide better debugging and monitoring capabilities for the single-patient processing flow.Also applies to: 113-113, 116-116, 128-128
packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py (7)
10-10
: LGTM! Import supports new patient status tracking.The addition of
format_patient_status_table_name
import correctly supports the new patient status tracking functionality.
44-52
: LGTM! Function correctly updated for patient-centric table naming.The addition of
patient_id
parameter and logging enhances the function's alignment with single-patient processing and provides better traceability.
64-69
: LGTM! Well-implemented database setup function.The new
setup_database
function properly ensures database existence before table operations and uses appropriate connection management with context managers.
70-79
: LGTM! Function correctly updated with patient_id parameter.The
create_job_tables
function properly incorporates thepatient_id
parameter and maintains good logging and connection management practices.
81-97
: LGTM! Function correctly implements patient-centric table operations.The
append_job_tables
function properly incorporatespatient_id
andrebuild_patient
parameters with appropriate logging. The SQL injection risk in the DELETE query was previously assessed and accepted by the team based on controlled inputs.
99-129
: LGTM! Function correctly updated with enhanced logging and patient_id support.The
copy_into_job_table
function properly incorporates thepatient_id
parameter and maintains the team's preferred error handling approach for partial success scenarios in batch processing.
130-140
: LGTM! Well-implemented patient status tracking function.The new
set_patient_status
function properly implements patient status tracking with appropriate table management, upsert logic, and consistent naming conventions. The direct SQL interpolation follows the team's accepted approach for controlled inputs.
026bf0a
to
12eb691
Compare
with open(local_bundle_key, "rb") as f: | ||
bundle = json.load(f) | ||
entries = bundle["entry"] | ||
if entries is None or len(entries) < 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we want to check if entries is an array too before checking length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
damn, there's a lot of diff ways to check for arrays in Python :/
I'll leave as is for now, it seems low impact given we always generate FHIR bundles, where entry
is array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we flag putting these changes back into the fhir inferno repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, added it as an item here
429bd0b
to
e0ebf4f
Compare
Ref eng-722 Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-722 Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-722 Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
- hybrid tables - single DB conn per pt - load config files once before DB conn Ref eng-722 Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-722 Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
0781b15
to
e64efcf
Compare
Ref eng-722 Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-722 Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Dependencies
none
Description
/fhir-to-csv/transform
endpointTesting
bulk-insert-patients.ts
scriptbulk-insert-patients.ts
scriptRelease Plan
Summary by CodeRabbit
New Features
Improvements
Chores
Bug Fixes