Conversation

@leite08 (Member) commented Jul 19, 2025

Issues:

Dependencies

Description

Testing

Check each PR.

Release Plan

  • ⚠️ Points to master
  • Merge upstream
  • Merge this

Summary by CodeRabbit

  • New Features

    • Introduced a comprehensive FHIR-to-CSV data transformation pipeline with extensive configuration files for numerous FHIR resources.
    • Added new internal API endpoints to initiate FHIR-to-CSV processing jobs.
    • Integrated AWS Batch alongside Lambda and SQS for scalable analytics platform workflows.
    • Enabled automated deployments of analytics platform components via reusable GitHub Actions workflows for staging and production.
    • Provided Snowflake integration for loading and managing transformed data with dynamic table creation and data ingestion.
    • Added scripts and utilities for bulk patient data processing and transformation, including concurrency control and error handling.
  • Documentation

    • Added detailed README and configuration documentation for the FHIR-to-CSV transformation package.
  • Refactor

    • Modularized and restructured infrastructure code for improved resource management, including batch job setup and enhanced permissions.
    • Updated API stack to integrate analytics platform assets and permissions.
  • Chores

    • Added .gitignore and requirements files to support development and deployment environments.
    • Added new Dockerfiles and Docker Compose configuration for the data transformation service.

Thomas Yopes added 30 commits July 9, 2025 05:39
Ref: ENG-602

Ref: #1040
Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>

The same message and sign-off repeat across the remaining commits, interspersed with merge commits of metriport/metriport into eng-602-analytics-platform-v0.
thomasyopes previously approved these changes Jul 19, 2025

@coderabbitai (bot) left a comment


Actionable comments posted: 5

♻️ Duplicate comments (1)
.github/workflows/_deploy-analytics-platform.yml (1)

41-82: Workflow structure follows established patterns.

The permissions are handled in the caller workflow as previously discussed. The action versions are maintained for consistency across workflows per team preferences.

🧹 Nitpick comments (2)
packages/data-transformation/fhir-to-csv/Dockerfile (1)

4-5: Redundant COPY operation.

Line 5 copies requirements.txt again after line 4 already copied the entire current directory (which includes requirements.txt).

 WORKDIR /app
 COPY . /app
-COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
packages/data-transformation/fhir-to-csv/Dockerfile.dev (1)

4-5: Redundant COPY operation (same as main Dockerfile).

Same issue as in the main Dockerfile - line 5 redundantly copies requirements.txt after the entire directory was already copied.

 WORKDIR /app
 COPY . /app
-COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt

Comment on lines +46 to +51
try:
    resource = json.loads(line)
    if resource['resource'].get('resourceType') == resource_type:
        outfile.write(json.dumps(resource['resource']) + '\n')
except json.JSONDecodeError:
    continue

@coderabbitai (bot) commented Jul 19, 2025


🛠️ Refactor suggestion

Handle JSON parsing errors appropriately.

The continue statement silently skips malformed JSON lines, which could lead to data loss without proper logging or error tracking.

Consider adding logging for skipped lines:

                except json.JSONDecodeError:
+                    # Log the malformed line for debugging
+                    print(f"Warning: Skipping malformed JSON line: {line[:100]}...")
                    continue
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
    resource = json.loads(line)
    if resource['resource'].get('resourceType') == resource_type:
        outfile.write(json.dumps(resource['resource']) + '\n')
except json.JSONDecodeError:
    continue
try:
    resource = json.loads(line)
    if resource['resource'].get('resourceType') == resource_type:
        outfile.write(json.dumps(resource['resource']) + '\n')
except json.JSONDecodeError:
    # Log the malformed line for debugging
    print(f"Warning: Skipping malformed JSON line: {line[:100]}...")
    continue
🤖 Prompt for AI Agents
In
packages/data-transformation/fhir-to-csv/src/parseNdjsonBundle/parseNdjsonBundle.py
around lines 46 to 51, the current code silently skips malformed JSON lines
using continue, which can cause unnoticed data loss. Modify the except block to
log a warning or error message indicating the line that failed to parse,
including the exception details. This will help track and diagnose issues with
input data without stopping the process.
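
For context, a minimal standalone sketch of this filtering loop with logging in place of print (the filter_resources helper, the logger setup, and the defensive .get("resource", {}) lookup are illustrative assumptions, not code from this PR):

import json
import logging

logger = logging.getLogger(__name__)

def filter_resources(lines, resource_type, outfile):
    """Write matching FHIR resources to outfile, logging malformed NDJSON lines instead of dropping them silently."""
    skipped = 0
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as err:
            skipped += 1
            logger.warning("Skipping malformed JSON line (%s): %.100s", err, line)
            continue
        # Defensive lookup; the original code indexes entry['resource'] directly.
        if entry.get("resource", {}).get("resourceType") == resource_type:
            outfile.write(json.dumps(entry["resource"]) + "\n")
    if skipped:
        logger.warning("Skipped %d malformed line(s) while filtering %s resources", skipped, resource_type)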

@leite08 (Member, Author) replied:

ditto

@coderabbitai (bot) replied:

@leite08 Thanks for agreeing! The suggested logging improvement will help track any malformed JSON lines during the FHIR bundle processing, making it easier to diagnose data issues without stopping the entire process.

🐰✨ (happy rabbit - agreement achieved!)


🧠 Learnings used
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/parseFhir.py:94-175
Timestamp: 2025-07-17T19:58:54.703Z
Learning: The user thomasyopes prefers to keep the parseFhir.py file in packages/data-transformation/fhir-to-csv/src/parseFhir/ as-is, declining code style improvements suggested by static analysis tools like Ruff.

Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_DocumentReference.ini:1-188
Timestamp: 2025-07-17T19:57:25.159Z
Learning: In packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/ directory, configuration files like config_DocumentReference.ini should be kept as-is with their hard-coded field mappings, as confirmed by user thomasyopes. The team prefers to maintain these static configuration files rather than implementing dynamic mapping generation.

Learnt from: thomasyopes
PR: metriport/metriport#4090
File: packages/core/src/command/conversion-fhir/conversion-fhir-cloud.ts:30-30
Timestamp: 2025-06-25T01:55:42.627Z
Learning: In packages/core/src/command/conversion-fhir/conversion-fhir-cloud.ts, the FHIR converter Lambda endpoint is controlled by the team and guaranteed to return valid JSON in the expected Bundle<Resource> format, so JSON.parse() error handling and type validation are not necessary for this specific endpoint.

Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:38.884Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, S3 upload operations should be allowed to hard fail without try-catch blocks. The team prefers immediate failure over error handling and continuation for S3 upload operations in the FHIR-to-CSV transformation process.

Learnt from: leite08
PR: metriport/metriport#3857
File: packages/core/src/command/consolidated/search/fhir-resource/search-lexical.ts:67-82
Timestamp: 2025-05-28T02:51:35.779Z
Learning: In the search-lexical.ts file, the user prefers to bubble up JSON parsing errors rather than catching and logging them. When processing FHIR resources from OpenSearch results, errors should be thrown and allowed to propagate up the call stack instead of being caught and silently ignored.

Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.558Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.

Learnt from: leite08
PR: metriport/metriport#3857
File: packages/core/src/command/consolidated/search/fhir-resource/search-consolidated-lambda.ts:41-43
Timestamp: 2025-05-27T23:51:18.594Z
Learning: In the codebase, the preference is to let JSON parsing errors bubble up naturally rather than wrapping them in try-catch blocks for additional error handling.

Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.427Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.

Learnt from: thomasyopes
PR: metriport/metriport#3608
File: packages/lambdas/src/ehr-compute-resource-diff-bundles.ts:58-62
Timestamp: 2025-04-23T19:00:49.707Z
Learning: For Lambda functions in the metriport codebase, the team prefers to let the default Lambda error handling catch JSON parsing errors rather than adding explicit try/catch blocks.

Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:54-58
Timestamp: 2025-07-18T09:54:01.771Z
Learning: The user thomasyopes prefers to keep the setupSnowflake.py file in packages/data-transformation/fhir-to-csv/src/setupSnowflake/ as-is, declining suggestions for additional error handling around configuration file parsing.

Learnt from: thomasyopes
PR: metriport/metriport#3870
File: packages/core/src/external/ehr/bundle/job/create-resource-diff-bundles/steps/compute/ehr-compute-resource-diff-bundles-local.ts:55-91
Timestamp: 2025-05-19T18:24:41.632Z
Learning: In EHR bundles related to resource diff computations, bundle creation operations that are for auditing purposes should not stop the main job flow if they fail. These operations typically use try/catch blocks that log errors without rethrowing them, which is the intended behavior.

Learnt from: lucasdellabella
PR: metriport/metriport#3866
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-roster-generator.ts:63-63
Timestamp: 2025-05-27T16:10:53.952Z
Learning: User lucasdellabella is okay with using JSON.stringify in log statements in TypeScript files, even though the coding guidelines suggest avoiding multi-line logs with JSON.stringify. They prefer to keep such logging as is when it serves the purpose effectively.

Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.

Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.

Learnt from: leite08
PR: metriport/metriport#3912
File: packages/core/src/command/consolidated/search/fhir-resource/search-consolidated.ts:74-83
Timestamp: 2025-06-04T00:14:38.041Z
Learning: User leite08 prefers incremental progress in refactoring scenarios, accepting some inconsistency as a stepping stone towards better architecture rather than requiring perfect consistency immediately.

Learnt from: leite08
PR: metriport/metriport#0
File: :0-0
Timestamp: 2025-03-05T18:43:30.389Z
Learning: For backmerge PRs at metriport/metriport, only verify two things: (1) that the source branch is `master` and destination branch is `develop`, and (2) that there's a link to at least one PR (usually a "patch" PR) in the description. No need for detailed review comments or updates to the PR description unless there's an issue with these criteria.

Learnt from: leite08
PR: metriport/metriport#3611
File: packages/utils/src/fhir-converter/s3-to-lambda.ts:29-47
Timestamp: 2025-04-05T16:02:02.517Z
Learning: For release PRs (where head branch is `develop` and base branch is `master`), don't suggest code changes as these have already been reviewed in previous PRs. Instead, focus only on checking if the PR description follows the template and practices, and if the PR is pointing to the correct base branch (`master`).

Learnt from: lucasdellabella
PR: metriport/metriport#3907
File: packages/terminology/src/seed/seed-ndc-from-fda.ts:91-113
Timestamp: 2025-05-28T05:28:26.896Z
Learning: On PRs with "release" in the title, only provide very high priority feedback and avoid lower priority suggestions like refactoring or optimization improvements.

Learnt from: leite08
PR: metriport/metriport#3749
File: packages/core/src/command/patient-import/record/create-headers-file.ts:16-27
Timestamp: 2025-04-28T22:57:36.763Z
Learning: Focus improvement suggestions on feature branches rather than release PRs. Release PRs should be treated with higher caution as they're focused on getting tested code into production.

Learnt from: lucasdellabella
PR: metriport/metriport#0
File: :0-0
Timestamp: 2025-03-14T13:44:44.651Z
Learning: Avoid adding AI-generated summaries to PRs with "RELEASE" in the title.

import os
import configparser
import logging
import json

⚠️ Potential issue

Remove unused import.

The json module is imported but never used.

-import json
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import json
🧰 Tools
🪛 Ruff (0.12.2)

5-5: json imported but unused

Remove unused import: json

(F401)

🤖 Prompt for AI Agents
In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py
at line 5, the json module is imported but not used anywhere in the file. Remove
the line importing json to clean up unused imports.

Comment on lines +240 to +244
try:
    config.read(configPath)
except:
    logging.exception(f"Failed to read or parse the configuration file: {configPath}. Error: {e}")
    return

⚠️ Potential issue

Fix undefined variable in exception handler.

The variable e is not defined in the except block, which will cause a NameError at runtime.

 try:
     config.read(configPath)
-except:
-    logging.exception(f"Failed to read or parse the configuration file: {configPath}. Error: {e}")
+except Exception as e:
+    logging.exception(f"Failed to read or parse the configuration file: {configPath}. Error: {e}")
     return
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
    config.read(configPath)
except:
    logging.exception(f"Failed to read or parse the configuration file: {configPath}. Error: {e}")
    return
try:
    config.read(configPath)
except Exception as e:
    logging.exception(f"Failed to read or parse the configuration file: {configPath}. Error: {e}")
    return
🧰 Tools
🪛 Ruff (0.12.2)

242-242: Do not use bare except

(E722)


243-243: Undefined name e

(F821)

🤖 Prompt for AI Agents
In packages/data-transformation/fhir-to-csv/src/parseFhir/parseFhir.py around
lines 240 to 244, the except block references an undefined variable 'e' in the
logging.exception call. Modify the except clause to capture the exception as 'e'
by changing it to 'except Exception as e:', so that the error variable is
defined and can be logged properly.
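
As a complement to the fix above, a narrower sketch that catches configparser.Error specifically, so unrelated failures still surface (the read_parse_config wrapper is hypothetical; only the exception handling mirrors the suggestion):

import configparser
import logging

def read_parse_config(config_path):
    """Read an INI configuration file, logging and returning None on parse failures."""
    config = configparser.ConfigParser()
    try:
        config.read(config_path)
    except configparser.Error as e:
        # configparser.Error covers malformed sections, duplicate options, etc.
        logging.exception(f"Failed to read or parse the configuration file: {config_path}. Error: {e}")
        return None
    return config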

Ref eng-602

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
leite08 added 2 commits July 19, 2025 22:22
Ref eng-602

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
ENG-602 FhirToCsv lambda calls the transform one directly