-
Notifications
You must be signed in to change notification settings - Fork 72
RELEASE ENG-602 MVP Snowflake #4215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
…triport/metriport into eng-602-analytics-platform-v0 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
…om:metriport/metriport into eng-602-analytics-platform-v0 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
…om:metriport/metriport into eng-602-analytics-platform-v0 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
…-analytics-platform-v0 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
…-analytics-platform-v0 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
Ref: ENG-602 Ref: #1040 Signed-off-by: Thomas Yopes <thomasyopes@Mac.attlocal.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
♻️ Duplicate comments (1)
.github/workflows/_deploy-analytics-platform.yml (1)
41-82
: Workflow structure follows established patterns.The permissions are handled in the caller workflow as previously discussed. The action versions are maintained for consistency across workflows per team preferences.
🧹 Nitpick comments (2)
packages/data-transformation/fhir-to-csv/Dockerfile (1)
4-5
: Redundant COPY operation.Line 5 copies
requirements.txt
again after line 4 already copied the entire current directory (which includesrequirements.txt
).WORKDIR /app COPY . /app -COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt
packages/data-transformation/fhir-to-csv/Dockerfile.dev (1)
4-5
: Redundant COPY operation (same as main Dockerfile).Same issue as in the main Dockerfile - line 5 redundantly copies
requirements.txt
after the entire directory was already copied.WORKDIR /app COPY . /app -COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt
try: | ||
resource = json.loads(line) | ||
if resource['resource'].get('resourceType') == resource_type: | ||
outfile.write(json.dumps(resource['resource']) + '\n') | ||
except json.JSONDecodeError: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Handle JSON parsing errors appropriately.
The continue
statement silently skips malformed JSON lines, which could lead to data loss without proper logging or error tracking.
Consider adding logging for skipped lines:
except json.JSONDecodeError:
+ # Log the malformed line for debugging
+ print(f"Warning: Skipping malformed JSON line: {line[:100]}...")
continue
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
try: | |
resource = json.loads(line) | |
if resource['resource'].get('resourceType') == resource_type: | |
outfile.write(json.dumps(resource['resource']) + '\n') | |
except json.JSONDecodeError: | |
continue | |
try: | |
resource = json.loads(line) | |
if resource['resource'].get('resourceType') == resource_type: | |
outfile.write(json.dumps(resource['resource']) + '\n') | |
except json.JSONDecodeError: | |
# Log the malformed line for debugging | |
print(f"Warning: Skipping malformed JSON line: {line[:100]}...") | |
continue |
🤖 Prompt for AI Agents
In
packages/data-transformation/fhir-to-csv/src/parseNdjsonBundle/parseNdjsonBundle.py
around lines 46 to 51, the current code silently skips malformed JSON lines
using continue, which can cause unnoticed data loss. Modify the except block to
log a warning or error message indicating the line that failed to parse,
including the exception details. This will help track and diagnose issues with
input data without stopping the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leite08 Thanks for agreeing! The suggested logging improvement will help track any malformed JSON lines during the FHIR bundle processing, making it easier to diagnose data issues without stopping the entire process.
🐰✨ (happy rabbit - agreement achieved!)
🧠 Learnings used
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/parseFhir.py:94-175
Timestamp: 2025-07-17T19:58:54.703Z
Learning: The user thomasyopes prefers to keep the parseFhir.py file in packages/data-transformation/fhir-to-csv/src/parseFhir/ as-is, declining code style improvements suggested by static analysis tools like Ruff.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/config_DocumentReference.ini:1-188
Timestamp: 2025-07-17T19:57:25.159Z
Learning: In packages/data-transformation/fhir-to-csv/src/parseFhir/configurations/ directory, configuration files like config_DocumentReference.ini should be kept as-is with their hard-coded field mappings, as confirmed by user thomasyopes. The team prefers to maintain these static configuration files rather than implementing dynamic mapping generation.
Learnt from: thomasyopes
PR: metriport/metriport#4090
File: packages/core/src/command/conversion-fhir/conversion-fhir-cloud.ts:30-30
Timestamp: 2025-06-25T01:55:42.627Z
Learning: In packages/core/src/command/conversion-fhir/conversion-fhir-cloud.ts, the FHIR converter Lambda endpoint is controlled by the team and guaranteed to return valid JSON in the expected Bundle<Resource> format, so JSON.parse() error handling and type validation are not necessary for this specific endpoint.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:38.884Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, S3 upload operations should be allowed to hard fail without try-catch blocks. The team prefers immediate failure over error handling and continuation for S3 upload operations in the FHIR-to-CSV transformation process.
Learnt from: leite08
PR: metriport/metriport#3857
File: packages/core/src/command/consolidated/search/fhir-resource/search-lexical.ts:67-82
Timestamp: 2025-05-28T02:51:35.779Z
Learning: In the search-lexical.ts file, the user prefers to bubble up JSON parsing errors rather than catching and logging them. When processing FHIR resources from OpenSearch results, errors should be thrown and allowed to propagate up the call stack instead of being caught and silently ignored.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.558Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.
Learnt from: leite08
PR: metriport/metriport#3857
File: packages/core/src/command/consolidated/search/fhir-resource/search-consolidated-lambda.ts:41-43
Timestamp: 2025-05-27T23:51:18.594Z
Learning: In the codebase, the preference is to let JSON parsing errors bubble up naturally rather than wrapping them in try-catch blocks for additional error handling.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.427Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.
Learnt from: thomasyopes
PR: metriport/metriport#3608
File: packages/lambdas/src/ehr-compute-resource-diff-bundles.ts:58-62
Timestamp: 2025-04-23T19:00:49.707Z
Learning: For Lambda functions in the metriport codebase, the team prefers to let the default Lambda error handling catch JSON parsing errors rather than adding explicit try/catch blocks.
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:54-58
Timestamp: 2025-07-18T09:54:01.771Z
Learning: The user thomasyopes prefers to keep the setupSnowflake.py file in packages/data-transformation/fhir-to-csv/src/setupSnowflake/ as-is, declining suggestions for additional error handling around configuration file parsing.
Learnt from: thomasyopes
PR: metriport/metriport#3870
File: packages/core/src/external/ehr/bundle/job/create-resource-diff-bundles/steps/compute/ehr-compute-resource-diff-bundles-local.ts:55-91
Timestamp: 2025-05-19T18:24:41.632Z
Learning: In EHR bundles related to resource diff computations, bundle creation operations that are for auditing purposes should not stop the main job flow if they fail. These operations typically use try/catch blocks that log errors without rethrowing them, which is the intended behavior.
Learnt from: lucasdellabella
PR: metriport/metriport#3866
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-roster-generator.ts:63-63
Timestamp: 2025-05-27T16:10:53.952Z
Learning: User lucasdellabella is okay with using JSON.stringify in log statements in TypeScript files, even though the coding guidelines suggest avoiding multi-line logs with JSON.stringify. They prefer to keep such logging as is when it serves the purpose effectively.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
Learnt from: leite08
PR: metriport/metriport#3912
File: packages/core/src/command/consolidated/search/fhir-resource/search-consolidated.ts:74-83
Timestamp: 2025-06-04T00:14:38.041Z
Learning: User leite08 prefers incremental progress in refactoring scenarios, accepting some inconsistency as a stepping stone towards better architecture rather than requiring perfect consistency immediately.
Learnt from: leite08
PR: metriport/metriport#0
File: :0-0
Timestamp: 2025-03-05T18:43:30.389Z
Learning: For backmerge PRs at metriport/metriport, only verify two things: (1) that the source branch is `master` and destination branch is `develop`, and (2) that there's a link to at least one PR (usually a "patch" PR) in the description. No need for detailed review comments or updates to the PR description unless there's an issue with these criteria.
Learnt from: leite08
PR: metriport/metriport#3611
File: packages/utils/src/fhir-converter/s3-to-lambda.ts:29-47
Timestamp: 2025-04-05T16:02:02.517Z
Learning: For release PRs (where head branch is `develop` and base branch is `master`), don't suggest code changes as these have already been reviewed in previous PRs. Instead, focus only on checking if the PR description follows the template and practices, and if the PR is pointing to the correct base branch (`master`).
Learnt from: lucasdellabella
PR: metriport/metriport#3907
File: packages/terminology/src/seed/seed-ndc-from-fda.ts:91-113
Timestamp: 2025-05-28T05:28:26.896Z
Learning: On PRs with "release" in the title, only provide very high priority feedback and avoid lower priority suggestions like refactoring or optimization improvements.
Learnt from: leite08
PR: metriport/metriport#3749
File: packages/core/src/command/patient-import/record/create-headers-file.ts:16-27
Timestamp: 2025-04-28T22:57:36.763Z
Learning: Focus improvement suggestions on feature branches rather than release PRs. Release PRs should be treated with higher caution as they're focused on getting tested code into production.
Learnt from: lucasdellabella
PR: metriport/metriport#0
File: :0-0
Timestamp: 2025-03-14T13:44:44.651Z
Learning: Avoid adding AI-generated summaries to PRs with "RELEASE" in the title.
packages/core/src/command/analytics-platform/fhir-to-csv/batch/fhir-to-csv.ts
Show resolved
Hide resolved
import os | ||
import configparser | ||
import logging | ||
import json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused import.
The json
module is imported but never used.
-import json
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
import json |
🧰 Tools
🪛 Ruff (0.12.2)
5-5: json
imported but unused
Remove unused import: json
(F401)
🤖 Prompt for AI Agents
In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py
at line 5, the json module is imported but not used anywhere in the file. Remove
the line importing json to clean up unused imports.
try: | ||
config.read(configPath) | ||
except: | ||
logging.exception(f"Failed to read or parse the configuration file: {configPath}. Error: {e}") | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix undefined variable in exception handler.
The variable e
is not defined in the except block, which will cause a NameError at runtime.
try:
config.read(configPath)
-except:
- logging.exception(f"Failed to read or parse the configuration file: {configPath}. Error: {e}")
+except Exception as e:
+ logging.exception(f"Failed to read or parse the configuration file: {configPath}. Error: {e}")
return
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
try: | |
config.read(configPath) | |
except: | |
logging.exception(f"Failed to read or parse the configuration file: {configPath}. Error: {e}") | |
return | |
try: | |
config.read(configPath) | |
except Exception as e: | |
logging.exception(f"Failed to read or parse the configuration file: {configPath}. Error: {e}") | |
return |
🧰 Tools
🪛 Ruff (0.12.2)
242-242: Do not use bare except
(E722)
243-243: Undefined name e
(F821)
🤖 Prompt for AI Agents
In packages/data-transformation/fhir-to-csv/src/parseFhir/parseFhir.py around
lines 240 to 244, the except block references an undefined variable 'e' in the
logging.exception call. Modify the except clause to capture the exception as 'e'
by changing it to 'except Exception as e:', so that the error variable is
defined and can be logged properly.
Ref eng-602 Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-602 Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
ENG-602 FhirToCsv lambda calls the transform one directly
Issues:
Dependencies
Description
Testing
Check each PR.
Release Plan
master
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Documentation
Refactor
Chores
.gitignore
and requirements files to support development and deployment environments.