Conversation

@leite08 leite08 commented Aug 11, 2025

Dependencies

Description

  • Creates scripts for each step of the bulk ingestion
    • 1-fhir-to-csv.ts
      • For each patient, posts a message on the FhirToCsv queue; the respective lambda converts the pt's consolidated bundle/JSON into CSV files (one per table, mapping closely to FHIR resource types)
    • 2-merge-csvs.ts
      • Takes a group of patients from a FhirToCsv job (or just that job ID, getting all pts in it) and puts N messages on the MergeCsv queue (max 1,200 pts per message)
      • The respective lambda will group/merge those patients' resource/table CSVs into a few per-resource files (max 800MB uncompressed)
    • 3-ingest-from-merged-csvs.ts
      • For now, this still all runs locally; soon it will be moved to a lambda as well.
      • Connects to Snowflake and, for each table, creates the table and ingests its data from S3.
  • Creates the Merge CSV queue and lambda
  • Removes AWS Batch infra
  • Adds util scripts used to debug bundles and pipeline ingestion
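The per-message chunking described in step 2 (max 1,200 pts per MergeCsv message) can be sketched as below. This is a minimal illustration, not the actual script; `chunkPatientIds` and the constant name are assumptions.

```typescript
// Hypothetical sketch of the batching done by 2-merge-csvs.ts: split the list of
// patient IDs into batches of at most 1,200 per SQS message.
const maxPatientsPerMessage = 1_200;

function chunkPatientIds(
  patientIds: string[],
  chunkSize: number = maxPatientsPerMessage
): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < patientIds.length; i += chunkSize) {
    chunks.push(patientIds.slice(i, i + chunkSize));
  }
  return chunks;
}
```

Each resulting chunk would become one message on the MergeCsv queue.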

Testing

  • Local
    • Run pipeline using scripts and local version for 22K pts
    • Run util scripts to debug that ^
    • Branch to staging to test the ingestion in the cloud
  • Staging
    • Converts pts' JSON into CSV
    • Merges pts' CSVs
    • Ingests pts' merged CSVs into Snowflake
  • Sandbox
    • none
  • Production
    • Run bulk ingestion of 22K pts

Release Plan

  • Merge this

Summary by CodeRabbit

  • New Features

    • Serverless CSV merge: SQS-triggered Lambda to group/merge per‑patient CSVs into per‑table gzipped outputs; new CLI lambdas/scripts to enqueue FHIR→CSV work and dispatch merge jobs; Snowflake ingest helper.
  • Refactor

    • Replaced Batch/ECS merge flow with Lambda+SQS; removed runtime Snowflake wiring; standardized S3 key prefixes; added AWS Lambda v3 support and expanded Lambda memory options; improved S3 listing.
  • Tests

    • Added unit tests validating CSV grouping and sizing behavior.
  • Documentation

    • Added deprecation notice for legacy lambda logic.
  • Chores

    • Updated dependencies (incl. AWS SDK v3, snowflake SDK) and added utilities.
  • Bug Fixes

    • Removed extraneous log when skipping non‑ingestible items.

linear bot commented Aug 11, 2025

coderabbitai bot commented Aug 11, 2025

Walkthrough

Replaces a Batch/ECS Snowflake-backed merge flow with an SQS-triggered Lambda merge pipeline; adds grouping/merge logic, S3/file-key helpers, CLI utilities and tests; removes Batch starters and Snowflake orchestration and runtime secret injection; upgrades Lambda helpers to AWS SDK v3 and adds S3 listing enhancements.

Changes

Cohort / File(s) Summary
Dependency updates
packages/core/package.json, packages/lambdas/package.json, packages/utils/package.json
Add @aws-sdk/client-lambda, @smithy/node-http-handler, ini, snowflake-sdk; add @types/ini, @types/snowflake-sdk; remove @langchain/core from utils.
Merge CSV core & grouping
packages/core/src/command/analytics-platform/merge-csvs/index.ts, .../file-name.ts, .../__tests__/index.test.ts
New public API: list per-patient S3 files, group by table/type/size, merge groups into per-table .csv.gz with retries/parallelism; persist metadata; includes grouping tests.
New Lambda: MergeCsvs (SQS)
packages/lambdas/src/analytics-platform/merge-csvs.ts
New SQS-triggered Lambda that validates payload and calls groupAndMergeCSVs using analytics bucket as source/destination; exports input schema and param type.
Remove Batch starters & Snowflake runtime injection
packages/core/.../batch/fhir-to-csv.ts, .../fhir-to-csv-transform.ts, packages/lambdas/src/analytics-platform/fhir-to-csv.ts, packages/api/src/routes/internal/analytics-platform/index.ts, packages/core/src/util/config.ts
Delete Batch job starter, remove Batch ARN getters and runtime Snowflake secret fetching/injection; API route now invokes handler directly.
Remove Snowflake orchestration & helpers
packages/data-transformation/fhir-to-csv/main.py, .../setupSnowflake.py (removed), .../utils/database.py (removed)
Remove Snowflake setup/ETL orchestration and supporting helpers from FHIR→CSV pipeline.
Infra: Batch → Lambda+SQS & types
packages/infra/lib/analytics-platform/analytics-platform-stack.ts, .../types.ts, .../shared/settings.ts
Replace Batch/ECR/Secrets constructs with MergeCsvs Queue+Lambda; update assets/types (remove Batch assets, add mergeCsvsLambda/mergeCsvsQueue); extend Lambda memory union and tie nested memory type to LambdaSettings["memory"].
API stack wiring cleanup
packages/infra/lib/api-stack.ts, packages/infra/lib/api-stack/api-service.ts
Remove cross-stack grants, env vars, and invoke/submit grants related to Batch/transform; retain queue-based FHIR-to-CSV wiring.
AWS Lambda v3 helpers
packages/core/src/external/aws/lambda.ts, packages/core/src/external/aws/lambda-logic/README.md
Add makeLambdaClientV3, v3 result/error/payload helpers (getLambdaResultPayloadV3, isLambdaErrorV3, etc.), payload conversion utilities; deprecate v2 client; add deprecation README.
S3 utility enhancements
packages/core/src/external/aws/s3.ts
Add listFirstLevelSubdirectories (uses Delimiter/CommonPrefixes) and extend listObjects to accept retry options.
Snowflake SDK wrappers
packages/core/src/external/snowflake/commands.ts
Add Promise-wrappers: promisifyConnect, promisifyDestroy, promisifyExecute for Snowflake callback APIs.
CLI / analytics scripts
packages/utils/src/analytics-platform/1-fhir-to-csv.ts, 2-merge-csvs.ts, .../snowflake/3-ingest-from-merged-csvs.ts, 3.readme.md
Add per-patient FIFO SQS sender, chunked merge dispatcher, Snowflake ingest-from-merged-csvs script, and step-3 README pointer.
Analysis & tooling scripts
packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts, .../check-merge-groups.ts, packages/utils/src/consolidated/*, packages/utils/src/fhir/bundle/*
Add duplicate-patient-id checker, duplicate-merge-group-key checker, consolidated-download helper, and several FHIR bundle shell utilities.
File-key helpers & small utils
packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts, .../merge-csvs/file-name.ts, packages/data-transformation/fhir-to-csv/src/utils/file.py
Add helpers to build/parse FHIR-to-CSV and merged CSV S3 keys; update Python output prefix format to cx=…/f2c=…/pt=…; remove unused helpers.
Lambda invocation example → v3
packages/utils/src/lambda/execute-lambda.ts
Example utility migrated to AWS SDK v3 (InvokeCommand) and uses getLambdaResultPayloadV3.
Minor behavior/type tweaks
packages/core/src/command/consolidated/search/document-reference/ingest.ts, packages/core/src/util/config.ts, packages/infra/lib/api-stack.ts, packages/infra/lib/shared/settings.ts
Remove a log in a non-ingestible branch; remove two Config getters for Batch ARNs; infra doc/comment adjustments; extend Lambda memory type union.
New helpers & tests
packages/shared/src/common/numbers.ts, packages/utils/src/shared/ids.ts
Add numeric formatting helpers (abbreviateNumber, numberToString) and getIdsFromFile.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant SQS as SQS (MergeCsvs Queue)
  participant Lambda as MergeCsvs Lambda
  participant Core as core.groupAndMergeCSVs
  participant S3 as S3

  SQS->>Lambda: SQSEvent (cxId, fhirToCsvJobId, mergeCsvJobId, patientIds, targetGroupSizeMB)
  Lambda->>Core: groupAndMergeCSVs(params { bucket, region, ... })
  Core->>S3: List per-patient objects (concurrent, capped)
  S3-->>Core: paginated object listings
  Core->>Core: Group files by tableName & target size
  loop Merge groups (parallel, limited concurrency)
    Core->>S3: GetObject streams for each file
    Core->>Core: stream -> gzip merge (in-memory buffer)
    Core->>S3: PutObject merged .csv.gz
  end
  Core-->>Lambda: MergeResult[] / error
  Lambda-->>SQS: message deleted on success / retried on failure
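The "Group files by tableName & target size" step in the diagram can be illustrated with a simple greedy pass: partition per-patient files by table, then cut each table's list into groups that stay under the target size. This is a hedged sketch under assumed shapes (`FileInfo`, `groupByTableAndSize`); the real logic lives in packages/core/.../merge-csvs/index.ts and may differ.

```typescript
// Illustrative greedy grouping: files of the same table are accumulated into a
// group until adding the next file would exceed the target size, then a new
// group is started.
interface FileInfo {
  key: string;
  tableName: string;
  size: number; // bytes, uncompressed
}

function groupByTableAndSize(files: FileInfo[], targetGroupSizeBytes: number): FileInfo[][] {
  // Partition by table first, since merged outputs are per-table.
  const byTable = new Map<string, FileInfo[]>();
  for (const f of files) {
    const list = byTable.get(f.tableName) ?? [];
    list.push(f);
    byTable.set(f.tableName, list);
  }
  const groups: FileInfo[][] = [];
  for (const tableFiles of byTable.values()) {
    let current: FileInfo[] = [];
    let currentSize = 0;
    for (const f of tableFiles) {
      if (currentSize + f.size > targetGroupSizeBytes && current.length > 0) {
        groups.push(current);
        current = [];
        currentSize = 0;
      }
      current.push(f);
      currentSize += f.size;
    }
    if (current.length > 0) groups.push(current);
  }
  return groups;
}
```

Each group would then be streamed, merged, and gzipped into one per-table `.csv.gz` output.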

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes



@leite08 leite08 changed the title ENG-xxx ...WIP ENG-743 Analytics bulk ingest Aug 15, 2025
name: "MergeCsvs",
entry: "analytics-platform/merge-csvs",
lambda: {
memory: 4096,
Member Author

@leite08 leite08 Aug 15, 2025


Ran under 35% for the 1,200 pts, max 800 MB uncompressed files, 5 files being merged in parallel - link

Comment on lines +10 to +22
// Too much can result in continuous rate limit errors from S3 (we try up to 5 times)
const numberOfParallelListObjectsFromS3 = 20;
// Too much can result in an OOM error at the lambda
const numberOfParallelMergeFileGroups = 5;
Member Author


Should prob move this to env vars via CDK so we can adjust in runtime, right?

Contributor


Yea

@leite08 leite08 marked this pull request as ready for review August 15, 2025 03:37

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 16

🔭 Outside diff range comments (1)
packages/core/package.json (1)

128-129: aws-sdk v2 usage detected in core — retain dependency and track migration

We still import from aws-sdk v2 across multiple modules, so you should keep the v2 dependency for now. Please add TODOs or file a tracking issue to migrate each of these to the v3 client:

• packages/core/src/command/consolidated/get-snapshot-lambda.ts
• packages/core/src/external/aws/s3.ts
• packages/core/src/external/aws/dynamodb.ts
• packages/core/src/external/aws/lambda.ts
• packages/core/src/external/commonwell/document/document-downloader-local.ts
• packages/core/src/external/carequality/ihe-gateway-v2/saml/saml-client.ts

…(and other aws-sdk imports identified by your grep)

Once migration is complete, remove the aws-sdk v2 entry from package.json to slim installs.

♻️ Duplicate comments (1)
packages/core/src/command/analytics-platform/merge-csvs/index.ts (1)

10-13: Consider making concurrency limits configurable via environment variables

The hard-coded concurrency limits numberOfParallelListObjectsFromS3 and numberOfParallelMergeFileGroups should be configurable via environment variables to allow runtime tuning without code changes, especially given the performance and memory implications mentioned in the comments.

Based on the past review comment by leite08, this has been noted as something that should probably be moved to environment variables via CDK.
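One way to do this, sketched here with assumed variable names (the actual env-var keys and wiring would come from CDK, not this snippet):

```typescript
// Read a numeric tuning knob from the environment, falling back to a default
// when the variable is unset or not a valid integer.
function getNumericEnv(name: string, defaultValue: number): number {
  const raw = process.env[name];
  if (raw == undefined) return defaultValue;
  const parsed = Number.parseInt(raw, 10);
  return Number.isNaN(parsed) ? defaultValue : parsed;
}

// Hypothetical env-var names; defaults match the current hard-coded values.
const numberOfParallelListObjectsFromS3 = getNumericEnv("MERGE_CSVS_PARALLEL_S3_LISTS", 20);
const numberOfParallelMergeFileGroups = getNumericEnv("MERGE_CSVS_PARALLEL_MERGES", 5);
```

CDK would then set these variables on the Lambda so the limits can be tuned without a code change.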

🧹 Nitpick comments (30)
packages/utils/src/fhir/bundle/count_resources.sh (3)

78-83: Ensure .entry is an array before processing

The current check passes when .entry is any non-null value. Tighten it to ensure it’s an array to avoid jq errors and unnecessary stderr handling downstream.

Apply this diff:

-    # Check if file is valid JSON and has .entry array
-    if ! jq -e '.entry' "$file" >/dev/null 2>&1; then
+    # Check if file is valid JSON and has .entry array
+    if ! jq -e 'has("entry") and (.entry | type == "array")' "$file" >/dev/null 2>&1; then
         echo "  No .entry array found in this file (or invalid JSON)"
         continue
     fi

109-110: Nit: remove trailing whitespace

Minor formatting nit: there is a trailing space after "Done.".

Apply this diff:

-echo "Done." 
+echo "Done."

47-51: Optional: compute all counts in a single jq pass for performance on large files

Current approach invokes jq once per resource type. For very large bundles, a single-pass jq can be significantly faster.

If you want, I can provide a drop-in jq that emits the same per-type counts and total in one call per file.

Also applies to: 92-106

packages/utils/src/fhir/bundle/find_with_custom_filter.sh (1)

112-118: Portability and space-safe iteration; avoid find -maxdepth on macOS

  • find ... -maxdepth is not supported by BSD/macOS find.
  • Iterating over a whitespace-delimited variable breaks on filenames with spaces.

Use globbing for depth=1 and fall back to find for deeper depths; iterate over an array.

Please validate behavior for MAXDEPTH=1 on macOS and for deeper depths on Linux.

Apply this diff:

-# Find all JSON files in the directory and subdirectories up to maxdepth
-json_files=$(find "$DIRECTORY" -maxdepth "$MAXDEPTH" -name "*.json" -type f | sort)
-
-if [ -z "$json_files" ]; then
-    echo "No JSON files found in directory: $DIRECTORY"
-    exit 0
-fi
+# Build file list (portable and space-safe)
+shopt -s nullglob
+files=()
+if [ "$MAXDEPTH" = "1" ]; then
+    files=("$DIRECTORY"/*.json)
+else
+    # Note: -maxdepth may not be available on BSD/macOS find; this is best-effort.
+    # On such systems, consider setting --maxdepth 1 or installing GNU find (gfind).
+    while IFS= read -r -d '' f; do files+=("$f"); done < <(find "$DIRECTORY" -name "*.json" -type f -print0 2>/dev/null)
+    IFS=$'\n' files=($(sort <<<"${files[*]}")); unset IFS
+fi
+
+if [ ${#files[@]} -eq 0 ]; then
+    echo "No JSON files found in directory: $DIRECTORY"
+    exit 0
+fi
@@
-# Process each JSON file
-for file in $json_files; do
+# Process each JSON file
+for file in "${files[@]}"; do
     # Get relative path from the search directory for better identification
     if [ "$DIRECTORY" = "." ]; then
         filename="$file"
     else
         filename="${file#$DIRECTORY/}"
     fi
@@
-    # Check if file is valid JSON and has .entry array
-    if ! jq -e '.entry' "$file" >/dev/null 2>&1; then
+    # Check if file is valid JSON and has .entry array
+    if ! jq -e 'has("entry") and (.entry | type == "array")' "$file" >/dev/null 2>&1; then
         continue
     fi

Also applies to: 129-147

packages/utils/src/consolidated/bulk-recreate-consolidated.ts (2)

25-28: Comment suggests ≥1s minimum delay, but code allows 500ms — align wording or constants.

Docs say we “likely don't want to get the delay/sleep under 1 second,” while minimumDelayTime is 500ms. Either bump the minimum delay to 1s, or soften/clarify the wording here to reflect the current minimum.

Here’s a wording tweak that keeps flexibility and guides readers to the constants:

- * IMPORTANT: while the endpoint doesn't seem to do much work with shared resources (mostly S3 and
- * lambda), when consolidated get re-created we re-ingest the patient's data into OpenSearch (OS).
- * Be mindful about that, so we likely don't want to get the delay/sleep under 1 second.
- *
+ * IMPORTANT: while the endpoint doesn't do much work with shared resources (mostly S3 and Lambda),
+ * when consolidated are re-created we re-ingest the patient's data into OpenSearch (OS).
+ * Be mindful: we generally shouldn't go below ~1 second between requests. If you need to tweak the
+ * rate, review the minimum/default delay constants below.
+ *

58-60: Consider raising the minimum delay to 1s to match the caution around OpenSearch load.

Given the note about OS re-ingestion and the default of 2s, a 1s minimum would better enforce the operational safety margin you described. If you prefer to keep flexibility, the comment in Lines 25–28 should be updated instead (see prior comment).

-const minimumDelayTime = dayjs.duration(500, "milliseconds");
+const minimumDelayTime = dayjs.duration(1, "second");
 const defaultDelayTime = dayjs.duration(2, "seconds");
packages/lambdas/package.json (2)

21-23: Align Lambda Node.js engine to supported runtimes (>=18, ideally 20).

Lambda no longer supports Node.js 16. Given the addition of AWS SDK v3 and snowflake-sdk, bump the engines requirement to >=18 (preferably >=20) to match deployed runtimes and avoid accidental local usage of unsupported versions.

Apply this diff:

-  "engines": {
-    "node": ">=16"
-  },
+  "engines": {
+    "node": ">=18"
+  },

31-31: Mixing AWS SDK v3 minor versions across packages; consider harmonizing to reduce duplication.

You’re adding @aws-sdk/client-lambda at ^3.865.0 while other AWS SDK v3 deps are ~3.800.x. Not a blocker, but aligning versions across v3 packages helps dedupe and shrinks bundles.

Would you like a follow-up PR to harmonize v3 package versions across the repo?

packages/utils/package.json (2)

74-74: ini addition looks fine; ensure consistent parsing and sanitization at call sites.

No action needed here; just a reminder to avoid writing parsed INI objects back to untrusted sinks.


81-83: snowflake-sdk and matching type packages are correctly added for the Step 3 CLI.

Good call adding @types/snowflake-sdk. Ensure any OCSP/network settings used by the CLI are documented for local runs.

packages/core/src/external/aws/lambda-logic/README.md (1)

3-7: Fix grammar and tighten the deprecation message.

Polish wording and clarify guidance to point developers to per-use-case folders.

Apply this diff:

-This folder should not exist. If we put all "lambda logic" in this folder, we will have a hard time
-maintaining it, all lambdas' logic will be in a single place.
-
-What we've been doing for all other lambdas, is to call commands/functions that are stored on the
-proper folder for each use case/business logic.
+This folder should not exist. Centralizing all “lambda logic” here makes maintenance harder because
+every lambda’s implementation ends up in a single place.
+
+Instead, follow the established pattern: place code under the appropriate per–use-case folder and
+invoke commands/functions from there.
packages/utils/src/analytics-platform/3.readme.md (1)

1-1: Tighten wording and fix hyphenation.

Use “subfolder” (closed compound) and clarify punctuation.

Apply this diff:

-The 3rd step is platform-specific, check the respective sub-folder for a script starting with "3-".
+The 3rd step is platform-specific; check the respective subfolder for a script starting with "3-".
packages/data-transformation/fhir-to-csv/main.py (1)

127-130: Follow through: remove or lazy-load Snowflake bits and update messaging.

Since Snowflake interaction is disabled here:

  • Consider removing or lazy-loading snowflake.connector and src.setupSnowflake imports to cut cold-start time and reduce unused dependencies.
  • Update the log on Line 131 to avoid “for Snowflake” wording (it’s misleading now).

Example (outside this hunk) for the log message:

print(f">>> Parsing data and uploading it to S3 - {cx_id}, patient_id {patient_id}, job_id {job_id}")
packages/infra/lib/shared/settings.ts (1)

6-13: DRY the memory union via a shared type.

To avoid future divergence, factor the memory union into a single alias and reuse it in both LambdaSettings and QueueAndLambdaSettings.lambda.

Example:

+type LambdaMemory = 512 | 1024 | 2048 | 4096 | 6144 | 8192 | 10240;
 export type LambdaSettings = {
-  memory: 512 | 1024 | 2048 | 4096 | 6144 | 8192 | 10240;
+  memory: LambdaMemory;
   /** How long can the lambda run for, max is 900 seconds (15 minutes)  */
   timeout: Duration;
   reservedConcurrentExecutions?: number;
   ephemeralStorageSize?: Size;
   runtime: Runtime;
 };

 // TODO Unify w/ LambdaSettings
   lambda: {
-    memory: 512 | 1024 | 2048 | 4096 | 6144 | 8192 | 10240;
+    memory: LambdaMemory;
     /** How long can the lambda run for, max is 900 seconds (15 minutes)  */
     timeout: Duration;
     reservedConcurrentExecutions?: number;
     ephemeralStorageSize?: Size;
   };

Also applies to: 21-26

packages/core/src/external/aws/s3.ts (1)

571-599: Normalize prefix to ensure true “first-level” semantics and avoid surprises.

The pagination and Delimiter usage look good. To make results consistent when callers pass a prefix without a trailing delimiter, normalize it before listing. That avoids mixing “sibling” prefixes with similarly named directories.

Apply this diff to sanitize the prefix:

   async listFirstLevelSubdirectories({
     bucket,
     prefix,
     delimiter = "/",
   }: {
     bucket: string;
     prefix: string;
     delimiter?: string | undefined;
   }): Promise<AWS.S3.CommonPrefixList> {
-    const allObjects: AWS.S3.CommonPrefix[] = [];
+    const allObjects: AWS.S3.CommonPrefix[] = [];
+    const sanitizedPrefix = prefix.endsWith(delimiter) ? prefix : `${prefix}${delimiter}`;
     let continuationToken: string | undefined;
     do {
       const res = await executeWithRetriesS3(() =>
         this._s3
           .listObjectsV2({
             Bucket: bucket,
-            Prefix: prefix,
+            Prefix: sanitizedPrefix,
             ...(delimiter ? { Delimiter: delimiter } : {}),
             ...(continuationToken ? { ContinuationToken: continuationToken } : {}),
           })
           .promise()
       );
packages/utils/src/analytics-platform/1-fhir-to-csv.ts (1)

1-1: Move the header comment below imports per scripts guideline.

Our scripts guideline says top-level comments go after imports. Consider moving the “1-fhir-to-csv.ts” header comment below the import block.

packages/core/src/external/snowflake/commands.ts (2)

3-15: Consider adding connection state validation and timeout configuration.

The promisified connection functions don't validate the connection state before attempting operations. Additionally, there's no timeout configuration for these operations which could lead to indefinite hangs.

Consider adding connection state validation and optional timeout support:

-export function promisifyConnect(s: snowflake.Connection) {
-  return function (): Promise<snowflake.Connection> {
+export function promisifyConnect(s: snowflake.Connection, timeoutMs?: number) {
+  return function (): Promise<snowflake.Connection> {
     return new Promise((resolve, reject) => {
+      const timeout = timeoutMs ? setTimeout(() => {
+        reject(new Error(`Connection timeout after ${timeoutMs}ms`));
+      }, timeoutMs) : undefined;
+      
       s.connect((error: unknown, result: snowflake.Connection) => {
+        if (timeout) clearTimeout(timeout);
         if (error) {
           reject(error);
           return;
         }
         resolve(result);
       });
     });
   };
 }

31-53: Consider making the promisified execute function more type-safe.

The function uses any[] for rows which reduces type safety. Consider using generics to allow callers to specify the expected row type.

Apply this diff to improve type safety:

-export function promisifyExecute(s: snowflake.Connection) {
-  return function (sqlText: string): Promise<{
+export function promisifyExecute<T = any>(s: snowflake.Connection) {
+  return function (sqlText: string): Promise<{
     statement: snowflake.RowStatement;
-    rows: any[] | undefined; // eslint-disable-line @typescript-eslint/no-explicit-any
+    rows: T[] | undefined;
   }> {
     return new Promise((resolve, reject) => {
       s.execute({
         sqlText,
-        // eslint-disable-next-line @typescript-eslint/no-explicit-any
-        complete: (error: unknown, statement: snowflake.RowStatement, rows: any[] | undefined) => {
+        complete: (error: unknown, statement: snowflake.RowStatement, rows: T[] | undefined) => {
           if (error) {
             reject(error);
             return;
           }
           resolve({
             statement,
             rows: rows || undefined,
           });
         },
       });
     });
   };
 }
packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts (1)

261-288: Consider using parseArgs for command-line argument handling.

For better CLI argument handling and validation, consider using Node.js's built-in parseArgs utility or a library like commander for more robust command-line parsing.

Example improvement using Node.js's parseArgs:

+import { parseArgs } from "node:util";
+
 async function main() {
   try {
-    // Get folder path from command line arguments
-    const args = process.argv.slice(2);
-
-    if (args.length === 0) {
+    const { values, positionals } = parseArgs({
+      allowPositionals: true,
+      strict: true
+    });
+    
+    if (positionals.length === 0) {
       console.log("Usage: ts-node check-duplicate-patient-ids.ts <folder-path>");
       // ... rest of usage
       process.exit(1);
     }
     
-    const folderPath = args[0];
+    const folderPath = positionals[0];
packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (1)

85-132: Consider implementing connection pooling and retry logic.

The Snowflake connection is created once per execution without retry logic. Network issues or transient failures could cause the entire ingestion to fail.

Consider implementing:

  1. Connection retry logic with exponential backoff
  2. Connection pooling if multiple concurrent operations are planned
  3. Checkpoint/resume capability to handle partial failures gracefully

This would make the ingestion process more resilient to transient failures and network issues.
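A generic retry wrapper with exponential backoff along these lines might look like the sketch below; the attempt count, delay parameters, and name `withRetries` are illustrative assumptions, not the project's existing helper.

```typescript
// Retry an async operation with exponential backoff: 1st retry after
// initialDelayMs, 2nd after 2x that, and so on. Rethrows the last error once
// all attempts are exhausted.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  initialDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        const delay = initialDelayMs * 2 ** attempt;
        await new Promise(resolve => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

The Snowflake connect/execute calls could then be wrapped, e.g. `withRetries(() => connect())`, to ride out transient network failures.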

packages/core/src/external/aws/lambda.ts (1)

100-100: Consider decoding the Payload using TextDecoder for V3.

In AWS SDK v3, the Payload is typically a Uint8Array that should be decoded using TextDecoder instead of toString().

Apply this diff for proper V3 payload decoding:

 export function getLambdaErrorV3(result: InvokeCommandOutput): LambdaError | undefined {
   if (!result.Payload) return undefined;
   if (!isLambdaErrorV3(result)) return undefined;
-  const response = JSON.parse(result.Payload.toString());
+  const decoder = new TextDecoder();
+  const response = JSON.parse(decoder.decode(result.Payload));
   return {
     errorType: response.errorType,
     errorMessage: response.errorMessage,
     log: logResultToString(result.LogResult),
   };
 }

Similarly for getLambdaResultPayloadV3:

-  return result.Payload.toString();
+  const decoder = new TextDecoder();
+  return decoder.decode(result.Payload);
packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts (1)

5-5: Clean up commented import

Remove the commented import statement as it creates code clutter. If this import is no longer needed, it should be deleted entirely.

-// import { getSnowflakeCreds } from "../../config";
packages/core/src/command/analytics-platform/merge-csvs/index.ts (1)

238-297: Greedy algorithm may produce suboptimal distribution for certain file sizes

The current greedy algorithm for distributing small files may lead to imbalanced groups when file sizes vary significantly. Consider implementing a more sophisticated bin packing algorithm (like First Fit Decreasing or Best Fit Decreasing) for better balance.

For example, when you have files of sizes [90MB, 10MB, 10MB, 10MB] with a 100MB target, the current algorithm might put 90MB+10MB in one group and 10MB+10MB in another, resulting in 100MB and 20MB groups. A better distribution would be [90MB, 10MB] and [10MB, 10MB] for two 100MB groups.

Consider using a more balanced approach:

-    // First pass: distribute large files evenly
-    for (let i = 0; i < sortedFiles.length; i++) {
-      const file = sortedFiles[i];
-      if (!file) continue;
-
-      // Find the group with the smallest current size
-      let smallestGroupIndex = 0;
-      let smallestGroupSize = groupSizes[0] || 0;
-
-      for (let j = 1; j < optimalGroupCount; j++) {
-        const currentSize = groupSizes[j] || 0;
-        if (currentSize < smallestGroupSize) {
-          smallestGroupSize = currentSize;
-          smallestGroupIndex = j;
-        }
-      }
-
-      // Add file to the smallest group
-      const targetGroup = groups[smallestGroupIndex];
-      if (targetGroup) {
-        targetGroup.push(file);
-        groupSizes[smallestGroupIndex] = (groupSizes[smallestGroupIndex] || 0) + file.size;
-      }
-    }
+    // Improved distribution using best-fit approach
+    for (const file of sortedFiles) {
+      // Find the best group: smallest that can still fit this file
+      let bestGroupIndex = -1;
+      let bestGroupSpace = Infinity;
+      
+      for (let j = 0; j < optimalGroupCount; j++) {
+        const currentSize = groupSizes[j] || 0;
+        const remainingSpace = targetGroupSizeBytes - currentSize;
+        
+        // If file fits and this group has less wasted space
+        if (file.size <= remainingSpace && remainingSpace < bestGroupSpace) {
+          bestGroupSpace = remainingSpace;
+          bestGroupIndex = j;
+        }
+      }
+      
+      // If no group can fit, add to smallest group
+      if (bestGroupIndex === -1) {
+        bestGroupIndex = groupSizes.indexOf(Math.min(...groupSizes));
+      }
+      
+      const targetGroup = groups[bestGroupIndex];
+      if (targetGroup) {
+        targetGroup.push(file);
+        groupSizes[bestGroupIndex] = (groupSizes[bestGroupIndex] || 0) + file.size;
+      }
+    }
packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts (1)

1-1: Remove eslint-disable comment

The blanket disable of @typescript-eslint/no-non-null-assertion reduces type safety. Instead, handle the specific cases with proper null checks or use type guards.

-/* eslint-disable @typescript-eslint/no-non-null-assertion */
 import { FileInfo, groupFilesByTypeAndSize } from "../index";

Then update the specific assertions:

-      expect(largeFileGroup!.totalSize).toBe(largeFileSize);
+      expect(largeFileGroup?.totalSize).toBe(largeFileSize);
packages/infra/lib/api-stack.ts (1)

800-803: Fix JSDoc typo and clarify the deprecation note

There’s an extra backtick and the note could be clearer about what should move where.

Apply this diff:

-    /**
-     * @deprecated This should be done on the code where it's used.
-     * @see `setupFhirToCsvLambda()`` in the analytics-platform-stack.ts
-     */
+    /**
+     * @deprecated Move these bucket grants to the code/stack that owns the resource.
+     * @see setupFhirToCsvLambda() in analytics-platform-stack.ts
+     */
packages/infra/lib/analytics-platform/analytics-platform-stack.ts (5)

61-82: MergeCsvs concurrency/timeout are sensible; consider reserved concurrency and ephemeral storage (optional)

The 14m50s timeout and 4GB memory, with batchSize=1 and maxConcurrency=20, look fine for merging large CSVs. Two optional improvements:

  • Set reservedConcurrentExecutions to match maxConcurrency to guard against other triggers or accidental invokes.
  • If the implementation uses /tmp for intermediate files, consider ephemeralStorageSize (e.g., 2–10 GiB) to avoid ENOSPC under peak loads. If fully streaming, you’re fine as-is.

Apply one or both diffs:

Option A: restrict concurrency at the function level (tie to 20)

   const mergeCsvs: QueueAndLambdaSettings = {
     name: "MergeCsvs",
     entry: "analytics-platform/merge-csvs",
     lambda: {
       memory: 4096,
       timeout: mergeCsvsLambdaTimeout,
+      reservedConcurrentExecutions: 20,
     },

Option B: provision larger ephemeral storage (only if you write to /tmp)

   const mergeCsvs: QueueAndLambdaSettings = {
     name: "MergeCsvs",
     entry: "analytics-platform/merge-csvs",
     lambda: {
       memory: 4096,
       timeout: mergeCsvsLambdaTimeout,
+      ephemeralStorageSize: cdk.Size.gibibytes(2),
     },

109-114: Snowflake creds TODO: track it explicitly

“ENG-858 reintroduce this” is fine, but easy to lose track of. Suggest adding a link to the ticket in the comment and/or opening a follow-up issue to ensure the reintroduction is not forgotten.

Want me to open a follow-up item and annotate the code with the issue URL?


241-243: Comment drift: it says “medical document bucket” but you grant on the analytics bucket

The comment doesn’t match the code. You’re granting read/write to the analytics platform bucket here.

Apply this diff:

-    // Grant read to medical document bucket set on the api-stack
-    ownProps.analyticsPlatformBucket.grantReadWrite(fhirToCsvTransformLambda);
+    // Grant read/write to the analytics platform bucket
+    ownProps.analyticsPlatformBucket.grantReadWrite(fhirToCsvTransformLambda);

173-182: Drop unused props from setupMergeCsvsLambda call (optional)

awsRegion and config aren’t used by setupMergeCsvsLambda. Keeping only what’s needed reduces noise.

Apply this diff:

-    const { mergeCsvsLambda, queue: mergeCsvsQueue } = this.setupMergeCsvsLambda({
-      config: props.config,
-      envType: props.config.environmentType,
-      awsRegion: props.config.region,
-      lambdaLayers: props.lambdaLayers,
-      vpc: props.vpc,
-      sentryDsn: props.config.sentryDSN,
-      alarmAction: props.alarmAction,
-      bucket: analyticsPlatformBucket,
-    });
+    const { mergeCsvsLambda, queue: mergeCsvsQueue } = this.setupMergeCsvsLambda({
+      envType: props.config.environmentType,
+      lambdaLayers: props.lambdaLayers,
+      vpc: props.vpc,
+      sentryDsn: props.config.sentryDSN,
+      alarmAction: props.alarmAction,
+      bucket: analyticsPlatformBucket,
+    });

If you proceed, also update the method signature below.


290-299: Prune unused parameters from setupMergeCsvsLambda signature (optional)

config and awsRegion aren’t used. Trimming the signature makes the intent clearer.

Apply this diff:

-  private setupMergeCsvsLambda(ownProps: {
-    config: EnvConfigNonSandbox;
-    envType: EnvType;
-    awsRegion: string;
-    lambdaLayers: LambdaLayers;
-    vpc: ec2.IVpc;
-    sentryDsn: string | undefined;
-    alarmAction: SnsAction | undefined;
-    bucket: s3.Bucket;
-  }): {
+  private setupMergeCsvsLambda(ownProps: {
+    envType: EnvType;
+    lambdaLayers: LambdaLayers;
+    vpc: ec2.IVpc;
+    sentryDsn: string | undefined;
+    alarmAction: SnsAction | undefined;
+    bucket: s3.Bucket;
+  }): {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled
  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between de1643f and 23280f4.

⛔ Files ignored due to path filters (2)
  • package-lock.json is excluded by !**/package-lock.json
  • packages/lambdas/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (29)
  • packages/core/package.json (4 hunks)
  • packages/core/src/command/analytics-platform/fhir-to-csv/batch/fhir-to-csv.ts (0 hunks)
  • packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts (1 hunks)
  • packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts (1 hunks)
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts (1 hunks)
  • packages/core/src/external/aws/lambda-logic/README.md (1 hunks)
  • packages/core/src/external/aws/lambda.ts (7 hunks)
  • packages/core/src/external/aws/s3.ts (1 hunks)
  • packages/core/src/external/snowflake/commands.ts (1 hunks)
  • packages/core/src/util/config.ts (0 hunks)
  • packages/data-transformation/fhir-to-csv/main.py (2 hunks)
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts (8 hunks)
  • packages/infra/lib/analytics-platform/types.ts (1 hunks)
  • packages/infra/lib/api-stack.ts (1 hunks)
  • packages/infra/lib/api-stack/api-service.ts (0 hunks)
  • packages/infra/lib/shared/settings.ts (1 hunks)
  • packages/lambdas/package.json (4 hunks)
  • packages/lambdas/src/analytics-platform/fhir-to-csv.ts (0 hunks)
  • packages/lambdas/src/analytics-platform/merge-csvs.ts (1 hunks)
  • packages/utils/package.json (3 hunks)
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts (5 hunks)
  • packages/utils/src/analytics-platform/2-merge-csvs.ts (1 hunks)
  • packages/utils/src/analytics-platform/3.readme.md (1 hunks)
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (1 hunks)
  • packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts (1 hunks)
  • packages/utils/src/analytics-platform/util/check-merge-groups.ts (1 hunks)
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts (4 hunks)
  • packages/utils/src/fhir/bundle/count_resources.sh (1 hunks)
  • packages/utils/src/fhir/bundle/find_with_custom_filter.sh (1 hunks)
💤 Files with no reviewable changes (4)
  • packages/core/src/util/config.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/batch/fhir-to-csv.ts
  • packages/lambdas/src/analytics-platform/fhir-to-csv.ts
  • packages/infra/lib/api-stack/api-service.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

**/*.{js,jsx,ts,tsx}: Don’t use null inside the app, only on code interacting with external interfaces/services, like DB and HTTP; convert to undefined before sending inwards into the code
Use const whenever possible
Use async/await instead of .then()
Naming: classes, enums: PascalCase
Naming: constants, variables, functions: camelCase
Naming: file names: kebab-case
Naming: Don’t use negative names, like notEnabled, prefer isDisabled
If possible, use decomposing objects for function parameters
Prefer Nullish Coalesce (??) than the OR operator (||) when you want to provide a default value
Avoid creating arrow functions
Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
While handling errors, keep the stack trace around: if you create a new Error (e.g., MetriportError), make sure to pass the original error as the new one’s cause so the stack trace is available upstream.
max column length is 100 chars
multi-line comments use /** */
top-level comments go after the import (save pre-import to basic file header, like license)
move literals to constants declared after imports when possible

Files:

  • packages/core/src/external/aws/s3.ts
  • packages/lambdas/src/analytics-platform/merge-csvs.ts
  • packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts
  • packages/infra/lib/api-stack.ts
  • packages/infra/lib/shared/settings.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts
  • packages/core/src/external/snowflake/commands.ts
  • packages/utils/src/analytics-platform/util/check-merge-groups.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
  • packages/infra/lib/analytics-platform/types.ts
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
  • packages/core/src/external/aws/lambda.ts
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts
**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

Use types whenever possible

Files:

  • packages/core/src/external/aws/s3.ts
  • packages/lambdas/src/analytics-platform/merge-csvs.ts
  • packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts
  • packages/infra/lib/api-stack.ts
  • packages/infra/lib/shared/settings.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts
  • packages/core/src/external/snowflake/commands.ts
  • packages/utils/src/analytics-platform/util/check-merge-groups.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
  • packages/infra/lib/analytics-platform/types.ts
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
  • packages/core/src/external/aws/lambda.ts
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts
**/*.ts

⚙️ CodeRabbit Configuration File

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • U...

Files:

  • packages/core/src/external/aws/s3.ts
  • packages/lambdas/src/analytics-platform/merge-csvs.ts
  • packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts
  • packages/infra/lib/api-stack.ts
  • packages/infra/lib/shared/settings.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts
  • packages/core/src/external/snowflake/commands.ts
  • packages/utils/src/analytics-platform/util/check-merge-groups.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
  • packages/infra/lib/analytics-platform/types.ts
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
  • packages/core/src/external/aws/lambda.ts
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts
🧠 Learnings (12)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
📚 Learning: 2025-08-13T16:08:03.688Z
Learnt from: RadmirGari
PR: metriport/metriport#4324
File: packages/core/src/domain/address.ts:7-7
Timestamp: 2025-08-13T16:08:03.688Z
Learning: In the Metriport codebase, packages/core legitimately depends on metriport/api-sdk and imports from it in multiple places. This is an established architectural pattern and is properly declared as a dependency in core's package.json.

Applied to files:

  • packages/core/package.json
  • packages/utils/package.json
📚 Learning: 2025-07-18T09:53:47.448Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.

Applied to files:

  • packages/data-transformation/fhir-to-csv/main.py
📚 Learning: 2025-07-18T09:54:43.598Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.

Applied to files:

  • packages/data-transformation/fhir-to-csv/main.py
📚 Learning: 2025-07-18T09:54:01.790Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:54-58
Timestamp: 2025-07-18T09:54:01.790Z
Learning: The user thomasyopes prefers to keep the setupSnowflake.py file in packages/data-transformation/fhir-to-csv/src/setupSnowflake/ as-is, declining suggestions for additional error handling around configuration file parsing.

Applied to files:

  • packages/data-transformation/fhir-to-csv/main.py
📚 Learning: 2025-07-19T14:04:34.757Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.

Applied to files:

  • packages/data-transformation/fhir-to-csv/main.py
📚 Learning: 2025-07-18T09:53:38.906Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:38.906Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, S3 upload operations should be allowed to hard fail without try-catch blocks. The team prefers immediate failure over error handling and continuation for S3 upload operations in the FHIR-to-CSV transformation process.

Applied to files:

  • packages/data-transformation/fhir-to-csv/main.py
📚 Learning: 2025-07-09T20:41:08.549Z
Learnt from: thomasyopes
PR: metriport/metriport#4167
File: packages/infra/lib/analytics-platform/analytics-platform-stack.ts:56-64
Timestamp: 2025-07-09T20:41:08.549Z
Learning: In the analytics platform Snowflake integration (packages/infra/lib/analytics-platform/analytics-platform-stack.ts), the `integrationUserArn` field in the config temporarily contains an AWS account ID rather than an ARN. This is planned to be updated to an actual ARN in a second pass, at which point the IAM Role's assumedBy should be changed from AccountPrincipal to ArnPrincipal.

Applied to files:

  • packages/infra/lib/api-stack.ts
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts
📚 Learning: 2025-05-20T21:26:26.804Z
Learnt from: leite08
PR: metriport/metriport#3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts
📚 Learning: 2025-07-18T09:33:29.581Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the `/fhir-to-csv` and `/fhir-to-csv/transform` endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the `/fhir-to-csv/batch` endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
📚 Learning: 2025-06-18T17:24:31.650Z
Learnt from: keshavsaharia
PR: metriport/metriport#4045
File: packages/core/src/external/surescripts/__tests__/file-names.test.ts:21-21
Timestamp: 2025-06-18T17:24:31.650Z
Learning: The team does all testing and deploying in PST local time, which helps avoid issues with date-dependent code in tests like buildRequestFileName that use dayjs().format('YYYYMMDD').

Applied to files:

  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts
📚 Learning: 2025-03-11T20:42:46.516Z
Learnt from: thomasyopes
PR: metriport/metriport#3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) `ehr-sync-patient-cloud.ts` sends messages to an SQS queue, (2) the `ehr-sync-patient` Lambda consumes these messages, and (3) the Lambda uses the `syncPatient` function to make the API calls to process the patient data.

Applied to files:

  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
🧬 Code Graph Analysis (9)
packages/lambdas/src/analytics-platform/merge-csvs.ts (5)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/core/src/command/analytics-platform/merge-csvs/index.ts (2)
  • GroupAndMergeCSVsParams (15-25)
  • groupAndMergeCSVs (56-129)
packages/lambdas/src/shared/capture.ts (1)
  • capture (19-118)
packages/lambdas/src/shared/sqs.ts (1)
  • getSingleMessageOrFail (62-83)
packages/lambdas/src/shared/parse-body.ts (1)
  • parseBody (43-52)
packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts (1)
packages/core/src/command/analytics-platform/merge-csvs/index.ts (2)
  • groupFilesByTypeAndSize (238-314)
  • FileInfo (27-31)
packages/utils/src/analytics-platform/2-merge-csvs.ts (4)
packages/shared/src/common/date.ts (1)
  • buildDayjs (93-95)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/lambdas/src/analytics-platform/merge-csvs.ts (1)
  • GroupAndMergeCSVsParamsLambda (19-22)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
packages/core/src/command/analytics-platform/merge-csvs/index.ts (2)
packages/shared/src/util/uuid-v7.ts (1)
  • uuidv4 (384-386)
packages/core/src/external/aws/s3.ts (2)
  • S3Utils (140-600)
  • executeWithRetriesS3 (94-111)
packages/utils/src/consolidated/bulk-recreate-consolidated.ts (1)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (4)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/shared/src/common/date.ts (1)
  • buildDayjs (93-95)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
packages/core/src/external/snowflake/commands.ts (3)
  • promisifyConnect (3-15)
  • promisifyExecute (31-53)
  • promisifyDestroy (17-29)
packages/utils/src/analytics-platform/1-fhir-to-csv.ts (5)
packages/shared/src/common/date.ts (1)
  • buildDayjs (93-95)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/shared/src/index.ts (1)
  • sleep (13-13)
packages/utils/src/patient/get-ids.ts (1)
  • getAllPatientIds (22-29)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
packages/core/src/external/aws/lambda.ts (2)
packages/shared/src/index.ts (3)
  • MetriportError (43-43)
  • BadRequestError (42-42)
  • NotFoundError (44-44)
packages/core/src/external/sftp/client.ts (1)
  • log (261-263)
packages/infra/lib/analytics-platform/analytics-platform-stack.ts (5)
packages/infra/lib/shared/settings.ts (1)
  • QueueAndLambdaSettings (15-49)
packages/infra/lib/analytics-platform/types.ts (1)
  • AnalyticsPlatformsAssets (4-9)
packages/infra/config/env-config.ts (1)
  • EnvConfigNonSandbox (303-316)
packages/core/src/external/aws/s3.ts (1)
  • s3 (155-157)
packages/infra/lib/shared/lambda.ts (1)
  • createLambda (85-158)
🪛 LanguageTool
packages/core/src/external/aws/lambda-logic/README.md

[grammar] ~3-~3: There might be a mistake here.
Context: ...in this folder, we will have a hard time maintaining it, all lambdas' logic will ...

(QB_NEW_EN)


[grammar] ~6-~6: There might be a mistake here.
Context: ...ommands/functions that are stored on the proper folder for each use case/business...

(QB_NEW_EN)


[grammar] ~7-~7: There might be a mistake here.
Context: ...folder for each use case/business logic.

(QB_NEW_EN)

packages/utils/src/analytics-platform/3.readme.md

[grammar] ~1-~1: There might be a mistake here.
Context: ...-folder for a script starting with "3-".

(QB_NEW_EN)

🪛 Shellcheck (0.10.0)
packages/utils/src/fhir/bundle/find_with_custom_filter.sh

[warning] 103-103: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (23)
packages/utils/src/fhir/bundle/find_with_custom_filter.sh (1)

120-126: LGTM: helpful, clear CLI UX

Good, concise header showing the effective filter, directory, and depth.

packages/utils/src/consolidated/bulk-recreate-consolidated.ts (2)

128-132: LGTM: clearer final timing via elapsedTimeAsStr.

Nice improvement for operator readability compared to hand-rolled ms calculations.


13-13: No action needed—getDelayTime returns milliseconds and the duration plugin is already extended

  • getDelayTime(…) explicitly returns a number (using .asMilliseconds() and Math.max), so interpolating ${delayTime} ms and passing it to sleep is safe.
  • packages/utils/src/shared/duration.ts calls dayjs.extend(duration) at the top, so elapsedTimeAsStr will always have the plugin initialized when it’s imported.
packages/lambdas/package.json (1)

51-51: No ini imports detected in packages/lambdas – no action needed

I searched for both import … from 'ini' and require('ini') throughout packages/lambdas and found no occurrences. Since you’re not importing ini in any TypeScript files there, you don’t need to add @types/ini to devDependencies right now. If you begin importing ini in TS, please add the corresponding type package.

packages/utils/package.json (2)

55-55: LGTM on introducing @aws-sdk/client-lambda for v3 usage in utils.

This is consistent with the new orchestration scripts and the broader shift to AWS SDK v3.


65-65: Confirmed aws-sdk v2 usage in utils scripts

Multiple utils scripts import AWS v2 clients, so the "aws-sdk": "2.1243" dependency is required:

  • packages/utils/src/test-s3.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/utils/src/sqs/populate-sqs.ts
  • packages/utils/src/query-to-s3.ts
  • packages/utils/src/initialize-cw-test-env/index.ts
  • packages/utils/src/customer-requests/medical-records-create-from-s3.ts
  • packages/utils/src/fhir/validate/process-ndjson-output.ts

No changes needed.

packages/core/package.json (3)

109-109: LGTM on adding @aws-sdk/client-lambda to core for v3 utilities.

Consistent with the new v3 Lambda helpers under external/aws/lambda.ts.


137-137: ini addition is fine; confirm type definitions exist where imported.

You added @types/ini under devDependencies—good. No further action.


150-150: snowflake-sdk import scope is limited to dedicated modules

Verified that snowflake-sdk is only imported by:
packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
packages/core/src/external/snowflake/commands.ts

Neither of these lives in the core package’s public exports (see packages/core/package.json “exports” field), so consumers of @metriport/core will not pull in snowflake-sdk unless they explicitly deep-import the external Snowflake commands module.

packages/data-transformation/fhir-to-csv/main.py (1)

115-115: Snowflake creds removal aligns with the new flow.

Commenting out SNOWFLAKE_CREDS fetch here matches the PR objective to decouple FHIR-to-CSV from Snowflake. No concerns with this change.

packages/infra/lib/analytics-platform/types.ts (1)

7-9: LGTM: assets surface matches the new MergeCsvs workflow.

Adding mergeCsvsLambda and mergeCsvsQueue aligns with the infra shift away from Batch. No issues spotted.

packages/utils/src/analytics-platform/1-fhir-to-csv.ts (2)

35-39: Nice: stable jobId and increased parallelism.

The F2C-prefixed jobId and 30-way concurrency should help traceability and throughput. The jitter settings are reasonable.


71-81: Include fhirToCsvJobId in SQS FIFO deduplication ID

• In packages/utils/src/analytics-platform/1-fhir-to-csv.ts (around lines 71–81), change:

await sqsClient.sendMessageToQueue(queueUrl, payload, {
  fifo: true,
- messageDeduplicationId: patientId,
+ messageDeduplicationId: `${patientId}-${fhirToCsvJobId}`,
  messageGroupId: patientId,
});

• Verify that the FHIR_TO_CSV queue is indeed a FIFO queue (its name ends with “.fifo” and it’s defined with fifo: true in packages/infra).

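SQS FIFO deduplication drops any message whose deduplication ID was already seen within the last 5 minutes, which is why a per-run jobId belongs in the ID. The toy model below is illustrative only: `FifoQueueModel` and the job/patient names are hypothetical, and the real window is enforced server-side by SQS, not by client code.

```typescript
// Toy model of SQS FIFO deduplication (5-minute window); illustrative only.
const DEDUP_WINDOW_MS = 5 * 60 * 1000;

class FifoQueueModel {
  private seen = new Map<string, number>();
  /** Returns true if the message is accepted, false if deduplicated away. */
  send(dedupId: string, nowMs: number): boolean {
    const last = this.seen.get(dedupId);
    if (last !== undefined && nowMs - last < DEDUP_WINDOW_MS) return false;
    this.seen.set(dedupId, nowMs);
    return true;
  }
}

const patientId = "pt-123";

// Two jobs run 1 minute apart, both enqueueing the same patient.
const queue = new FifoQueueModel();
const acceptedA = queue.send(patientId, 0);
const acceptedB = queue.send(patientId, 60_000); // dropped: same ID within window

// Scoping the dedup ID to the job keeps both runs.
const queue2 = new FifoQueueModel();
const okA = queue2.send(`${patientId}-jobA`, 0);
const okB = queue2.send(`${patientId}-jobB`, 60_000);

console.log(acceptedA, acceptedB, okA, okB); // true false true true
```

The second run's message survives only when the deduplication ID includes the job, matching the suggested `${patientId}-${fhirToCsvJobId}` change.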
packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts (1)

36-95: LGTM! Well-structured class with clear responsibilities.

The DuplicatePatientIdChecker class is well-designed with appropriate error handling and clear separation of concerns. The methods are focused and the error handling ensures the script continues processing even if individual files have issues.

packages/utils/src/analytics-platform/util/check-merge-groups.ts (1)

41-103: LGTM! Clean implementation with good error isolation.

The DuplicateKeyChecker class is well-structured with appropriate error handling that allows the script to continue processing even when individual files fail.

packages/core/src/external/aws/lambda.ts (1)

180-250: Good implementation of V3 support with proper backwards compatibility.

The parallel V3 implementation alongside V2 is well-structured, maintaining the same error handling patterns and functionality. The deprecation notices guide users toward the newer implementation while preserving existing functionality.

packages/lambdas/src/analytics-platform/merge-csvs.ts (1)

24-58: LGTM! Clean Lambda implementation with proper error handling

The Lambda handler is well-structured with:

  • Proper single message validation
  • Input schema validation using Zod
  • Comprehensive error context capture for debugging
  • Validation to prevent source/destination collision
packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts (1)

4-290: Comprehensive test coverage!

Excellent test suite covering all the critical aspects of the grouping algorithm:

  • Edge cases (empty input, single file, zero-size files)
  • Distribution strategies for large and small files
  • Group ID generation and uniqueness
  • Total size calculations
  • Cross-table consistency

The tests provide good confidence in the grouping logic.

packages/utils/src/analytics-platform/2-merge-csvs.ts (1)

70-150: Well-structured orchestration script with robust error handling

The script demonstrates good practices:

  • Clear progress logging and error reporting
  • Batch processing with configurable chunk sizes
  • Parallel execution with jitter to avoid throttling
  • Comprehensive failure tracking with output files for debugging
  • User confirmation before potentially large operations
packages/infra/lib/analytics-platform/analytics-platform-stack.ts (4)

26-27: Interface update looks good

Adding mergeCsvs to AnalyticsPlatformsSettings aligns with the new workflow.


313-323: Queue config looks solid; confirm alarm thresholds are appropriate for expected throughput

FIFO + DLQ with age-of-oldest and backlog alarms is good. Ensure the 1,000-message backlog and 2h age-of-oldest thresholds align with the expected ingestion velocity so the operational signals remain meaningful.


324-335: Env vars for mergeCsvs are minimal and appropriate

WAIT_TIME_IN_MILLIS and ANALYTICS_BUCKET_NAME are sufficient for the lambda’s responsibilities. No issues.


150-157: IntegrationUserArn Verified
The example config now defines integrationUserArn as a full ARN (arn:aws:iam::000000000000:role/SnowflakeIntegrationRole), so using new iam.ArnPrincipal(...) here is correct. No changes required.

@leite08 leite08 force-pushed the eng-743-bulk-ingest branch from 5e6f72e to 85b6fad on August 15, 2025 03:52

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🔭 Outside diff range comments (3)
packages/data-transformation/fhir-to-csv/main.py (2)

71-76: Guard against missing "entry" in the bundle

Indexing with bundle["entry"] will raise if the key is absent. Using .get() avoids the KeyError and keeps the function idempotent.

Apply this diff:

-with open(local_bundle_key, "rb") as f:
-    bundle = json.load(f)
-    entries = bundle["entry"]
-    if entries is None or len(entries) < 1:
+with open(local_bundle_key, "rb") as f:
+    bundle = json.load(f)
+    entries = bundle.get("entry")
+    if not entries:
         print(f"Bundle {bundle_key} has no entries")
         return []

141-144: Do not call exit() inside a Lambda handler; return instead

Calling exit(0) in a Lambda can terminate the runtime process and degrade warm-start reuse. Return gracefully from the handler.

Apply this diff:

 if len(output_bucket_and_file_keys_and_table_names) < 1:
     print("No files were uploaded")
-    exit(0)
+    return
packages/utils/src/analytics-platform/1-fhir-to-csv.ts (1)

71-81: Include jobId in FIFO messageDeduplicationId to avoid cross-run collisions

Using only patientId for messageDeduplicationId can cause messages from successive runs to be dropped within SQS’s 5-minute dedup window. Including the per-run fhirToCsvJobId ensures deduplication is scoped to each job.

• File: packages/utils/src/analytics-platform/1-fhir-to-csv.ts
• Lines: 71–81

Apply this diff:

- await sqsClient.sendMessageToQueue(queueUrl, payload, {
-   fifo: true,
-   messageDeduplicationId: patientId,
-   messageGroupId: patientId,
- });
+ await sqsClient.sendMessageToQueue(queueUrl, payload, {
+   fifo: true,
+   messageDeduplicationId: `${fhirToCsvJobId}:${patientId}`,
+   messageGroupId: patientId,
+ });

Follow-up: if you require per-run ordering isolation instead of cross-run ordering by patient, you can also set

messageGroupId: `${fhirToCsvJobId}:${patientId}`,
♻️ Duplicate comments (6)
packages/utils/src/fhir/bundle/find_with_custom_filter.sh (1)

97-110: Use jq’s exit code instead of string comparison; fix SC2155 (duplicate of earlier feedback).

Rely on jq -e’s exit status and avoid capturing output into a variable. This fixes ShellCheck SC2155 and removes brittle string comparisons.

Apply this diff:

 # Function to check if a file matches the custom filter
 check_file_with_filter() {
     local file="$1"
     local filter="$2"
     
-    # Apply the custom jq filter to the file
-    local result=$(jq -e "$filter" "$file" 2>/dev/null)
-    
-    if [ "$result" = "true" ]; then
-        return 0  # File matches the filter
-    fi
-    
-    return 1  # File doesn't match the filter
+    # Apply the custom jq filter; truthy -> exit 0, false/null -> non-zero
+    if jq -e "$filter" "$file" >/dev/null 2>&1; then
+        return 0
+    fi
+    return 1
 }
packages/utils/src/analytics-platform/util/check-merge-groups.ts (1)

249-255: Fix inconsistent script name in usage message.

Usage/help still references check-duplicate-keys-simple.ts while the file is check-merge-groups.ts.

Apply this diff:

     if (args.length === 0) {
-      console.log("Usage: ts-node check-duplicate-keys-simple.ts <folder-path>");
-      console.log("Example: ts-node check-duplicate-keys-simple.ts /path/to/your/folder");
+      console.log("Usage: ts-node check-merge-groups.ts <folder-path>");
+      console.log("Example: ts-node check-merge-groups.ts /path/to/your/folder");
       console.log("\nOr use current directory:");
-      console.log("ts-node check-duplicate-keys-simple.ts .");
+      console.log("ts-node check-merge-groups.ts .");
       process.exit(1);
     }
packages/core/src/external/aws/lambda.ts (1)

27-33: Use NodeHttpHandler for v3 client timeouts; httpOptions is not supported in AWS SDK v3

The v3 Lambda client ignores httpOptions. Configure timeouts via a custom NodeHttpHandler passed as requestHandler.

Apply this diff:

 export function makeLambdaClientV3(region: string, timeoutInMillis?: number): LambdaClientV3 {
-  const lambdaClient = new LambdaClientV3({
-    region,
-    ...(timeoutInMillis ? { httpOptions: { timeout: timeoutInMillis } } : {}),
-  });
-  return lambdaClient;
+  const requestHandler =
+    timeoutInMillis !== undefined
+      ? new NodeHttpHandler({
+          connectionTimeout: timeoutInMillis,
+          socketTimeout: timeoutInMillis,
+        })
+      : undefined;
+
+  return new LambdaClientV3({
+    region,
+    ...(requestHandler ? { requestHandler } : {}),
+  });
 }

Add this import near the top of the file:

import { NodeHttpHandler } from "@smithy/node-http-handler";
packages/utils/src/analytics-platform/2-merge-csvs.ts (3)

111-116: Make SQS deduplication ID deterministic per job and chunk.

Include cxId + mergeCsvJobId and sort IDs to avoid collisions and order-dependence during the FIFO 5-minute dedup window.

-        await sqsClient.sendMessageToQueue(queueUrl, payloadString, {
+        await sqsClient.sendMessageToQueue(queueUrl, payloadString, {
           fifo: true,
-          messageDeduplicationId: createUuidFromText(ptIdsOfThisRun.join(",")),
+          messageDeduplicationId: createUuidFromText(
+            JSON.stringify({ cxId, mergeCsvJobId, ids: [...ptIdsOfThisRun].sort() })
+          ),
           messageGroupId,
         });

39-41: Add validation for required CLI argument (fhirToCsvJobId).

Without this, downstream code may run with an undefined job ID and produce confusing failures.

 const fhirToCsvJobId = process.argv[2];
+if (!fhirToCsvJobId) {
+  console.error("ERROR: Missing required argument: fhirToCsvJobId");
+  console.error("Usage: ts-node src/analytics-platform/2-merge-csvs.ts <fhirToCsvJobId>");
+  process.exit(1);
+}

152-167: Fail fast if no patient IDs are found for the provided job ID.

Avoid silently proceeding with an empty set; this is likely a misconfiguration or wrong prefix.

 async function getPatientIdsFromFhirToCsvJob({
   fhirToCsvJobId,
 }: {
   fhirToCsvJobId: string;
 }): Promise<string[]> {
   const basePrefix = `${defaultPayload.sourcePrefix}/${fhirToCsvJobId}/`;
   const files = await s3Utils.listFirstLevelSubdirectories({
     bucket: bucketName,
     prefix: basePrefix,
   });
+  if (files.length === 0) {
+    throw new Error(
+      `No patient data found for fhirToCsvJobId=${fhirToCsvJobId} at prefix=${basePrefix} (bucket=${bucketName})`
+    );
+  }
   const patientIds = files.flatMap(file => {
     const item = file.Prefix?.endsWith("/") ? file.Prefix.slice(0, -1) : file.Prefix;
     return item?.split("/").pop() ?? [];
   });
-  return patientIds;
+  const validPatientIds = patientIds.filter(id => id && id.trim().length > 0);
+  if (validPatientIds.length === 0) {
+    throw new Error(
+      `No valid patient IDs extracted for fhirToCsvJobId=${fhirToCsvJobId} from prefix=${basePrefix}`
+    );
+  }
+  return validPatientIds;
 }
🧹 Nitpick comments (23)
packages/utils/src/analytics-platform/3.readme.md (1)

1-1: Polish grammar and clarity (comma splice, hyphenation, and wording).

Suggest rephrasing to avoid the comma splice, use “third” instead of “3rd,” prefer “subfolder,” and clarify “starting with ‘3-’” as a “prefix”.

Apply this diff:

-The 3rd step is platform-specific, check the respective sub-folder for a script starting with "3-".
+The third step is platform-specific; check the respective subfolder for a script whose name starts with the prefix "3-".
+Example (Snowflake): packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
packages/utils/src/fhir/bundle/find_with_custom_filter.sh (3)

81-86: Normalize DIRECTORY to avoid trailing-slash edge cases in relative path computation.

If the provided directory ends with a trailing slash, ${file#$DIRECTORY/} won’t strip the prefix as intended. Normalize once after validating the directory.

Apply this diff:

 if [ ! -d "$DIRECTORY" ]; then
     echo "Error: Directory '$DIRECTORY' does not exist"
     exit 1
 fi
+
+# Normalize DIRECTORY: remove trailing slash for consistent prefix stripping
+DIRECTORY="${DIRECTORY%/}"

17-18: Update jq homepage URL.

The jq project’s canonical site is now under jqlang.

-#   - jq must be installed (https://stedolan.github.io/jq/)
+#   - jq must be installed (https://jqlang.github.io/jq/)

1-3: Optional: Enable safer Bash mode early.

set -euo pipefail helps catch unset vars and early failures; setting IFS minimizes accidental word-splitting.

 #!/bin/bash
-
+# Safer bash defaults
+set -euo pipefail
+IFS=$'\n\t'
packages/utils/src/analytics-platform/util/check-merge-groups.ts (2)

60-61: Avoid passing objects directly to console logging; keep stack traces; comply with logging guidelines.

Don’t send objects as a second param to logs; format to string and include stack to retain context.

Apply this diff:

   } catch (error) {
-      console.error(`Error reading directory ${this.targetFolder}:`, error);
+      const msg = error instanceof Error ? error.stack ?? error.message : String(error);
+      console.error(`Error reading directory ${this.targetFolder}: ${msg}`);
       return [];
     }
     } catch (error) {
-      console.error(`Error processing ${filePath}:`, error);
+      const msg = error instanceof Error ? error.stack ?? error.message : String(error);
+      console.error(`Error processing ${filePath}: ${msg}`);
     }

Also applies to: 85-87


229-231: Deterministic processing and guideline-friendly loops.

For reproducible outputs and to avoid inline arrow callbacks (per guidelines), sort the files and use a for-of loop.

Apply this diff:

-    // Process each file
-    groupFiles.forEach(file => this.processFile(file));
+    // Process each file (sorted for deterministic output)
+    for (const file of [...groupFiles].sort()) {
+      this.processFile(file);
+    }
packages/core/src/external/aws/lambda-logic/README.md (1)

3-7: Tighten grammar and prepositions in the deprecation note

Small copy edits improve clarity and fix minor grammar issues (comma splice, preposition choice).

Apply this diff:

-This folder should not exist. If we put all "lambda logic" in this folder, we will have a hard time
-maintaining it, all lambdas' logic will be in a single place.
+This folder should not exist. If we put all "lambda logic" in this folder, we will have a hard time
+maintaining it, since all lambdas' logic would be concentrated in a single place.

-What we've been doing for all other lambdas, is to call commands/functions that are stored on the
-proper folder for each use case/business logic.
+What we've been doing for all other lambdas is to call commands/functions that live in the
+proper folder for each use case or business logic.
packages/data-transformation/fhir-to-csv/main.py (2)

60-71: Avoid opening the file when using s3_client.download_file()

download_file takes a destination path and manages the file itself. The with open(..., "wb") wrapper is unnecessary and can be removed.

Apply this diff:

-with open(local_bundle_key, "wb") as f:
-    try:
-        print(f"Downloading bundle {bundle_key} from {input_bucket} to {local_bundle_key}")
-        s3_client.download_file(input_bucket, bundle_key, local_bundle_key)
-        print(f"Downloaded bundle {bundle_key} from {input_bucket} to {local_bundle_key}")
-    except s3_client.exceptions.ClientError as e:
-        if e.response['Error']['Code'] == '404':
-            print(f"Bundle {bundle_key} not found in input bucket {input_bucket}")
-            raise ValueError("Bundle not found") from e
-        else:
-            raise e
+try:
+    print(f"Downloading bundle {bundle_key} from {input_bucket} to {local_bundle_key}")
+    s3_client.download_file(input_bucket, bundle_key, local_bundle_key)
+    print(f"Downloaded bundle {bundle_key} from {input_bucket} to {local_bundle_key}")
+except s3_client.exceptions.ClientError as e:
+    if e.response['Error']['Code'] == '404':
+        print(f"Bundle {bundle_key} not found in input bucket {input_bucket}")
+        raise ValueError("Bundle not found") from e
+    else:
+        raise e

1-25: Note: Many Snowflake-related imports are now unused

Given ENG-813 plans to remove the old Snowflake ingestion path, it’s fine short-term, but these imports will be flagged by linters. Consider removing them once ENG-813 lands.

packages/utils/src/analytics-platform/1-fhir-to-csv.ts (2)

21-24: Doc nit: pluralize “CSV file”

Each patient conversion produces multiple CSV files (one per table/resource). Adjust docs accordingly.

Apply this diff:

- * It sends a message to SQS per patient, consumed by a Lambda function that generates the
- * CSV file.
+ * It sends a message to SQS per patient, consumed by a Lambda function that generates the
+ * CSV files.

98-103: File write path robustness

fs.writeFileSync(outputFile, ...) resolves the path relative to the current working directory. If the script is run from a directory without write permission, this fails. Consider writing to a temp or known output directory, or at least logging the absolute path.
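A minimal sketch of one way to make the output location explicit (the OUTPUT_DIR env var and temp-dir fallback are illustrative assumptions, not existing conventions):

```typescript
import fs from "node:fs";
import os from "node:os";
import path from "node:path";

/**
 * Resolve the output file against an explicit base directory instead of the
 * implicit CWD, and log the absolute path so users can find the file.
 */
function writeOutputFile(fileName: string, contents: string): string {
  const baseDir = process.env.OUTPUT_DIR ?? os.tmpdir();
  const absolutePath = path.resolve(baseDir, fileName);
  fs.writeFileSync(absolutePath, contents);
  console.log(`Wrote output to ${absolutePath}`);
  return absolutePath;
}
```

Even if the CWD default is kept, logging `path.resolve(outputFile)` alone removes most of the confusion.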

packages/infra/lib/analytics-platform/analytics-platform-stack.ts (1)

241-243: Fix misleading comment: bucket being granted here is analytics, not medical documents

The code grants read/write on analyticsPlatformBucket, but the comment mentions the medical documents bucket.

Apply this diff:

-    // Grant read to medical document bucket set on the api-stack
-    ownProps.analyticsPlatformBucket.grantReadWrite(fhirToCsvTransformLambda);
+    // Grant read/write to the analytics platform bucket (Snowflake landing zone)
+    ownProps.analyticsPlatformBucket.grantReadWrite(fhirToCsvTransformLambda);
packages/utils/src/analytics-platform/2-merge-csvs.ts (11)

91-96: Avoid multi-line logs and fix label to reflect SQS operations.

Current log uses a multi-line string and mentions “Lambda invocations” while we’re putting SQS messages.

-  log(
-    `>>> Running it for ${uniquePatientIds.length} patients (${patientIdChunks.length} chunks of ` +
-      `${maxPatientCountPerLambda} patients each)...\n- mergeCsvJobId: ${mergeCsvJobId}\n- fhirToCsvJobId: ${fhirToCsvJobId}` +
-      `\n- maxPatientCountPerLambda: ${maxPatientCountPerLambda}\n- numberOfParallelLambdaInvocations: ${numberOfParallelPutSqsOperations}` +
-      `\n- maxUncompressedSizePerFileInMB: ${maxUncompressedSizePerFileInMB}`
-  );
+  log(
+    `>>> Running it for ${uniquePatientIds.length} patients (${patientIdChunks.length} chunks of ` +
+      `${maxPatientCountPerLambda} patients each)...`
+  );
+  log(`- mergeCsvJobId: ${mergeCsvJobId}`);
+  log(`- fhirToCsvJobId: ${fhirToCsvJobId}`);
+  log(`- maxPatientCountPerLambda: ${maxPatientCountPerLambda}`);
+  log(`- numberOfParallelPutSqsOperations: ${numberOfParallelPutSqsOperations}`);
+  log(`- maxUncompressedSizePerFileInMB: ${maxUncompressedSizePerFileInMB}`);

118-121: Fix log wording: it’s one message per chunk and index is off-by-one.

Prevents logs like “Put 0 messages on queue”.

-        log(
-          `>>> Put ${index} messages on queue (${amountOfPatientsProcessed}/${uniquePatientIds.length} patients)`
-        );
+        log(
+          `>>> Put message ${index + 1} on queue (${amountOfPatientsProcessed}/${uniquePatientIds.length} patients)`
+        );

104-129: Sanitize timestamp for filenames (Windows-safe).

File names with : are invalid on Windows. Sanitize the per-chunk failure filename.

-      const runTimestamp = buildDayjs().toISOString();
+      const runTimestamp = buildDayjs().toISOString();
+      const runTimestampForFileName = runTimestamp.replace(/[:.]/g, "-");
@@
-        const outputFile = `2-merge-csvs_failed-patient-ids_${mergeCsvJobId}_${runTimestamp}.txt`;
+        const outputFile = `2-merge-csvs_failed-patient-ids_${mergeCsvJobId}_${runTimestampForFileName}.txt`;

50-53: Extract jitter values to constants (avoid magic numbers).

Improves readability and configurability per guidelines.

 const confirmationTime = dayjs.duration(1, "seconds");
-const messageGroupId = "merge-csvs";
 const numberOfParallelPutSqsOperations = 20;
+const minJitterMillis = 20;
+const maxJitterMillis = 100;

Note: Message group advice in a separate comment.


133-136: Use the jitter constants instead of literals.

     {
       numberOfParallelExecutions: numberOfParallelPutSqsOperations,
-      minJitterMillis: 20,
-      maxJitterMillis: 100,
+      minJitterMillis,
+      maxJitterMillis,
     }

87-89: Sort patient IDs to make chunking deterministic.

This ensures reproducible chunk composition across runs and environments.

-  const uniquePatientIds = [...new Set(patientsToMerge)];
+  const uniquePatientIds = [...new Set(patientsToMerge)].sort();

51-51: Confirm FIFO MessageGroupId strategy (single group serializes all jobs).

Using a static messageGroupId = "merge-csvs" will serialize all messages on the queue, across all CXs and jobs, potentially creating head-of-line blocking. If you want ordering only within a single run, consider a per-job group ID (e.g., merge-csvs:${cxId}:${mergeCsvJobId}) and define it after cxId/mergeCsvJobId are initialized.

If you choose to change it, move the declaration below the env variables and set:

const messageGroupId = `merge-csvs:${cxId}:${mergeCsvJobId}`;

184-185: Add top-level error handling for unhandled rejections.

Prevents silent failures and exits with a non-zero code for CI or scripting.

-main();
+main().catch(err => {
+  const { log } = out("");
+  log(`>>> FAILED: ${errorToString(err)}`);
+  process.exit(1);
+});

71-72: Remove unnecessary 100ms sleep at start of main.

No dependency requires this delay; it only slows execution.

-  await sleep(100);

127-145: Patient ID failure files: ensure they’re ignored and handled securely.

These files contain identifiers; ensure they’re .gitignored and stored in a secure location when running in shared environments.

Would you like me to add a .gitignore entry like:

2-merge-csvs_failed-patient-ids_*.txt

and/or make the output directory configurable via env?


15-15: Avoid cross-package relative import of Lambda types (tight coupling).

Importing types from packages/lambdas ties utils to an app package and breaks layering. Move the shared payload type to a shared module (e.g., @metriport/core or @metriport/shared) and import from there.
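For illustration, the shared module could expose the payload type together with a runtime guard, so producer (utils) and consumer (lambda) validate the same shape — the names below are hypothetical:

```typescript
/** Hypothetical shared payload type for the MergeCsv queue message. */
type MergeCsvsPayload = {
  cxId: string;
  mergeCsvJobId: string;
  patientIds: string[];
};

/** Runtime guard shared by both sides of the queue. */
function isMergeCsvsPayload(value: unknown): value is MergeCsvsPayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.cxId === "string" &&
    typeof v.mergeCsvJobId === "string" &&
    Array.isArray(v.patientIds) &&
    v.patientIds.every(id => typeof id === "string")
  );
}
```

A Zod schema in @metriport/shared would be the more idiomatic choice in this codebase; the hand-rolled guard just keeps the sketch dependency-free.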

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled
  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 23280f4 and 85b6fad.

⛔ Files ignored due to path filters (2)
  • package-lock.json is excluded by !**/package-lock.json
  • packages/lambdas/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (29)
  • packages/core/package.json (4 hunks)
  • packages/core/src/command/analytics-platform/fhir-to-csv/batch/fhir-to-csv.ts (0 hunks)
  • packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts (0 hunks)
  • packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts (1 hunks)
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts (1 hunks)
  • packages/core/src/external/aws/lambda-logic/README.md (1 hunks)
  • packages/core/src/external/aws/lambda.ts (7 hunks)
  • packages/core/src/external/aws/s3.ts (1 hunks)
  • packages/core/src/external/snowflake/commands.ts (1 hunks)
  • packages/core/src/util/config.ts (0 hunks)
  • packages/data-transformation/fhir-to-csv/main.py (2 hunks)
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts (8 hunks)
  • packages/infra/lib/analytics-platform/types.ts (1 hunks)
  • packages/infra/lib/api-stack.ts (1 hunks)
  • packages/infra/lib/api-stack/api-service.ts (0 hunks)
  • packages/infra/lib/shared/settings.ts (1 hunks)
  • packages/lambdas/package.json (4 hunks)
  • packages/lambdas/src/analytics-platform/fhir-to-csv.ts (0 hunks)
  • packages/lambdas/src/analytics-platform/merge-csvs.ts (1 hunks)
  • packages/utils/package.json (3 hunks)
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts (5 hunks)
  • packages/utils/src/analytics-platform/2-merge-csvs.ts (1 hunks)
  • packages/utils/src/analytics-platform/3.readme.md (1 hunks)
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (1 hunks)
  • packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts (1 hunks)
  • packages/utils/src/analytics-platform/util/check-merge-groups.ts (1 hunks)
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts (4 hunks)
  • packages/utils/src/fhir/bundle/count_resources.sh (1 hunks)
  • packages/utils/src/fhir/bundle/find_with_custom_filter.sh (1 hunks)
💤 Files with no reviewable changes (5)
  • packages/infra/lib/api-stack/api-service.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/batch/fhir-to-csv.ts
  • packages/core/src/util/config.ts
  • packages/lambdas/src/analytics-platform/fhir-to-csv.ts
🚧 Files skipped from review as they are similar to previous changes (15)
  • packages/utils/package.json
  • packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts
  • packages/lambdas/package.json
  • packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts
  • packages/core/src/external/snowflake/commands.ts
  • packages/core/src/external/aws/s3.ts
  • packages/lambdas/src/analytics-platform/merge-csvs.ts
  • packages/infra/lib/api-stack.ts
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts
  • packages/core/package.json
  • packages/infra/lib/analytics-platform/types.ts
  • packages/utils/src/fhir/bundle/count_resources.sh
  • packages/infra/lib/shared/settings.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

**/*.{js,jsx,ts,tsx}: Don’t use null inside the app, only on code interacting with external interfaces/services, like DB and HTTP; convert to undefined before sending inwards into the code
Use const whenever possible
Use async/await instead of .then()
Naming: classes, enums: PascalCase
Naming: constants, variables, functions: camelCase
Naming: file names: kebab-case
Naming: Don’t use negative names, like notEnabled, prefer isDisabled
If possible, use decomposing objects for function parameters
Prefer Nullish Coalesce (??) than the OR operator (||) when you want to provide a default value
Avoid creating arrow functions
Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
While handling errors, keep the stack trace around: if you create a new Error (e.g., MetriportError), make sure to pass the original error as the new one’s cause so the stack trace is available upstream.
max column length is 100 chars
multi-line comments use /** */
top-level comments go after the import (save pre-import to basic file header, like license)
move literals to constants declared after imports when possible

Files:

  • packages/utils/src/analytics-platform/util/check-merge-groups.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts
  • packages/core/src/external/aws/lambda.ts
**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

Use types whenever possible

Files:

  • packages/utils/src/analytics-platform/util/check-merge-groups.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts
  • packages/core/src/external/aws/lambda.ts
**/*.ts

⚙️ CodeRabbit Configuration File

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • U...

Files:

  • packages/utils/src/analytics-platform/util/check-merge-groups.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts
  • packages/core/src/external/aws/lambda.ts
🧠 Learnings (15)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
📚 Learning: 2025-07-18T09:53:47.448Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:47.448Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, the user thomasyopes prefers to let Snowflake operations (create_temp_tables, copy_into_temp_table, append_temp_tables, rename_temp_tables) hard fail rather than adding explicit try-catch error handling blocks.

Applied to files:

  • packages/data-transformation/fhir-to-csv/main.py
📚 Learning: 2025-07-18T09:53:38.906Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/main.py:0-0
Timestamp: 2025-07-18T09:53:38.906Z
Learning: In packages/data-transformation/fhir-to-csv/main.py, S3 upload operations should be allowed to hard fail without try-catch blocks. The team prefers immediate failure over error handling and continuation for S3 upload operations in the FHIR-to-CSV transformation process.

Applied to files:

  • packages/data-transformation/fhir-to-csv/main.py
📚 Learning: 2025-07-18T09:54:01.790Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:54-58
Timestamp: 2025-07-18T09:54:01.790Z
Learning: The user thomasyopes prefers to keep the setupSnowflake.py file in packages/data-transformation/fhir-to-csv/src/setupSnowflake/ as-is, declining suggestions for additional error handling around configuration file parsing.

Applied to files:

  • packages/data-transformation/fhir-to-csv/main.py
📚 Learning: 2025-07-19T14:04:34.757Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:93-97
Timestamp: 2025-07-19T14:04:34.757Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes accepts SQL injection risk in the append_job_tables function's DELETE query where patient_id is directly interpolated, based on their assessment that they control all inputs to the system.

Applied to files:

  • packages/data-transformation/fhir-to-csv/main.py
📚 Learning: 2025-07-18T09:54:43.598Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py:0-0
Timestamp: 2025-07-18T09:54:43.598Z
Learning: In packages/data-transformation/fhir-to-csv/src/setupSnowflake/setupSnowflake.py, the user thomasyopes prefers to log and swallow exceptions in the copy_into_temp_table function rather than re-raising them. This allows the data transformation job to continue processing other files even if one file fails to copy to Snowflake, supporting partial success scenarios in batch data processing.

Applied to files:

  • packages/data-transformation/fhir-to-csv/main.py
📚 Learning: 2025-05-20T21:26:26.804Z
Learnt from: leite08
PR: metriport/metriport#3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-07-27T00:40:32.149Z
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-06-17T20:03:18.931Z
Learnt from: RamilGaripov
PR: metriport/metriport#3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-04-16T00:25:25.196Z
Learnt from: RamilGaripov
PR: metriport/metriport#3676
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-conversion/shared.ts:1-10
Timestamp: 2025-04-16T00:25:25.196Z
Learning: The circular dependency between shared.ts (importing getPatientIdsOrFail) and adt/utils.ts (using unpackPidFieldOrFail) will be addressed in a follow-up PR.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-07-18T09:33:29.581Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the `/fhir-to-csv` and `/fhir-to-csv/transform` endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the `/fhir-to-csv/batch` endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
📚 Learning: 2025-07-16T01:38:14.080Z
Learnt from: leite08
PR: metriport/metriport#4194
File: packages/api/src/command/medical/patient/get-patient-facilities.ts:36-44
Timestamp: 2025-07-16T01:38:14.080Z
Learning: In packages/api/src/command/medical/patient/get-patient-facilities.ts, the patient.facilityIds field is strongly typed as array of strings (non-nullable), so null/undefined checks are not needed when accessing this property.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-05-30T01:11:35.300Z
Learnt from: keshavsaharia
PR: metriport/metriport#3885
File: packages/utils/src/surescripts/generate-patient-load-file.ts:92-94
Timestamp: 2025-05-30T01:11:35.300Z
Learning: In V0 of the Surescripts implementation (packages/utils/src/surescripts/generate-patient-load-file.ts), there is always only 1 facility ID, so the current facility_ids parsing logic with substring operations is sufficient and additional validation is not necessary.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-03-11T20:42:46.516Z
Learnt from: thomasyopes
PR: metriport/metriport#3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) `ehr-sync-patient-cloud.ts` sends messages to an SQS queue, (2) the `ehr-sync-patient` Lambda consumes these messages, and (3) the Lambda uses the `syncPatient` function to make the API calls to process the patient data.

Applied to files:

  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
📚 Learning: 2025-07-09T20:41:08.549Z
Learnt from: thomasyopes
PR: metriport/metriport#4167
File: packages/infra/lib/analytics-platform/analytics-platform-stack.ts:56-64
Timestamp: 2025-07-09T20:41:08.549Z
Learning: In the analytics platform Snowflake integration (packages/infra/lib/analytics-platform/analytics-platform-stack.ts), the `integrationUserArn` field in the config temporarily contains an AWS account ID rather than an ARN. This is planned to be updated to an actual ARN in a second pass, at which point the IAM Role's assumedBy should be changed from AccountPrincipal to ArnPrincipal.

Applied to files:

  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts
🧬 Code Graph Analysis (4)
packages/utils/src/analytics-platform/2-merge-csvs.ts (6)
packages/shared/src/common/date.ts (1)
  • buildDayjs (93-95)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/core/src/external/aws/s3.ts (1)
  • S3Utils (140-600)
packages/lambdas/src/analytics-platform/merge-csvs.ts (1)
  • GroupAndMergeCSVsParamsLambda (19-22)
packages/core/src/util/log.ts (1)
  • out (30-35)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
packages/utils/src/analytics-platform/1-fhir-to-csv.ts (5)
packages/shared/src/common/date.ts (1)
  • buildDayjs (93-95)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/core/src/util/log.ts (1)
  • out (30-35)
packages/utils/src/patient/get-ids.ts (1)
  • getAllPatientIds (22-29)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
packages/infra/lib/analytics-platform/analytics-platform-stack.ts (4)
packages/infra/lib/shared/settings.ts (1)
  • QueueAndLambdaSettings (15-49)
packages/infra/lib/analytics-platform/types.ts (1)
  • AnalyticsPlatformsAssets (4-9)
packages/infra/config/env-config.ts (1)
  • EnvConfigNonSandbox (303-316)
packages/infra/lib/shared/lambda.ts (1)
  • createLambda (85-158)
packages/core/src/external/aws/lambda.ts (2)
packages/core/src/util/log.ts (1)
  • out (30-35)
packages/shared/src/index.ts (3)
  • MetriportError (43-43)
  • BadRequestError (42-42)
  • NotFoundError (44-44)
🪛 LanguageTool
packages/core/src/external/aws/lambda-logic/README.md

[grammar] ~3-~3: There might be a mistake here.
Context: ...in this folder, we will have a hard time maintaining it, all lambdas' logic will ...

(QB_NEW_EN)


[grammar] ~6-~6: There might be a mistake here.
Context: ...ommands/functions that are stored on the proper folder for each use case/business...

(QB_NEW_EN)


[grammar] ~7-~7: There might be a mistake here.
Context: ...folder for each use case/business logic.

(QB_NEW_EN)

packages/utils/src/analytics-platform/3.readme.md

[grammar] ~1-~1: There might be a mistake here.
Context: ...-folder for a script starting with "3-".

(QB_NEW_EN)

🪛 Shellcheck (0.10.0)
packages/utils/src/fhir/bundle/find_with_custom_filter.sh

[warning] 103-103: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (4)
packages/utils/src/analytics-platform/3.readme.md (1)

1-2: Nice addition; clear pointer to platform-specific step.

This lightweight README helps discoverability for step 3. LGTM.

packages/utils/src/fhir/bundle/find_with_custom_filter.sh (1)

112-114: Ensure cross-platform compatibility of find depth limiting

I ran your verification script on Linux, where find -maxdepth works as expected. Note, though, that -maxdepth is a GNU/BSD extension rather than part of POSIX find, so strictly POSIX environments (and some older finds) may reject or silently ignore the depth filter. To prevent silent failures in CI or developer environments, please update the script to either feature-detect -maxdepth at runtime, prefer GNU find (gfind) when available, or clearly document that a find with -maxdepth support is required.

File: packages/utils/src/fhir/bundle/find_with_custom_filter.sh
Lines: 112–114

Suggestions:

  • Introduce a FIND_CMD variable that prefers gfind if available, falling back to find.
  • Feature-detect -maxdepth at runtime and degrade gracefully when it is missing:
    FIND_CMD=find
    if command -v gfind >/dev/null 2>&1; then
      FIND_CMD=gfind
    fi

    # Running -maxdepth 0 on the directory itself is a cheap capability probe
    if "$FIND_CMD" "$DIRECTORY" -maxdepth 0 -print >/dev/null 2>&1; then
      json_files=$("$FIND_CMD" "$DIRECTORY" -maxdepth "$MAXDEPTH" -name '*.json' -type f | sort)
    else
      echo "warning: $FIND_CMD lacks -maxdepth; searching all depths" >&2
      json_files=$("$FIND_CMD" "$DIRECTORY" -name '*.json' -type f | sort)
    fi
  • Alternatively, add a note in your README or the script header that GNU findutils is a prerequisite.
packages/infra/lib/analytics-platform/analytics-platform-stack.ts (2)

61-82: Double-check SQS/Lambda timeouts and visibility alignment under worst-case merges

Visibility is set to 2x Lambda timeout + 1s, which is usually fine. Given merge files can approach 800MB uncompressed and maxConcurrency is 20, confirm that:

  • the function consistently completes under mergeCsvsLambdaTimeout, and
  • reprocessing due to tight visibility doesn’t cause unnecessary churn when nearing timeouts.

If needed, we can adjust the visibility multiplier or add per-item backoff. Let me know if you want a patch to make these settings configurable via env.
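For reference, the "2x Lambda timeout + 1s" relationship could be centralized in a small helper so it stays tunable. This is a minimal sketch; visibilityTimeoutSeconds and its parameters are illustrative names, not the stack's actual API:

```typescript
/**
 * Illustrative helper mirroring the "2x Lambda timeout + 1s" rule discussed
 * above. The multiplier and buffer are parameters so they could be driven by
 * env/config if the team decides to make them tunable.
 */
function visibilityTimeoutSeconds(
  lambdaTimeoutSeconds: number,
  multiplier = 2,
  bufferSeconds = 1
): number {
  if (lambdaTimeoutSeconds <= 0) {
    throw new Error("lambdaTimeoutSeconds must be positive");
  }
  return lambdaTimeoutSeconds * multiplier + bufferSeconds;
}
```

With a 900s Lambda timeout this yields 1801s of visibility, matching what the current wiring produces.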


151-157: Please verify that integrationUserArn in your real config is a full ARN before using ArnPrincipal

We’ve confirmed:

  • The config type (packages/infra/config/analytics-platform-config.ts) defines integrationUserArn as a string.
  • The example config (packages/infra/config/example.ts) shows a full ARN ("arn:aws:iam::000000000000:role/SnowflakeIntegrationRole").
  • We did not find any non‐example config file in the repo—your actual environment config could still be just an AWS account ID.

If your deployed configuration still supplies only an account ID, new iam.ArnPrincipal(...) will be invalid. Either:

  • Confirm that your real config has been migrated to supply a proper IAM ARN for the Snowflake integration user, or
  • Temporarily switch back to new iam.AccountPrincipal(...) until the config update is complete.
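If it helps, a small guard like the following could make a misconfigured value fail fast at synth time. This is a hedged sketch; the regex is an assumption about the ARN shapes in use here (IAM roles/users), not an exhaustive ARN validator:

```typescript
/**
 * Rejects bare account IDs so a misconfigured integrationUserArn fails at
 * deploy/synth time rather than producing an invalid ArnPrincipal.
 */
function assertIamArn(value: string): string {
  const iamArnPattern = /^arn:aws:iam::\d{12}:(role|user)\/.+$/;
  if (!iamArnPattern.test(value)) {
    throw new Error(`integrationUserArn must be a full IAM ARN, got: ${value}`);
  }
  return value;
}
```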

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (1)

23-27: Table name parsing relies on fragile string manipulation.

The parsing function uses hard-coded array indices and assumes a specific file structure. This could fail silently if the key format changes.

This comment was already raised in the past review comments section. The parsing logic is brittle and would benefit from more robust validation and error handling.
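As a sketch of the more defensive direction (assuming keys of the form cx=<cxId>/f2c=<jobId>/pt=<patientId>/<table>.csv, inferred from the labeled-segment prefixes used elsewhere in this PR; parseTableName is an illustrative name):

```typescript
/**
 * Derives the table name from the key's file name instead of hard-coded
 * segment indices, and fails loudly on unexpected shapes.
 */
function parseTableName(key: string): string {
  const segments = key.split("/").filter(segment => segment.length > 0);
  const fileName = segments[segments.length - 1];
  if (!fileName || !fileName.endsWith(".csv")) {
    throw new Error(`Unexpected key format, expected a CSV file name: ${key}`);
  }
  const tableName = fileName.slice(0, -".csv".length);
  if (tableName.length < 1) {
    throw new Error(`Empty table name in key: ${key}`);
  }
  return tableName;
}
```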

🧹 Nitpick comments (9)
packages/utils/src/analytics-platform/1-fhir-to-csv.ts (4)

1-1: Consider consolidating the header comment with existing structure.

The standalone header comment on Line 1 is redundant since the filename is already clear from the file path. Consider removing it to maintain consistency with other utility files.

-// 1-fhir-to-csv.ts
-

35-35: Increased concurrency may need monitoring.

The increase from 20 to 30 parallel executions could improve throughput but may also increase the risk of hitting SQS rate limits or Lambda concurrency constraints.

Consider making this configurable via environment variables to allow runtime adjustment based on observed performance and error rates.

-const numberOfParallelExecutions = 30;
+const numberOfParallelExecutions = parseInt(getEnvVar("PARALLEL_EXECUTIONS") ?? "30", 10);

77-77: TODO comment should be addressed or tracked.

The TODO comment about using FhirToCsvCloud suggests there may be a more appropriate abstraction available.

Would you like me to help investigate the FhirToCsvCloud abstraction or create an issue to track this potential refactoring?


96-96: Reduced jitter may require monitoring.

The decrease in maxJitterMillis from 200 to 100 could lead to more synchronized requests and potential thundering herd effects.

Consider monitoring error rates and adjusting jitter values if rate limiting issues occur.

packages/core/src/command/analytics-platform/merge-csvs/index.ts (2)

19-22: Consider making concurrency limits configurable.

The hard-coded concurrency limits could benefit from being environment-configurable for different deployment scenarios.

As mentioned in a past review comment, these values could be moved to environment variables via CDK for runtime adjustment.

const numberOfParallelListObjectsFromS3 = parseInt(process.env.PARALLEL_LIST_OBJECTS ?? "20", 10);
const numberOfParallelMergeFileGroups = parseInt(process.env.PARALLEL_MERGE_GROUPS ?? "5", 10);

409-438: Stream handling could be more robust.

The current implementation pipes S3 streams to gzip but doesn't handle potential stream errors during piping. Consider adding error handling to the pipe operation.

 // Pipe S3 stream directly to gzip without loading into memory
 s3ReadStream.pipe(gzip, { end: false });
+s3ReadStream.on("error", (error) => {
+  log(`Error reading file ${file.key}: ${errorToString(error)}`);
+  // Throwing from an event listener would surface as an uncaught exception;
+  // destroying gzip propagates the error to the awaited pipeline instead
+  gzip.destroy(error);
+});
packages/lambdas/src/analytics-platform/merge-csvs.ts (3)

19-22: Remove unused exported type alias (or actually use it)

GroupAndMergeCSVsParamsLambda is not used. Either remove it or use it to type the parsed payload for stronger connections between schema and types.

Apply this diff if you choose to remove it:

-export type GroupAndMergeCSVsParamsLambda = Omit<
-  GroupAndMergeCSVsParams,
-  "sourceBucket" | "destinationBucket" | "region"
->;

24-46: Prefer a named handler (guideline) and avoid sending full event/patient IDs to Sentry/logs

  • Style: avoid arrow functions for handlers (guideline), use a named function and wrap it.
  • Telemetry hygiene: avoid capture.setExtra({ event }) and logging the entire payload; it can be large and may include patient IDs. Send only scoped metadata (job IDs, counts) and summarize logs.

Apply this diff:

-export const handler = capture.wrapHandler(async (event: SQSEvent) => {
-  capture.setExtra({ event, context: lambdaName });
-  const message = getSingleMessageOrFail(event.Records, lambdaName);
-  if (!message) return;
-  const parsedBody = parseBody(inputSchema, message.body);
-  capture.setExtra(parsedBody);
-  const { cxId, fhirToCsvJobId, mergeCsvJobId, patientIds, targetGroupSizeMB } = parsedBody;
-  const log = prefixedLog(`mergeCsvJobId ${mergeCsvJobId}`);
-  log(`Running with params:: ${JSON.stringify(parsedBody)}`);
-
-  const results = await groupAndMergeCSVs({
-    cxId,
-    fhirToCsvJobId,
-    mergeCsvJobId,
-    patientIds,
-    targetGroupSizeMB,
-    sourceBucket: bucket,
-    destinationBucket: bucket,
-    region,
-  });
-
-  return results;
-});
+async function rawHandler(event: SQSEvent) {
+  const message = getSingleMessageOrFail(event.Records, lambdaName);
+  if (!message) return;
+
+  const parsedBody = parseBody(inputSchema, message.body);
+  const { cxId, fhirToCsvJobId, mergeCsvJobId, patientIds, targetGroupSizeMB } = parsedBody;
+
+  // Keep Sentry extras minimal and PI/PHI-safe
+  capture.setExtra({
+    lambdaName,
+    messageId: message.messageId,
+    cxId,
+    fhirToCsvJobId,
+    mergeCsvJobId,
+    patientCount: patientIds.length,
+    targetGroupSizeMB,
+  });
+
+  const log = prefixedLog(`mergeCsvJobId ${mergeCsvJobId}`);
+  log(
+    `Running with cxId=${cxId}, fhirToCsvJobId=${fhirToCsvJobId}, mergeCsvJobId=${mergeCsvJobId}, ` +
+      `patientCount=${patientIds.length}, targetGroupSizeMB=${targetGroupSizeMB}`
+  );
+
+  const results = await groupAndMergeCSVs({
+    cxId,
+    fhirToCsvJobId,
+    mergeCsvJobId,
+    patientIds,
+    targetGroupSizeMB,
+    sourceBucket: bucket,
+    destinationBucket: bucket,
+    region,
+  });
+
+  return results;
+}
+
+export const handler = capture.wrapHandler(rawHandler);

48-54: Harden input validation: require ≥1 patient and a positive target size

Add simple constraints to prevent empty work and invalid sizes. Optionally cap to 1,200 (as per grouping step) for defense in depth.

Apply this diff:

 export const inputSchema = z.object({
   cxId: z.string(),
   fhirToCsvJobId: z.string(),
   mergeCsvJobId: z.string(),
-  patientIds: z.array(z.string()),
-  targetGroupSizeMB: z.number(),
+  patientIds: z.array(z.string()).min(1, "At least one patientId is required"),
+  targetGroupSizeMB: z.number().positive(),
 });

If you want to enforce the 1,200 cap here too:

-  patientIds: z.array(z.string()).min(1, "At least one patientId is required"),
+  patientIds: z.array(z.string()).min(1).max(1200),
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled
  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 85b6fad and 665c14b.

📒 Files selected for processing (8)
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (1 hunks)
  • packages/core/src/command/analytics-platform/merge-csvs/file-name.ts (1 hunks)
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts (1 hunks)
  • packages/data-transformation/fhir-to-csv/src/utils/file.py (1 hunks)
  • packages/lambdas/src/analytics-platform/merge-csvs.ts (1 hunks)
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts (4 hunks)
  • packages/utils/src/analytics-platform/2-merge-csvs.ts (1 hunks)
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

**/*.{js,jsx,ts,tsx}: Don’t use null inside the app, only on code interacting with external interfaces/services, like DB and HTTP; convert to undefined before sending inwards into the code
Use const whenever possible
Use async/await instead of .then()
Naming: classes, enums: PascalCase
Naming: constants, variables, functions: camelCase
Naming: file names: kebab-case
Naming: Don’t use negative names, like notEnabled, prefer isDisabled
If possible, use decomposing objects for function parameters
Prefer Nullish Coalesce (??) than the OR operator (||) when you want to provide a default value
Avoid creating arrow functions
Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
While handling errors, keep the stack trace around: if you create a new Error (e.g., MetriportError), make sure to pass the original error as the new one’s cause so the stack trace is available upstream.
max column length is 100 chars
multi-line comments use /** */
top-level comments go after the import (save pre-import to basic file header, like license)
move literals to constants declared after imports when possible

Files:

  • packages/core/src/command/analytics-platform/merge-csvs/file-name.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/lambdas/src/analytics-platform/merge-csvs.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

Use types whenever possible

Files:

  • packages/core/src/command/analytics-platform/merge-csvs/file-name.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/lambdas/src/analytics-platform/merge-csvs.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
**/*.ts

⚙️ CodeRabbit Configuration File

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • U...

Files:

  • packages/core/src/command/analytics-platform/merge-csvs/file-name.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/lambdas/src/analytics-platform/merge-csvs.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
🧠 Learnings (8)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
📚 Learning: 2025-05-28T19:23:20.179Z
Learnt from: keshavsaharia
PR: metriport/metriport#3885
File: packages/core/src/external/sftp/surescripts/client.ts:1-134
Timestamp: 2025-05-28T19:23:20.179Z
Learning: In packages/core/src/external/sftp/surescripts/client.ts, the standalone getPatientLoadFileName function intentionally omits the transmission ID from the filename, which differs from the class method getPatientLoadFileName. This difference in filename generation is expected behavior.

Applied to files:

  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
📚 Learning: 2025-05-20T21:26:26.804Z
Learnt from: leite08
PR: metriport/metriport#3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Applied to files:

  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
📚 Learning: 2025-07-18T09:33:29.581Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the `/fhir-to-csv` and `/fhir-to-csv/transform` endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the `/fhir-to-csv/batch` endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.

Applied to files:

  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
📚 Learning: 2025-03-11T20:42:46.516Z
Learnt from: thomasyopes
PR: metriport/metriport#3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) `ehr-sync-patient-cloud.ts` sends messages to an SQS queue, (2) the `ehr-sync-patient` Lambda consumes these messages, and (3) the Lambda uses the `syncPatient` function to make the API calls to process the patient data.

Applied to files:

  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
📚 Learning: 2025-08-15T03:55:25.167Z
Learnt from: leite08
PR: metriport/metriport#4350
File: packages/core/src/command/analytics-platform/merge-csvs/index.ts:336-363
Timestamp: 2025-08-15T03:55:25.167Z
Learning: For bulk upload operations (non-transactional flows), logging error details is sufficient and detailed error aggregation in thrown exceptions is not necessary, unlike transactional flows where detailed error information in exceptions may be more important.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-04-24T02:02:14.646Z
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/core/src/command/patient-import/steps/result/patient-import-result-cloud.ts:16-18
Timestamp: 2025-04-24T02:02:14.646Z
Learning: When logging payloads, it's acceptable to include them in logs if they only contain non-sensitive identifiers (like IDs) and are small enough to not create multi-line logs.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-07-27T00:40:32.149Z
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
🧬 Code Graph Analysis (4)
packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (1)
packages/shared/src/index.ts (1)
  • MetriportError (43-43)
packages/lambdas/src/analytics-platform/merge-csvs.ts (5)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/core/src/command/analytics-platform/merge-csvs/index.ts (2)
  • GroupAndMergeCSVsParams (24-33)
  • groupAndMergeCSVs (64-113)
packages/lambdas/src/shared/capture.ts (1)
  • capture (19-118)
packages/lambdas/src/shared/sqs.ts (1)
  • getSingleMessageOrFail (62-83)
packages/lambdas/src/shared/parse-body.ts (1)
  • parseBody (43-52)
packages/utils/src/analytics-platform/1-fhir-to-csv.ts (6)
packages/shared/src/common/date.ts (1)
  • buildDayjs (93-95)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/shared/src/index.ts (1)
  • sleep (13-13)
packages/core/src/util/log.ts (1)
  • out (30-35)
packages/utils/src/patient/get-ids.ts (1)
  • getAllPatientIds (22-29)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
packages/core/src/command/analytics-platform/merge-csvs/index.ts (6)
packages/shared/src/util/uuid-v7.ts (1)
  • uuidv4 (384-386)
packages/core/src/util/log.ts (1)
  • out (30-35)
packages/mllp-server/src/utils.ts (1)
  • s3Utils (18-18)
packages/core/src/external/aws/s3.ts (2)
  • S3Utils (140-600)
  • executeWithRetriesS3 (94-111)
packages/core/src/command/analytics-platform/merge-csvs/file-name.ts (3)
  • buildMergeCsvsJobPrefix (5-7)
  • buildMergeInfoPrefix (9-11)
  • buildMergeCsvsFileGroupKey (13-24)
packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (2)
  • buildFhirToCsvPatientPrefix (11-21)
  • parseTableNameFromFhirToCsvFileKey (23-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (25)
packages/data-transformation/fhir-to-csv/src/utils/file.py (1)

6-7: LGTM! Path format aligns with TypeScript utilities.

The updated path format using labeled segments (cx=, f2c=, pt=) is consistent with the TypeScript key-building utilities introduced elsewhere in the PR. This provides better structure and parsability for the analytics pipeline.

packages/core/src/command/analytics-platform/merge-csvs/file-name.ts (1)

1-24: Well-structured key-building utilities for analytics pipeline.

The hierarchical key structure and consistent parameter destructuring follow good TypeScript patterns. The utilities provide a clean API for constructing S3 keys used throughout the merge CSV workflow.

packages/utils/src/analytics-platform/1-fhir-to-csv.ts (9)

11-11: New imports are appropriate for the enhanced workflow.

The addition of fs for file writing and buildDayjs for consistent date handling aligns well with the updated functionality.

Also applies to: 16-16


21-30: Updated documentation clearly reflects the new JSON-to-CSV workflow.

The comments accurately describe the Lambda+SQS-based approach and provide clear usage instructions.


38-38: Job ID generation is appropriate for batch processing.

Using a timestamp-based job ID ensures uniqueness across runs while maintaining readability for debugging purposes.


43-45: Good addition of region-based AWS configuration.

Extracting the region from environment variables and using it to initialize the SQS client follows AWS best practices.


53-53: Enhanced logging improves observability.

The timestamp logging at start and the job ID in progress messages provide better traceability for debugging and monitoring.

Also applies to: 63-63


83-86: Progress tracking enhances user experience.

The progress logging every 100 patients provides good visibility into the batch processing status without overwhelming the logs.


101-103: Improved error handling with file output.

Writing failed patient IDs to a file with the job ID is much more practical than console logging for batch processing scenarios.


106-111: Comprehensive completion logging.

The final log messages provide all necessary information for monitoring and debugging, including elapsed time and job ID for correlation.


122-123: Clear messaging about the workflow change.

The updated confirmation message accurately reflects the JSON-to-CSV conversion process.

packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (3)

1-1: Import is correctly scoped to error handling needs.

The MetriportError import is only used in the parsing function where error handling is needed.


3-21: Well-structured hierarchical key builders.

The three key-building functions follow a logical hierarchy and use consistent parameter destructuring patterns. The composition approach (building upon previous functions) is clean and maintainable.


29-37: Patient ID parsing includes proper error handling.

The function correctly handles edge cases (trailing slashes) and throws meaningful errors with context when parsing fails. This is much more robust than the table name parsing.

packages/core/src/command/analytics-platform/merge-csvs/index.ts (9)

1-16: Well-organized imports with clear dependencies.

The imports are logically grouped and all dependencies are appropriate for the CSV merging functionality.


17-33: Constants and types provide good API structure.

The exported constant and type definitions create a clean public API. The GroupAndMergeCSVsParams type is well-structured with all necessary parameters.


35-55: Clear interface definitions support the workflow.

The FileInfo and MergeResult interfaces are well-designed with appropriate fields. The internal FileGroup interface properly encapsulates grouping logic.


64-113: Main orchestration function is well-structured.

The groupAndMergeCSVs function follows a clear flow: setup, metadata storage, file listing, grouping, and merging. The logging provides good observability throughout the process.


115-151: Metadata storage functions enhance debuggability.

Both storeInputParams and storeFileGroups provide valuable debugging information by persisting run metadata to S3. The JSON formatting is appropriate for human readability.

Also applies to: 153-171


176-239: File listing logic is efficient and well-implemented.

The concurrent per-patient listing approach with jitter is a good optimization. The file filtering logic (size > 0, not ending with "/") and deduplication are appropriate safeguards.
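The filter/dedupe step plus the jittered fan-out could look roughly like this (a sketch under stated assumptions; the real code uses the project's S3 utilities and concurrency helpers):

```typescript
interface ListedFile {
  key: string;
  size: number;
}

// Keep only real objects: non-empty and not folder markers; dedupe by key.
function filterAndDedupe(files: ListedFile[]): ListedFile[] {
  const valid = files.filter(f => f.size > 0 && !f.key.endsWith("/"));
  return [...new Map(valid.map(f => [f.key, f] as const)).values()];
}

function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Fan out one listing per patient, with random jitter so the calls don't hit
// S3 in a synchronized burst.
async function listAllPatientFiles(
  patientIds: string[],
  listOne: (patientId: string) => Promise<ListedFile[]>,
  maxJitterInMillis = 200
): Promise<ListedFile[]> {
  const perPatient = await Promise.all(
    patientIds.map(async patientId => {
      await sleep(Math.random() * maxJitterInMillis);
      return listOne(patientId);
    })
  );
  return filterAndDedupe(perPatient.flat());
}
```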


244-320: File grouping algorithm is well-designed.

The grouping logic properly handles both large files (as singletons) and smaller files (with balanced distribution). The greedy algorithm with size-based sorting should produce reasonably balanced groups.
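One way to sketch that greedy shape is a first-fit-decreasing variant (illustrative only; the actual `groupFilesByTypeAndSize` implementation may differ in details):

```typescript
interface FileInfo {
  key: string;
  size: number;
}

interface FileGroup {
  files: FileInfo[];
  totalSize: number;
}

// Hypothetical sketch: files at or above the target become singleton groups;
// the rest are sorted descending and placed first-fit into open groups.
function groupBySize(files: FileInfo[], targetGroupSize: number): FileGroup[] {
  const singletons: FileGroup[] = files
    .filter(f => f.size >= targetGroupSize)
    .map(f => ({ files: [f], totalSize: f.size }));
  const open: FileGroup[] = [];
  const smaller = files.filter(f => f.size < targetGroupSize).sort((a, b) => b.size - a.size);
  for (const file of smaller) {
    const target = open.find(g => g.totalSize + file.size <= targetGroupSize);
    if (target) {
      target.files.push(file);
      target.totalSize += file.size;
    } else {
      open.push({ files: [file], totalSize: file.size });
    }
  }
  return [...singletons, ...open];
}
```

Sorting descending before placement is what keeps the groups reasonably balanced: large items claim groups first, and small items fill the remaining headroom.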


325-373: Merge orchestration includes appropriate error handling.

The concurrent merging with retries and error collection provides good resilience. Based on the learned context, logging error details is sufficient for bulk operations.
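The retry-and-collect pattern can be sketched like this (hypothetical helpers; the codebase likely has its own retry utility):

```typescript
// Hypothetical sketch: retry a single merge a few times before giving up.
async function runWithRetries<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// Run all tasks concurrently and collect failures instead of aborting the batch.
async function mergeAllCollectingErrors(
  tasks: (() => Promise<void>)[]
): Promise<{ taskIndex: number; error: unknown }[]> {
  const errors: { taskIndex: number; error: unknown }[] = [];
  await Promise.all(
    tasks.map(async (task, taskIndex) => {
      try {
        await runWithRetries(task);
      } catch (error) {
        errors.push({ taskIndex, error });
      }
    })
  );
  return errors;
}
```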


378-460: File group merging implementation is memory-efficient.

The streaming approach using gzip compression and sequential file processing is well-designed to avoid memory issues. The progress logging every 50 files provides good visibility.
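A local-filesystem sketch of the streaming shape (the real Lambda streams from/to S3; paths, names, and the progress interval here are illustrative):

```typescript
import { createReadStream, createWriteStream } from "fs";
import { PassThrough } from "stream";
import { pipeline } from "stream/promises";
import { createGzip } from "zlib";

// Pipe one source file into a shared destination without ending it.
function appendStream(sourcePath: string, destination: PassThrough): Promise<void> {
  return new Promise((resolve, reject) => {
    const readStream = createReadStream(sourcePath);
    readStream.on("error", reject);
    readStream.on("end", resolve);
    readStream.pipe(destination, { end: false });
  });
}

// Merge many CSVs into a single gzipped output, one file at a time, so memory
// stays bounded regardless of total input size.
async function mergeCsvsGzipped(inputPaths: string[], outputPath: string): Promise<void> {
  const merged = new PassThrough();
  const writeDone = pipeline(merged, createGzip(), createWriteStream(outputPath));
  for (const [index, inputPath] of inputPaths.entries()) {
    await appendStream(inputPath, merged);
    if ((index + 1) % 50 === 0) {
      console.log(`Merged ${index + 1}/${inputPaths.length} files`);
    }
  }
  merged.end();
  await writeDone;
}
```

Backpressure is handled by `pipe`/`pipeline`, so only one file's stream buffers are in memory at any time.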

packages/lambdas/src/analytics-platform/merge-csvs.ts (2)

1-12: Solid wiring and dependency usage

Good use of the core API and lambdas shared helpers (capture, parseBody, getSingleMessageOrFail). Imports look correct and minimal.


26-27: Confirmed: SQS batchSize=1 for the MergeCsvs Lambda

The CDK stack already configures the MergeCsvs event source with batchSize: 1, so no changes are needed:

  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts (settings().mergeCsvs.eventSource):
    • batchSize: 1 (lines 76–78)

@leite08 leite08 force-pushed the eng-743-bulk-ingest branch from 665c14b to 0743148 on August 19, 2025 23:02

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (9)
packages/core/src/external/aws/lambda.ts (3)

1-1: Use NodeHttpHandler in AWS SDK v3; httpOptions is ignored in v3

httpOptions isn’t a valid config for v3 clients and will be silently ignored. Provide a custom requestHandler with NodeHttpHandler to enforce timeouts.

Apply this diff:

 import { InvokeCommandOutput, LambdaClient as LambdaClientV3 } from "@aws-sdk/client-lambda";
+import { NodeHttpHandler } from "@smithy/node-http-handler";
@@
-export function makeLambdaClientV3(region: string, timeoutInMillis?: number): LambdaClientV3 {
-  const lambdaClient = new LambdaClientV3({
-    region,
-    ...(timeoutInMillis ? { httpOptions: { timeout: timeoutInMillis } } : {}),
-  });
-  return lambdaClient;
-}
+export function makeLambdaClientV3(region: string, timeoutInMillis?: number): LambdaClientV3 {
+  const requestHandler = timeoutInMillis
+    ? new NodeHttpHandler({
+        connectionTimeout: timeoutInMillis,
+        socketTimeout: timeoutInMillis,
+      })
+    : undefined;
+
+  return new LambdaClientV3({
+    region,
+    ...(requestHandler ? { requestHandler } : {}),
+  });
+}

Also applies to: 27-33


97-106: Decode v3 Payload (Uint8Array) before JSON.parse

toString() on a Uint8Array won’t decode UTF-8 and can yield incorrect strings. Decode the bytes first.

 export function getLambdaErrorV3(result: InvokeCommandOutput): LambdaError | undefined {
   if (!result.Payload) return undefined;
   if (!isLambdaErrorV3(result)) return undefined;
-  const response = JSON.parse(result.Payload.toString());
+  const payloadStr = Buffer.from(result.Payload as Uint8Array).toString("utf-8");
+  const response = JSON.parse(payloadStr);
   return {
     errorType: response.errorType,
     errorMessage: response.errorMessage,
     log: logResultToString(result.LogResult),
   };
 }

223-250: Return decoded v3 payload string; avoid .toString() on Uint8Array

Ensure UTF-8 decoding before returning.

   if (isLambdaErrorV3(result)) {
@@
     throw new MetriportError(msg, undefined, { lambdaName, errorDetails });
   }
-  return result.Payload.toString();
+  return Buffer.from(result.Payload as Uint8Array).toString("utf-8");
 }

Optional cleanup (centralize decoding to avoid duplication with getLambdaErrorV3):

// helper (place near other utils in this file)
function decodePayload(payload: Uint8Array | string): string {
  return typeof payload === "string" ? payload : Buffer.from(payload).toString("utf-8");
}

// usage:
const payloadStr = decodePayload(result.Payload as Uint8Array | string);
packages/utils/src/analytics-platform/util/check-merge-groups.ts (3)

249-254: Fix inconsistent script name in usage message

The usage text still references check-duplicate-keys-simple.ts; update to check-merge-groups.ts.

-      console.log("Usage: ts-node check-duplicate-keys-simple.ts <folder-path>");
-      console.log("Example: ts-node check-duplicate-keys-simple.ts /path/to/your/folder");
+      console.log("Usage: ts-node check-merge-groups.ts <folder-path>");
+      console.log("Example: ts-node check-merge-groups.ts /path/to/your/folder");
       console.log("\nOr use current directory:");
-      console.log("ts-node check-duplicate-keys-simple.ts .");
+      console.log("ts-node check-merge-groups.ts .");

93-103: Deduplicate per-key file list before counting duplicates (fix false positives)

Current logic uses raw occurrences (files.length), inflating counts when the same key repeats within a single groups file. Count unique files and use that for the duplicate condition and count.

   private findDuplicates(): void {
     this.keyMap.forEach((files, key) => {
-      if (files.length > 1) {
-        this.duplicateKeys.push({
-          key,
-          files: [...new Set(files)], // Remove duplicate file references
-          count: files.length,
-        });
-      }
+      const uniqueFiles = [...new Set(files)];
+      if (uniqueFiles.length > 1) {
+        this.duplicateKeys.push({
+          key,
+          files: uniqueFiles,
+          count: uniqueFiles.length,
+        });
+      }
     });
   }

190-194: Export unique-file counts in JSON (align with detection semantics)

allKeys[].count should reflect the number of unique group files per key, not raw occurrences.

-      allKeys: Array.from(this.keyMap.entries()).map(([key, files]) => ({
-        key,
-        files: [...new Set(files)],
-        count: files.length,
-      })),
+      allKeys: Array.from(this.keyMap.entries()).map(([key, files]) => {
+        const uniqueFiles = [...new Set(files)];
+        return {
+          key,
+          files: uniqueFiles,
+          count: uniqueFiles.length,
+        };
+      }),
packages/utils/src/fhir/bundle/find_with_custom_filter.sh (2)

97-110: Use jq’s exit code instead of string comparison; fix SC2155

This was previously suggested and acknowledged as a working utility. Keeping a record here since ShellCheck flags SC2155 and comparing literal “true” is brittle. If you prefer to leave as-is, feel free to ignore.

Apply this diff:

 # Function to check if a file matches the custom filter
 check_file_with_filter() {
     local file="$1"
     local filter="$2"
-    
-    # Apply the custom jq filter to the file
-    local result=$(jq -e "$filter" "$file" 2>/dev/null)
-    
-    if [ "$result" = "true" ]; then
-        return 0  # File matches the filter
-    fi
-    
-    return 1  # File doesn't match the filter
+    # Apply the custom jq filter; truthy -> exit 0, false/null -> non-zero
+    if jq -e "$filter" "$file" >/dev/null 2>&1; then
+        return 0
+    fi
+    return 1
 }

112-118: Make file discovery and iteration NUL-safe; avoid word-splitting issues

Prior feedback suggested switching to -print0 plus a NUL-delimited read loop to handle filenames with spaces/newlines. If you’re intentionally keeping it simple for a utility, you can skip; otherwise, this improves robustness.

Apply this diff:

-# Find all JSON files in the directory and subdirectories up to maxdepth
-json_files=$(find "$DIRECTORY" -maxdepth "$MAXDEPTH" -name "*.json" -type f | sort)
-
-if [ -z "$json_files" ]; then
-    echo "No JSON files found in directory: $DIRECTORY"
-    exit 0
-fi
+# Counter to detect if any JSON files exist
+scanned_count=0
@@
-# Process each JSON file
-for file in $json_files; do
+# Process each JSON file (NUL-delimited to be safe with spaces/newlines)
+while IFS= read -r -d '' file; do
+    scanned_count=$((scanned_count + 1))
@@
-done
+done < <(find "$DIRECTORY" -maxdepth "$MAXDEPTH" -type f -name "*.json" -print0 | sort -z)
+
+if [ "$scanned_count" -eq 0 ]; then
+    echo "No JSON files found in directory: $DIRECTORY"
+    exit 0
+fi

Also applies to: 129-147

packages/utils/src/analytics-platform/2-merge-csvs.ts (1)

85-96: Fail fast when no patient IDs are found for the job

This was previously suggested: continuing with an empty set can hide misconfigurations (wrong prefix) and waste time. Throw early with a descriptive message.

Apply this diff:

   const uniquePatientIds = [...new Set(patientsToMerge)];
   const patientIdChunks = chunk(uniquePatientIds, maxPatientCountPerLambda);
 
+  if (uniquePatientIds.length === 0) {
+    throw new Error(
+      `No patient IDs found for fhirToCsvJobId=${fhirToCsvJobId} ` +
+      `at prefix=${buildFhirToCsvJobPrefix({ cxId, jobId: fhirToCsvJobId })}`
+    );
+  }
🧹 Nitpick comments (19)
packages/core/src/external/aws/lambda.ts (3)

66-70: Docs mismatch the function behavior

The JSDoc says “Checks whether the lambda invocation was successful”, but the function returns true when there is an error (FunctionError present). Update the wording to avoid confusion.

-/**
- * Checks whether the lambda invocation was successful.
- */
+/**
+ * Checks whether the lambda invocation returned an error (FunctionError is set).
+ */
 export function isLambdaErrorV3(result: InvokeCommandOutput): boolean {
   return result.FunctionError !== undefined;
 }

196-209: Tighten overload discriminant for failGracefully

Using boolean | false collapses to boolean, so the overload doesn’t properly discriminate return types. Use the literal type false in the first signature.

 export function getLambdaResultPayloadV3(params: {
   result: InvokeCommandOutput;
   lambdaName?: string;
-  failGracefully?: boolean | false;
+  failGracefully?: false;
   failOnEmptyResponse?: boolean | false;
   log?: typeof console.log;
 }): string;
 export function getLambdaResultPayloadV3(params: {
   result: InvokeCommandOutput;
   lambdaName?: string;
   failGracefully: true;
   failOnEmptyResponse?: boolean | false;
   log?: typeof console.log;
 }): string | undefined;

213-216: Use a distinct log prefix for the v3 path

Helps trace logs to the v3 helper.

 export function getLambdaResultPayloadV3({
   result,
   lambdaName = "<unknown-name>",
   failGracefully = false,
   failOnEmptyResponse = true,
-  log = out("getLambdaResultPayload").log,
+  log = out("getLambdaResultPayloadV3").log,
packages/utils/src/analytics-platform/3.readme.md (1)

1-1: Tighten grammar and clarity in the sentence.

Replace the hyphenated “sub-folder” and split into two sentences for readability.

Apply this diff:

-The 3rd step is platform-specific, check the respective sub-folder for a script starting with "3-".
+The 3rd step is platform-specific. Check the respective subfolder for a script starting with "3-".
packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (5)

23-39: Fix typos and align the docstring with the actual behavior (no max-rows arg, no patientIds/includeDeletes).

Documentation mentions parameters/features not present in the script and has a typo (“buy” → “by”). Align with the current S3 layout and usage.

Apply this diff:

- * It relies on the mergeCsvJobId to be passed as a parameter. This ID is determined buy the 2nd step in
- * the flow, the script `2-merge-csvs.ts`.
+ * It relies on the mergeCsvJobId to be passed as a parameter. This ID is determined by the 2nd step
+ * in the flow (`2-merge-csvs.ts`).
  *
- * It relies on the merged CSV files to be on the configured bucket, with the following structure:
- *
- * ./snowflake/merged/cxId/patientId/jobId/_tmp_fhir-to-csv_output_cxId_patientId_resourceType.csv
- *
- * As a required runtime parameter, it needs the max number of rows to be used from each file found on S3.
- *
- * Set these constants based on the output from the merge-csvs lambda:
- * - patientIds
- * - mergeCsvJobId
- *
- * Set this constant to determine whether to include deletes while updating Snowflake, to strain
- * transaction control:
- * - includeDeletes
+ * It expects the merged CSV files to be in the configured S3 bucket under:
+ *   s3://<bucket>/${buildMergeCsvsJobPrefix({ cxId, jobId: mergeCsvJobId })}
+ * Files are organized in per-resource-type folders (e.g., .../observation/, .../patient/).

49-55: Prefer stderr + exit code for missing CLI arg (cleaner UX than throwing).

Throwing prints a stack trace which is noisy for a simple usage error. Emitting to stderr and exiting with code 1 is cleaner.

Apply this diff:

 const mergeCsvJobId = process.argv[2];
 if (!mergeCsvJobId) {
-  console.log(
-    "Usage: ts-node src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts <mergeCsvJobId>"
-  );
-  throw new Error("mergeCsvJobId is required");
+  console.error("Error: mergeCsvJobId is required");
+  console.error(
+    "Usage: ts-node src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts <mergeCsvJobId>"
+  );
+  process.exit(1);
 }

103-105: Short-circuit when no S3 files are found to avoid unnecessary Snowflake connection.

Minor optimization: skip connecting when there’s nothing to ingest.

Apply this diff:

     const files = await s3Utils.listObjects(bucketName, prefixName);
     console.log(`Found ${files.length} files in ${prefixUrl}`);
+    if (files.length === 0) {
+      console.log(`No files found under ${prefixUrl} - nothing to ingest`);
+      return;
+    }

156-159: Avoid false positives when checking for resource files (guard against prefix collisions).

Using includes(resourceType) can match superset names (e.g., condition vs condition_code_coding). Check for the folder delimiter to ensure we only process when the specific resource folder exists.

Apply this diff:

-  if (!files.some(file => file.Key?.includes(resourceType))) {
+  if (!files.some(file => file.Key && file.Key.includes(`/${resourceType}/`))) {
     console.log(`>>> No files found for ${resourceType} - skipping`);
     return;
   }

221-221: Quote column identifiers to avoid issues with reserved words or special characters.

INI keys are assumed to be valid unquoted identifiers. Quoting is safer and future-proof.

Apply this diff:

-    columnDefs[resourceType] = columns.map(column => `${column} VARCHAR`).join(", ");
+    columnDefs[resourceType] = columns
+      .map(col => `"${col.replace(/"/g, '""')}" VARCHAR`)
+      .join(", ");
packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts (2)

1-1: Avoid file-wide ESLint disable for non-null assertions in tests

Instead of a global disable, guard the specific variables before use. This keeps the rule active and avoids masking future issues.

Apply a minimal refactor around the two non-null usages:

-/* eslint-disable @typescript-eslint/no-non-null-assertion */

For the large file assertion (around lines 102–107):

-      expect(largeFileGroup).toBeDefined();
-      expect(largeFileGroup!.totalSize).toBe(largeFileSize);
+      if (!largeFileGroup) {
+        throw new Error("Expected a group containing the large file");
+      }
+      expect(largeFileGroup.totalSize).toBe(largeFileSize);

For the “very large target” scenario (around lines 211–214):

-      expect(groups[0]!.files).toHaveLength(2);
+      const first = groups[0];
+      if (!first) throw new Error("Expected at least one group");
+      expect(first.files).toHaveLength(2);

7-13: Prefer using MB_IN_BYTES constant to avoid magic numbers (1024 * 1024)

Since MB_IN_BYTES is exported, using it improves readability and avoids repeated literals.

Example import adjustment:

-import { FileInfo, groupFilesByTypeAndSize } from "../index";
+import { MB_IN_BYTES, FileInfo, groupFilesByTypeAndSize } from "../index";

Example usage:

-{ key: "path/patient1/condition.csv", size: 50 * 1024 * 1024, tableName: "condition" }
+{ key: "path/patient1/condition.csv", size: 50 * MB_IN_BYTES, tableName: "condition" }

If you prefer minimal churn, this can be deferred.

Also applies to: 30-33, 46-51, 133-138, 151-156, 248-254, 265-270

packages/utils/src/analytics-platform/util/check-merge-groups.ts (2)

68-85: Avoid inflating counts when the same key appears multiple times in one groups.json

Even with downstream dedup, it’s cleaner to record each (key, groups-file) pair once. Deduplicate keys within a single file before updating the map.

   private processFile(filePath: string): void {
     try {
       console.log(`Processing: ${path.basename(filePath)}`);

       const content = fs.readFileSync(filePath, "utf-8");
       const data: GroupEntry[] = JSON.parse(content);

-      data.forEach(group => {
-        group.files.forEach(file => {
-          const key = file.key;
-          if (!this.keyMap.has(key)) {
-            this.keyMap.set(key, []);
-          }
-          // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
-          this.keyMap.get(key)!.push(path.basename(filePath));
-        });
-      });
+      const fileBase = path.basename(filePath);
+      const seenKeysInThisFile = new Set<string>();
+
+      data.forEach(group => {
+        group.files.forEach(file => {
+          const key = file.key;
+          if (seenKeysInThisFile.has(key)) return;
+          seenKeysInThisFile.add(key);
+
+          if (!this.keyMap.has(key)) {
+            this.keyMap.set(key, []);
+          }
+          // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
+          this.keyMap.get(key)!.push(fileBase);
+        });
+      });
     } catch (error) {
       console.error(`Error processing ${filePath}:`, error);
     }
   }

108-133: Optional: Exit with non-zero code when duplicates are found (CI-friendly)

If you intend to wire this into CI, failing the script on duplicates improves automation. You can gate on this.duplicateKeys.length.

Example approach (after generateReport/exportResults):

   // Export results
   this.exportResults();
+
+  // Indicate failure via exit code when duplicates are found (optional)
+  if (this.duplicateKeys.length > 0) {
+    process.exitCode = 2;
+  }

If desired, I can provide a follow-up patch that adds a --fail-on-duplicates flag instead.

Also applies to: 176-200

packages/utils/src/fhir/bundle/find_with_custom_filter.sh (3)

27-44: Exit with code 0 on --help; keep “usage” reusable for error paths

Help should not signal an error. Make usage accept an exit code, defaulting to 1, and call it with 0 for -h/--help.

Apply this diff:

-show_usage() {
-    echo "Usage: $0 -d <directory> -f <jq_filter> [--maxdepth <depth>]"
+show_usage() {
+    local exit_code="${1:-1}"
+    echo "Usage: $0 -d <directory> -f <jq_filter> [--maxdepth <depth>]"
@@
-    echo "  $0 --directory . --filter '[.entry[] | select(.resource.resourceType == \"Encounter\")] | length > 0' --maxdepth 5"
-    exit 1
+    echo "  $0 --directory . --filter '[.entry[] | select(.resource.resourceType == \"Encounter\")] | length > 0' --maxdepth 5"
+    exit "$exit_code"
 }
@@
-        -h|--help)
-            show_usage
+        -h|--help)
+            show_usage 0

Also applies to: 66-67


1-2: Harden the script with strict mode

Enable strict-er bash options to fail fast on errors.

Apply this diff:

 #!/bin/bash
+
+set -euo pipefail

112-114: Ensure cross-platform compatibility for find

The -maxdepth primary isn't part of POSIX find, so strictly POSIX implementations may not support it (GNU and BSD/macOS find both provide it, but portability isn't guaranteed). To keep this script portable, detect GNU find (installed as gfind via Homebrew) or fall back to filtering by path depth.

• File needing update:
packages/utils/src/fhir/bundle/find_with_custom_filter.sh

• Replace the direct -maxdepth invocation with one of the following patterns:

  1. Detect GNU find
+# Prefer GNU find if available
+if command -v gfind >/dev/null 2>&1; then
+  FIND_CMD=gfind
+else
+  FIND_CMD=find
+fi
 json_files=$(
-  find "$DIRECTORY" -maxdepth "$MAXDEPTH" -name "*.json" -type f \
+  $FIND_CMD "$DIRECTORY" -maxdepth "$MAXDEPTH" -name "*.json" -type f \
    | sort
 )
  2. Portable depth-filter fallback (no -maxdepth)
# Depth relative to the starting directory (assumes $DIRECTORY has no trailing slash)
base_depth=$(printf '%s\n' "$DIRECTORY" | awk -F/ '{print NF}')
json_files=$(
  find "$DIRECTORY" -type f -name '*.json' \
    | awk -F/ -v base="$base_depth" -v max="$MAXDEPTH" 'NF - base <= max' \
    | sort
)

Choose the approach that best fits your environment and team’s requirements.

packages/utils/src/analytics-platform/2-merge-csvs.ts (2)

26-38: Minor grammar in script header

“invoke” → “invokes”, and a couple of phrasing tweaks.

Apply this diff:

- * This script invoke the lambda that merges the patients' CSVs files into single files.
+ * This script invokes the Lambda that merges patients’ CSV files into single files.
@@
- * It needs to be run after the lambdas that convert the FHIR data to CSV are done
+ * Run it after the Lambdas that convert FHIR data to CSV are done
@@
- * We need the output `fhirToCsvJobId` of that script as an argument.
+ * It requires the `fhirToCsvJobId` output from that script as an argument.

126-128: Clarify log message: index is a chunk index, not a message count

The current text reads as if multiple messages were put; it’s one message per chunk. Make the wording explicit and show progress.

Apply this diff:

-        log(
-          `>>> Put ${index} messages on queue (${amountOfPatientsProcessed}/${uniquePatientIds.length} patients)`
-        );
+        log(
+          `>>> Put message ${index + 1}/${patientIdChunks.length} on queue ` +
+          `(${amountOfPatientsProcessed}/${uniquePatientIds.length} patients)`
+        );
packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (1)

23-27: Avoid magic indexes when parsing table name; use a defensive regex

Relying on split("/")[5] and slice(6) is brittle if file naming or path depth changes. A targeted regex is more maintainable and self-explanatory.

Apply this diff:

 export function parseTableNameFromFhirToCsvFileKey(key: string): string {
-  // e.g.: snowflake/fhir-to-csv/cx=cx-id/f2c=2025-08-08T02-18-56/pt=patient-id/_tmp_fhir-to-csv_output_cx-id_patient-id_condition.csv
-  const tableName = key.split("/")[5]?.split("_").slice(6).join("_").split(".")[0] || "";
-  return tableName;
+  // e.g.: .../pt=patient-id/_tmp_fhir-to-csv_output_<cxId>_<patientId>_<table>.csv
+  const match = key.match(/_tmp_fhir-to-csv_output_[^_]+_[^_]+_(?<table>[^/.]+)\.csv$/);
+  return match?.groups?.table ?? "";
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 665c14b and 0743148.

⛔ Files ignored due to path filters (2)
  • package-lock.json is excluded by !**/package-lock.json
  • packages/lambdas/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (32)
  • packages/core/package.json (4 hunks)
  • packages/core/src/command/analytics-platform/fhir-to-csv/batch/fhir-to-csv.ts (0 hunks)
  • packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts (0 hunks)
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (1 hunks)
  • packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts (1 hunks)
  • packages/core/src/command/analytics-platform/merge-csvs/file-name.ts (1 hunks)
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts (1 hunks)
  • packages/core/src/external/aws/lambda-logic/README.md (1 hunks)
  • packages/core/src/external/aws/lambda.ts (7 hunks)
  • packages/core/src/external/aws/s3.ts (1 hunks)
  • packages/core/src/external/snowflake/commands.ts (1 hunks)
  • packages/core/src/util/config.ts (0 hunks)
  • packages/data-transformation/fhir-to-csv/main.py (2 hunks)
  • packages/data-transformation/fhir-to-csv/src/utils/file.py (1 hunks)
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts (8 hunks)
  • packages/infra/lib/analytics-platform/types.ts (1 hunks)
  • packages/infra/lib/api-stack.ts (1 hunks)
  • packages/infra/lib/api-stack/api-service.ts (0 hunks)
  • packages/infra/lib/shared/settings.ts (1 hunks)
  • packages/lambdas/package.json (3 hunks)
  • packages/lambdas/src/analytics-platform/fhir-to-csv.ts (0 hunks)
  • packages/lambdas/src/analytics-platform/merge-csvs.ts (1 hunks)
  • packages/utils/package.json (3 hunks)
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts (4 hunks)
  • packages/utils/src/analytics-platform/2-merge-csvs.ts (1 hunks)
  • packages/utils/src/analytics-platform/3.readme.md (1 hunks)
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (1 hunks)
  • packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts (1 hunks)
  • packages/utils/src/analytics-platform/util/check-merge-groups.ts (1 hunks)
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts (4 hunks)
  • packages/utils/src/fhir/bundle/count_resources.sh (1 hunks)
  • packages/utils/src/fhir/bundle/find_with_custom_filter.sh (1 hunks)
💤 Files with no reviewable changes (5)
  • packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/batch/fhir-to-csv.ts
  • packages/core/src/util/config.ts
  • packages/infra/lib/api-stack/api-service.ts
  • packages/lambdas/src/analytics-platform/fhir-to-csv.ts
🚧 Files skipped from review as they are similar to previous changes (18)
  • packages/lambdas/package.json
  • packages/utils/package.json
  • packages/infra/lib/api-stack.ts
  • packages/utils/src/fhir/bundle/count_resources.sh
  • packages/lambdas/src/analytics-platform/merge-csvs.ts
  • packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts
  • packages/core/src/external/snowflake/commands.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
  • packages/data-transformation/fhir-to-csv/src/utils/file.py
  • packages/core/src/external/aws/s3.ts
  • packages/core/src/command/analytics-platform/merge-csvs/file-name.ts
  • packages/core/package.json
  • packages/data-transformation/fhir-to-csv/main.py
  • packages/infra/lib/shared/settings.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts
  • packages/infra/lib/analytics-platform/types.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

**/*.{js,jsx,ts,tsx}: Don’t use null inside the app, only on code interacting with external interfaces/services, like DB and HTTP; convert to undefined before sending inwards into the code
Use const whenever possible
Use async/await instead of .then()
Naming: classes, enums: PascalCase
Naming: constants, variables, functions: camelCase
Naming: file names: kebab-case
Naming: Don’t use negative names, like notEnabled, prefer isDisabled
If possible, use decomposing objects for function parameters
Prefer Nullish Coalesce (??) than the OR operator (||) when you want to provide a default value
Avoid creating arrow functions
Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
While handling errors, keep the stack trace around: if you create a new Error (e.g., MetriportError), make sure to pass the original error as the new one’s cause so the stack trace is available upstream.
max column length is 100 chars
multi-line comments use /** */
top-level comments go after the import (save pre-import to basic file header, like license)
move literals to constants declared after imports when possible

Files:

  • packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts
  • packages/utils/src/analytics-platform/util/check-merge-groups.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/core/src/external/aws/lambda.ts
**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

Use types whenever possible

Files:

  • packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts
  • packages/utils/src/analytics-platform/util/check-merge-groups.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/core/src/external/aws/lambda.ts
**/*.ts

⚙️ CodeRabbit Configuration File

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • U...
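
A minimal sketch illustrating a few of the guidelines above (deconstructed parameters, `const`, strict equality, and `??` for defaults). The `buildJobPrefix` function and its parameter names are hypothetical, not code from this PR:

```typescript
type BuildPrefixParams = { cxId: string; jobId?: string };

function buildJobPrefix({ cxId, jobId }: BuildPrefixParams): string {
  if (cxId === "") throw new Error("cxId is required");
  // ?? only falls back on null/undefined, so an intentionally empty
  // jobId ("") is preserved — unlike with ||
  const effectiveJobId = jobId ?? "latest";
  return `fhir-to-csv/${cxId}/${effectiveJobId}`;
}
```

Note how `jobId: ""` yields `fhir-to-csv/cx1/` rather than the default, which is the behavioral difference between `??` and `||` that the guideline is pointing at.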

Files:

  • packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts
  • packages/utils/src/analytics-platform/util/check-merge-groups.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/core/src/external/aws/lambda.ts
🧠 Learnings (33)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
📚 Learning: 2025-05-20T21:26:26.804Z
Learnt from: leite08
PR: metriport/metriport#3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Applied to files:

  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-07-18T09:33:29.581Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the `/fhir-to-csv` and `/fhir-to-csv/transform` endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the `/fhir-to-csv/batch` endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.

Applied to files:

  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-06-17T20:03:18.931Z
Learnt from: RamilGaripov
PR: metriport/metriport#3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-07-27T00:40:32.149Z
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/utils/src/fhir/bundle/find_with_custom_filter.sh
📚 Learning: 2025-04-16T00:25:25.196Z
Learnt from: RamilGaripov
PR: metriport/metriport#3676
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-conversion/shared.ts:1-10
Timestamp: 2025-04-16T00:25:25.196Z
Learning: The circular dependency between shared.ts (importing getPatientIdsOrFail) and adt/utils.ts (using unpackPidFieldOrFail) will be addressed in a follow-up PR.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-07-16T01:38:14.080Z
Learnt from: leite08
PR: metriport/metriport#4194
File: packages/api/src/command/medical/patient/get-patient-facilities.ts:36-44
Timestamp: 2025-07-16T01:38:14.080Z
Learning: In packages/api/src/command/medical/patient/get-patient-facilities.ts, the patient.facilityIds field is strongly typed as array of strings (non-nullable), so null/undefined checks are not needed when accessing this property.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-05-30T01:11:35.300Z
Learnt from: keshavsaharia
PR: metriport/metriport#3885
File: packages/utils/src/surescripts/generate-patient-load-file.ts:92-94
Timestamp: 2025-05-30T01:11:35.300Z
Learning: In V0 of the Surescripts implementation (packages/utils/src/surescripts/generate-patient-load-file.ts), there is always only 1 facility ID, so the current facility_ids parsing logic with substring operations is sufficient and additional validation is not necessary.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-08-15T03:59:27.025Z
Learnt from: leite08
PR: metriport/metriport#4350
File: packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts:159-161
Timestamp: 2025-08-15T03:59:27.025Z
Learning: User leite08 prefers fail-fast behavior for bulk import scripts in Snowflake. When tables already exist, the script should fail rather than continue, as this prevents accidental duplicate data ingestion. The `IF NOT EXISTS` clause should be omitted in bulk import scenarios to ensure data integrity.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-05T16:02:22.473Z
Learnt from: leite08
PR: metriport/metriport#3611
File: packages/utils/src/carequality/download-directory.ts:68-82
Timestamp: 2025-04-05T16:02:22.473Z
Learning: Utility scripts in the packages/utils directory don't require comprehensive error handling like production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-27T23:51:45.100Z
Learnt from: leite08
PR: metriport/metriport#3857
File: packages/utils/src/consolidated/filter-consolidated.ts:39-68
Timestamp: 2025-05-27T23:51:45.100Z
Learning: Utility scripts in the packages/utils directory don't require comprehensive error handling or logging changes (like switching from console.log to out().log), as they are meant for local/developer environments, not production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-03-14T00:30:18.860Z
Learnt from: leite08
PR: metriport/metriport#3468
File: packages/utils/src/s3/list-objects.ts:17-28
Timestamp: 2025-03-14T00:30:18.860Z
Learning: Files under `./packages/utils` folder are utility scripts and don't require the same coding standards used in other packages (like error handling, structured logging, etc.).

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-24T02:17:02.661Z
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/utils/src/patient-import/send-webhook.ts:41-55
Timestamp: 2025-04-24T02:17:02.661Z
Learning: Files under `packages/utils` don't need robust error handling as they are utility scripts meant for local/developer environments, not production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-30T14:19:02.018Z
Learnt from: leite08
PR: metriport/metriport#3924
File: packages/utils/src/open-search/ingest-patients.ts:73-74
Timestamp: 2025-05-30T14:19:02.018Z
Learning: Utility scripts in the codebase don't need to strictly follow all coding guidelines, particularly around logging practices like avoiding multi-line logs or objects as second parameters.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-03-14T00:31:01.986Z
Learnt from: leite08
PR: metriport/metriport#3468
File: packages/utils/src/s3/list-objects.ts:12-13
Timestamp: 2025-03-14T00:31:01.986Z
Learning: Files under the `./packages/utils` folder are utility scripts and don't require the same coding standards as other packages. They often contain placeholder values meant to be filled in by developers before use.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-31T23:09:05.757Z
Learnt from: lucasdellabella
PR: metriport/metriport#3941
File: packages/utils/src/hl7v2-notifications/unpack-patient-roster-identifier.ts:61-64
Timestamp: 2025-05-31T23:09:05.757Z
Learning: Utility scripts in the utils package may have relaxed coding standards compared to production code, particularly for error handling. When reviewing debugging scripts or one-off utilities, apply less stringent requirements for error handling patterns.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-07-27T23:42:34.014Z
Learnt from: RamilGaripov
PR: metriport/metriport#4259
File: packages/utils/src/document/reconversion-kickoff-loader.ts:78-81
Timestamp: 2025-07-27T23:42:34.014Z
Learning: For utility scripts in packages/utils, RamilGaripov prefers to keep them simple without extensive validation or error handling, as they are local developer tools where the developer controls the input and execution environment.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-03-14T01:13:34.887Z
Learnt from: leite08
PR: metriport/metriport#3469
File: packages/utils/src/s3/list-objects.ts:23-27
Timestamp: 2025-03-14T01:13:34.887Z
Learning: Files under `./packages/utils` are permitted to use `console.log` directly and are not required to use `out().log` instead, as they're exempt from certain coding standards that apply to the rest of the codebase.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-10T16:01:21.135Z
Learnt from: leite08
PR: metriport/metriport#3648
File: packages/utils/src/csv/find-dups.ts:14-14
Timestamp: 2025-04-10T16:01:21.135Z
Learning: Files under `packages/utils` are allowed to have empty strings for input parameters as these are meant to be manually modified by developers before running scripts in their local environments.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-08-15T18:57:28.999Z
Learnt from: lucasdellabella
PR: metriport/metriport#4382
File: packages/utils/src/hl7v2-notifications/reprocess-adt-conversion-bundles/common.ts:36-48
Timestamp: 2025-08-15T18:57:28.999Z
Learning: In utility scripts like packages/utils/src/hl7v2-notifications/reprocess-adt-conversion-bundles/common.ts, user lucasdellabella prefers to let bundle parsing errors fail fast rather than adding error handling, as it provides immediate visibility into data issues during script execution.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-05T16:02:31.219Z
Learnt from: leite08
PR: metriport/metriport#3611
File: packages/utils/src/carequality/download-directory.ts:33-66
Timestamp: 2025-04-05T16:02:31.219Z
Learning: In the Metriport codebase, utility scripts in the packages/utils directory have different error handling requirements compared to production code, and do not necessarily need the same level of error handling.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-08-01T18:20:14.618Z
Learnt from: keshavsaharia
PR: metriport/metriport#4161
File: packages/utils/src/surescripts/compare/dr-first.ts:253-267
Timestamp: 2025-08-01T18:20:14.618Z
Learning: For one-off utility scripts in packages/utils/src/surescripts/, it's acceptable to ignore certain code quality issues like array modification during iteration, as these scripts prioritize functionality over perfect code patterns.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-26T16:24:13.656Z
Learnt from: leite08
PR: metriport/metriport#3859
File: packages/utils/src/consolidated/seed-patient-consolidated.ts:94-94
Timestamp: 2025-05-26T16:24:13.656Z
Learning: For utility scripts in the packages/utils directory, leite08 accepts simpler string replacement approaches (like replaceAll) even when more targeted parsing approaches would be safer, indicating different code quality standards apply to utility scripts versus production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-31T23:09:07.678Z
Learnt from: lucasdellabella
PR: metriport/metriport#3941
File: packages/utils/src/hl7v2-notifications/unpack-patient-roster-identifier.ts:66-68
Timestamp: 2025-05-31T23:09:07.678Z
Learning: Scripts and utility tools may have relaxed code standards compared to production application code. Type safety improvements and other refactoring suggestions should be applied with less rigor to debugging scripts.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-03T02:23:58.332Z
Learnt from: leite08
PR: metriport/metriport#3595
File: packages/utils/src/carequality/download-directory.ts:36-66
Timestamp: 2025-04-03T02:23:58.332Z
Learning: In the metriport codebase, utility scripts may not require the same level of error handling as production code, especially for simple or one-off tasks like in the CQ directory download script.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-06-10T13:41:50.425Z
Learnt from: lucasdellabella
PR: metriport/metriport#3987
File: packages/utils/src/local-bulk-insert-patients.ts:216-219
Timestamp: 2025-06-10T13:41:50.425Z
Learning: For utility scripts and testing scripts in the packages/utils directory, prefer simpler error handling rather than detailed contextual error messages, as these are typically used internally and don't require the same level of robustness as production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-06-26T06:27:07.109Z
Learnt from: keshavsaharia
PR: metriport/metriport#4099
File: packages/utils/src/surescripts/shared.ts:52-63
Timestamp: 2025-06-26T06:27:07.109Z
Learning: For utility scripts in packages/utils, comprehensive error handling (like try-catch blocks for S3 operations and JSON parsing) is not necessary, as immediate failures can be more helpful for developers running these one-off tools.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-24T02:17:19.333Z
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/utils/src/patient-import/send-webhook.ts:36-40
Timestamp: 2025-04-24T02:17:19.333Z
Learning: Utility scripts in the `packages/utils` directory should include proper error handling with try/catch blocks around their main functions, logging errors and exiting with non-zero status codes upon failure.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-27T16:10:48.223Z
Learnt from: lucasdellabella
PR: metriport/metriport#3866
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-roster-generator.ts:142-151
Timestamp: 2025-05-27T16:10:48.223Z
Learning: In packages/core/src/command/hl7v2-subscriptions/hl7v2-roster-generator.ts, the user prefers to let axios errors bubble up naturally rather than adding try-catch blocks that re-throw with MetriportError wrappers, especially when the calling code already has retry mechanisms in place via simpleExecuteWithRetries.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-07-16T18:48:28.251Z
Learnt from: leite08
PR: metriport/metriport#4095
File: packages/commonwell-cert-runner/src/single-commands/document-parse.ts:14-24
Timestamp: 2025-07-16T18:48:28.251Z
Learning: For utility scripts in the packages/commonwell-cert-runner directory, comprehensive error handling (like try-catch blocks for file operations and JSON parsing) is not necessary, as these are certification testing tools rather than production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-08-11T16:17:40.913Z
Learnt from: keshavsaharia
PR: metriport/metriport#4342
File: packages/core/src/external/quest/command/bundle/get-bundle.ts:0-0
Timestamp: 2025-08-11T16:17:40.913Z
Learning: In packages/core/src/external/quest/command/bundle/get-bundle.ts, the Quest lab conversion files in S3 are guaranteed to contain valid JSON after the existence check, so JSON.parse() error handling is not necessary as these files are generated by a controlled internal process.

Applied to files:

  • packages/core/src/external/aws/lambda.ts
📚 Learning: 2025-06-25T01:55:27.902Z
Learnt from: thomasyopes
PR: metriport/metriport#4090
File: packages/core/src/command/conversion-fhir/conversion-fhir-cloud.ts:15-18
Timestamp: 2025-06-25T01:55:27.902Z
Learning: In the Metriport codebase, JSON.stringify operations on Lambda payload construction with ConverterRequest objects (containing string payload and FhirConverterParams) are safe and will not throw, as they contain only simple serializable data types.

Applied to files:

  • packages/core/src/external/aws/lambda.ts
📚 Learning: 2025-06-10T22:20:21.203Z
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.

Applied to files:

  • packages/utils/src/fhir/bundle/find_with_custom_filter.sh
🧬 Code Graph Analysis (5)
packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts (1)
packages/core/src/command/analytics-platform/merge-csvs/index.ts (2)
  • groupFilesByTypeAndSize (244-320)
  • FileInfo (35-39)
packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (1)
packages/shared/src/index.ts (1)
  • MetriportError (43-43)
packages/utils/src/analytics-platform/2-merge-csvs.ts (6)
packages/shared/src/common/date.ts (1)
  • buildDayjs (93-95)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/core/src/external/aws/s3.ts (1)
  • S3Utils (140-600)
packages/lambdas/src/analytics-platform/merge-csvs.ts (1)
  • GroupAndMergeCSVsParamsLambda (19-22)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (2)
  • buildFhirToCsvJobPrefix (7-9)
  • parsePatientIdFromFhirToCsvPatientPrefix (29-37)
packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (6)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/core/src/command/analytics-platform/merge-csvs/file-name.ts (1)
  • buildMergeCsvsJobPrefix (5-7)
packages/shared/src/index.ts (1)
  • sleep (13-13)
packages/shared/src/common/date.ts (1)
  • buildDayjs (93-95)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
packages/core/src/external/snowflake/commands.ts (3)
  • promisifyConnect (3-15)
  • promisifyExecute (31-53)
  • promisifyDestroy (17-29)
packages/core/src/external/aws/lambda.ts (2)
packages/core/src/util/log.ts (1)
  • out (30-35)
packages/shared/src/index.ts (3)
  • MetriportError (43-43)
  • BadRequestError (42-42)
  • NotFoundError (44-44)
🪛 LanguageTool
packages/core/src/external/aws/lambda-logic/README.md

[grammar] ~3-~3: There might be a mistake here.
Context: ...in this folder, we will have a hard time maintaining it, all lambdas' logic will ...

(QB_NEW_EN)


[grammar] ~6-~6: There might be a mistake here.
Context: ...ommands/functions that are stored on the proper folder for each use case/business...

(QB_NEW_EN)


[grammar] ~7-~7: There might be a mistake here.
Context: ...folder for each use case/business logic.

(QB_NEW_EN)

packages/utils/src/analytics-platform/3.readme.md

[grammar] ~1-~1: There might be a mistake here.
Context: ...-folder for a script starting with "3-".

(QB_NEW_EN)

🪛 Shellcheck (0.10.0)
packages/utils/src/fhir/bundle/find_with_custom_filter.sh

[warning] 103-103: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (4)
packages/core/src/external/aws/lambda.ts (1)

13-14: Deprecation note: good call-out

Clear guidance to prefer v3. No action needed.

packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (2)

164-168: LGTM: Intentional fail-fast on CREATE TABLE avoids duplicate ingestion.

Omitting IF NOT EXISTS here matches the bulk-ingest guardrail you want, preventing accidental second loads into an existing table.
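
As a minimal sketch of that guardrail (the `buildCreateTableCmd` helper and VARCHAR-only columns are hypothetical, not the script's actual code): omitting `IF NOT EXISTS` means a re-run against an existing table errors out instead of silently double-ingesting.

```typescript
// Fail-fast on purpose: no IF NOT EXISTS, so Snowflake rejects the
// statement when the table already exists.
function buildCreateTableCmd(tableName: string, columns: string[]): string {
  const columnDefs = columns.map(col => `${col} VARCHAR`).join(", ");
  return `CREATE TABLE ${tableName} (${columnDefs})`;
}
```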


178-181: Ensure Snowflake FILE FORMAT ‘gzip_csv_format’ exists and is accessible.

If this file format isn’t provisioned in the account or current schema/DB/role can’t see it, COPY will fail. Confirm it’s created (or switch to an inline file format definition).
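
If provisioning a named file format is undesirable, Snowflake's `COPY INTO` also accepts an inline file format definition, which removes the dependency on `gzip_csv_format` being visible to the current role/schema. A hedged sketch (the helper and stage name are hypothetical):

```typescript
// Inline file format: no pre-provisioned FILE FORMAT object required.
function buildCopyCmd(tableName: string, stageName: string): string {
  return (
    `COPY INTO ${tableName} FROM @${stageName} ` +
    `FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP FIELD_OPTIONALLY_ENCLOSED_BY = '"') ` +
    `ON_ERROR = ABORT_STATEMENT`
  );
}
```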

packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts (1)

4-290: Solid, comprehensive tests for grouping logic — LGTM

Great coverage across uniqueness, grouping by tableName, large/small handling, edge cases, IDs, and totals. The tests exercise only the public API and avoid order assumptions, which keeps them robust.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (8)
packages/lambdas/package.json (1)

65-65: Re: snowflake-sdk in shared lambdas package

Previous review flagged keeping snowflake-sdk here can bloat all lambda bundles. You mentioned it’s intentional in preparation for moving Step 3 to Lambda. Consider scoping it to the specific ingest lambda (or a dedicated layer) when that ships so other functions don’t carry it.

packages/utils/src/fhir/bundle/find_with_custom_filter.sh (2)

97-110: Use jq’s exit status directly instead of capturing output (SC2155).

This was previously noted; consider relying on jq -e’s exit code for robustness.

 check_file_with_filter() {
     local file="$1"
     local filter="$2"
-    
-    # Apply the custom jq filter to the file
-    local result=$(jq -e "$filter" "$file" 2>/dev/null)
-    
-    if [ "$result" = "true" ]; then
-        return 0  # File matches the filter
-    fi
-    
-    return 1  # File doesn't match the filter
+    # Apply the custom jq filter; truthy -> exit 0, false/null -> non-zero
+    if jq -e "$filter" "$file" >/dev/null 2>&1; then
+        return 0
+    fi
+    return 1
 }

112-118: Handle spaces/newlines in filenames with NUL-safe iteration.

This was previously suggested; using -print0 + read -d '' prevents word-splitting issues.

-# Find all JSON files in the directory and subdirectories up to maxdepth
-json_files=$(find "$DIRECTORY" -maxdepth "$MAXDEPTH" -name "*.json" -type f | sort)
-
-if [ -z "$json_files" ]; then
-    echo "No JSON files found in directory: $DIRECTORY"
-    exit 0
-fi
+scanned_count=0
@@
-# Process each JSON file
-for file in $json_files; do
+# Process each JSON file (NUL-delimited to be safe with spaces/newlines)
+while IFS= read -r -d '' file; do
+    scanned_count=$((scanned_count + 1))
@@
-done
+done < <(find "$DIRECTORY" -maxdepth "$MAXDEPTH" -type f -name "*.json" -print0)
+
+if [ "$scanned_count" -eq 0 ]; then
+    echo "No JSON files found in directory: $DIRECTORY"
+    exit 0
+fi

Also applies to: 129-147

packages/utils/src/analytics-platform/2-merge-csvs.ts (2)

169-171: Add validation for empty patient ID list

The function doesn't validate if any patient IDs were found, which could lead to silent failures downstream.

Apply this fix to validate the patient list:

   const patientIds = files.flatMap(file =>
     file.Prefix ? parsePatientIdFromFhirToCsvPatientPrefix(file.Prefix) : []
   );
+  
+  if (patientIds.length === 0) {
+    throw new Error(`No patient data found for fhirToCsvJobId: ${fhirToCsvJobId}`);
+  }
+  
   return patientIds;

121-121: Deduplication ID could exceed reasonable limits with many patient IDs

Using ptIdsOfThisRun.join(",") to generate the deduplication ID could create very long strings (up to 1,200 UUIDs = ~43KB), and if patient IDs contain commas, it could lead to collisions.

Consider using a hash-based approach for more reliable deduplication:

+import crypto from "crypto";

 // ... in the executeAsynchronously callback:
         await sqsClient.sendMessageToQueue(queueUrl, payloadString, {
           fifo: true,
-          messageDeduplicationId: createUuidFromText(ptIdsOfThisRun.join(",")),
+          // Create a deterministic hash of the sorted patient IDs for consistent deduplication
+          messageDeduplicationId: crypto
+            .createHash("sha256")
+            .update(JSON.stringify({ jobId: mergeCsvJobId, patientIds: [...ptIdsOfThisRun].sort() }))
+            .digest("hex")
+            .substring(0, 32), // Use first 32 chars for a valid dedup ID
           messageGroupId,
packages/core/src/external/aws/lambda.ts (3)

27-33: Use NodeHttpHandler for AWS SDK v3 client timeouts

The AWS SDK v3 doesn't support the httpOptions property. Network-level timeouts must be configured using a custom HTTP handler.

Import the handler and update the client construction:

+import { NodeHttpHandler } from "@smithy/node-http-handler";

 export function makeLambdaClientV3(region: string, timeoutInMillis?: number): LambdaClientV3 {
-  const lambdaClient = new LambdaClientV3({
-    region,
-    ...(timeoutInMillis ? { httpOptions: { timeout: timeoutInMillis } } : {}),
-  });
-  return lambdaClient;
+  const config: any = { region };
+  
+  if (timeoutInMillis) {
+    config.requestHandler = new NodeHttpHandler({
+      connectionTimeout: timeoutInMillis,
+      socketTimeout: timeoutInMillis,
+    });
+  }
+  
+  return new LambdaClientV3(config);
 }

97-106: Properly decode Uint8Array payload before JSON parsing

The V3 SDK returns Payload as a Uint8Array. Calling .toString() on it may not decode UTF-8 correctly.

Apply this fix to properly decode the payload:

 export function getLambdaErrorV3(result: InvokeCommandOutput): LambdaError | undefined {
   if (!result.Payload) return undefined;
   if (!isLambdaErrorV3(result)) return undefined;
-  const response = JSON.parse(result.Payload.toString());
+  const responseStr = Buffer.from(result.Payload as Uint8Array).toString("utf-8");
+  const response = JSON.parse(responseStr);
   return {
     errorType: response.errorType,
     errorMessage: response.errorMessage,
     log: logResultToString(result.LogResult),
   };
 }

249-249: Decode Uint8Array payload before returning

The return statement should decode the Uint8Array properly.

Apply this fix:

-  return result.Payload.toString();
+  return Buffer.from(result.Payload as Uint8Array).toString("utf-8");
🧹 Nitpick comments (15)
packages/utils/src/consolidated/bulk-recreate-consolidated.ts (3)

25-28: Doc vs config mismatch: comment recommends ≥1s min delay, code sets 500ms

The note suggests keeping delay/sleep above 1 second, but minimumDelayTime below is 500 ms. Either bump the min delay to 1 second or update this note to match the current 500 ms policy.

If you intend to keep the 1 second policy, update the config (see next comment). If you intend 500 ms, adjust this comment:

- * Be mindful about that, so we likely don't want to get the delay/sleep under 1 second.
+ * Be mindful about that, so we likely don't want to get the delay/sleep under 500 milliseconds.

58-59: Align minimum delay with stated policy and consider effective global rate with concurrency

Given the OpenSearch re-ingestion concern, using 1 second as the hard minimum seems safer. With numberOfParallelExecutions = 30 and a default 2s sleep per patient, your effective request rate can still be high (~15 req/s if API calls are fast). If the intent is “≥1s between patient kicks” globally, consider either raising the default or reducing concurrency.
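
The ~15 req/s figure follows from a simple steady-state estimate (assumed numbers from the comment above; this is a back-of-the-envelope check, not the script's logic):

```typescript
// With N workers each sleeping S seconds between patients, and fast API
// calls, throughput approaches N / S requests per second.
function effectiveRequestsPerSecond(concurrency: number, sleepSeconds: number): number {
  return concurrency / sleepSeconds;
}
// 30 workers with a 2s sleep → ~15 req/s
```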

Apply this diff to align min delay to the stated 1s policy:

-const minimumDelayTime = dayjs.duration(500, "milliseconds");
+const minimumDelayTime = dayjs.duration(1, "second");
 const defaultDelayTime = dayjs.duration(2, "seconds");

Optional: expose both delays and concurrency as CLI flags so they can be tuned per run without code changes.


13-13: getDelayTime returns a number in milliseconds—no .asMilliseconds() conversion needed

Verified that getDelayTime(…) already returns a number (ms). You can safely use it directly for both logging and sleep(). For clarity, consider renaming the local variable to delayMillis:

• File: packages/utils/src/consolidated/bulk-recreate-consolidated.ts
• Lines: 114–116

- const delayTime = getDelayTime({ log, minimumDelayTime, defaultDelayTime });
- log(`...sleeping for ${delayTime} ms`);
- await sleep(delayTime);
+ const delayMillis = getDelayTime({ log, minimumDelayTime, defaultDelayTime });
+ log(`...sleeping for ${delayMillis} ms`);
+ await sleep(delayMillis);
packages/utils/src/analytics-platform/3.readme.md (1)

1-1: Tighten wording and fix minor grammar

Clearer phrasing and consistent terminology (“subfolder” and “starts with”) improves readability.

Apply this diff:

-The 3rd step is platform-specific, check the respective sub-folder for a script starting with "3-".
+The third step is platform-specific. Check the respective subfolder for a script whose name starts with "3-".
packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (3)

23-39: Fix typo and remove outdated doc references to non-existent runtime params

  • “buy” → “by”.
  • The note about “max number of rows” and delete toggles is stale and can confuse readers since the script only accepts mergeCsvJobId.

Apply this diff:

- * It relies on the mergeCsvJobId to be passed as a parameter. This ID is determined buy the 2nd step in
+ * It relies on the mergeCsvJobId to be passed as a parameter. This ID is determined by the 2nd step in
  * the flow, the script `2-merge-csvs.ts`.
  *
- * It relies on the merged CSV files to be on the configured bucket, with the following structure:
+ * It relies on the merged CSV files to be on the configured bucket, with the following structure:
  *
  * ./snowflake/merged/cxId/patientId/jobId/_tmp_fhir-to-csv_output_cxId_patientId_resourceType.csv
  *
- * As a required runtime parameter, it needs the max number of rows to be used from each file found on S3.
- *
- * Set these constants based on the output from the merge-csvs lambda:
- * - patientIds
- * - mergeCsvJobId
- *
- * Set this constant to determine whether to include deletes while updating Snowflake, to strain
- * transaction control:
- * - includeDeletes
- *
+ * Note: No additional runtime parameters are required beyond <mergeCsvJobId>.

156-159: Avoid false positives when checking for resource files

Using substring matching can mis-detect resources with overlapping names (e.g., “condition” vs folders like “condition_code_coding”). Check for the exact resource subfolder prefix instead.

Apply this diff:

-  if (!files.some(file => file.Key?.includes(resourceType))) {
+  if (
+    !files.some(file => {
+      const key = file.Key ?? "";
+      return key.startsWith(`${prefixName}/${resourceType}/`);
+    })
+  ) {
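A tiny illustration of why substring matching mis-fires here (the key and prefix values are made up for the sketch):

```typescript
// Hypothetical S3 listing: only a condition_code_coding subfolder exists.
const keys = ["snowflake/merged/cx/pt/job/condition_code_coding/part-0.csv.gz"];
const prefixName = "snowflake/merged/cx/pt/job";
const resourceType = "condition";

// Substring check: wrongly reports that "condition" files exist.
const bySubstring = keys.some(key => key.includes(resourceType));

// Exact subfolder prefix: correctly reports that they don't.
const byPrefix = keys.some(key => key.startsWith(`${prefixName}/${resourceType}/`));

console.log(bySubstring, byPrefix); // true false
```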

170-174: Avoid object-name collisions and ease cleanup: use a dedicated stage name

Using the same name for table and stage can be problematic in Snowflake (object namespaces may collide across types; behavior can vary and is account-config dependent). Safer to use a dedicated stage name and drop it defensively.

Apply this diff:

-  const createStageCmd =
-    `CREATE STAGE IF NOT EXISTS ${tableName} STORAGE_INTEGRATION = ANALYTICS_BUCKET ` +
-    `URL = '${prefixUrl}/${resourceType}/'`;
+  const stageName = `STG_${tableName}`;
+  const createStageCmd =
+    `CREATE STAGE IF NOT EXISTS ${stageName} STORAGE_INTEGRATION = ANALYTICS_BUCKET ` +
+    `URL = '${prefixUrl}/${resourceType}/'`;
@@
-  const copyCmd = `COPY INTO ${tableName} FROM @${tableName} FILE_FORMAT = gzip_csv_format ON_ERROR = ABORT_STATEMENT`;
+  const copyCmd = `COPY INTO ${tableName} FROM @${stageName} FILE_FORMAT = gzip_csv_format ON_ERROR = ABORT_STATEMENT`;
@@
-  const dropStageCmd = `DROP STAGE ${tableName};`;
+  const dropStageCmd = `DROP STAGE IF EXISTS ${stageName};`;

Also applies to: 176-181, 183-184

packages/lambdas/package.json (1)

52-52: Remove unused ini dependency from shared lambdas package

I didn’t find any import or require calls for ini under packages/lambdas/**. To avoid unnecessary cold-start bloat, please remove it from the shared lambdas package.json and only add it when and where it’s actually consumed (for example, scoped to the ingest lambda’s own package or layer).

Locations to update:

  • packages/lambdas/package.json (remove the "ini": "^5.0.0", entry on line 52)

Suggested diff:

--- a/packages/lambdas/package.json
+++ b/packages/lambdas/package.json
@@ -49,7 +49,6 @@
     "js-yaml": "^4.1.0",
-    "ini": "^5.0.0",
     "lodash": "^4.17.21",
     "uuid": "^8.3.2"
   },
packages/utils/src/fhir/bundle/find_with_custom_filter.sh (1)

1-2: Optional: make the script fail-fast.

Add strict mode to avoid continuing with unset vars and to surface errors early.

 #!/bin/bash
+set -euo pipefail
packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts (1)

238-257: Optional: sort files and IDs for deterministic output.

For large runs, deterministic ordering helps diff reports across runs. Consider sorting paramsFiles and the per-ID file lists.

-    const paramsFiles = this.findParamsFiles();
+    const paramsFiles = this.findParamsFiles().sort((a, b) =>
+      a.localeCompare(b, undefined, { numeric: true, sensitivity: "base" })
+    );

Outside selected lines (illustrative):

  • When building the per-ID files array for reporting, wrap Array.from(filesSet).sort(...) similarly before writing text/JSON.
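For reference, the numeric collation option sorts numbered files the way humans expect (file names here are invented):

```typescript
const paramsFiles = ["params-10.json", "params-2.json", "params-1.json"];
// Numeric collation compares embedded numbers by value, so 2 sorts before 10.
const sorted = [...paramsFiles].sort((a, b) =>
  a.localeCompare(b, undefined, { numeric: true, sensitivity: "base" })
);
console.log(sorted); // ["params-1.json", "params-2.json", "params-10.json"]
```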
packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (2)

23-27: Make tableName parsing robust to path-shape changes and extensions (e.g., .csv.gz).

Indexing a fixed path segment ([5]) silently returns "" if the key gains or loses a folder. Take the file name via pop() and strip known extensions explicitly. Note that table names themselves can contain underscores (e.g., condition_code_coding), so keep slice(6).join("_") rather than taking only the last underscore-separated token.

-export function parseTableNameFromFhirToCsvFileKey(key: string): string {
-  // e.g.: snowflake/fhir-to-csv/cx=cx-id/f2c=2025-08-08T02-18-56/pt=patient-id/_tmp_fhir-to-csv_output_cx-id_patient-id_condition.csv
-  const tableName = key.split("/")[5]?.split("_").slice(6).join("_").split(".")[0] || "";
-  return tableName;
-}
+export function parseTableNameFromFhirToCsvFileKey(key: string): string {
+  // e.g.: snowflake/fhir-to-csv/cx=cx-id/f2c=.../pt=.../_tmp_fhir-to-csv_output_cx-id_patient-id_condition.csv
+  const fileName = key.split("/").pop() ?? "";
+  if (!fileName) return "";
+  // Drop known extensions so they don't leak into the table name
+  const withoutExt = fileName.replace(/\.csv(\.gz)?$/i, "");
+  return withoutExt.split("_").slice(6).join("_");
+}

29-36: Harden patientId parsing by validating the last segment.

Return a clearer error when the last segment isn’t pt=... to prevent accidental misparses.

 export function parsePatientIdFromFhirToCsvPatientPrefix(key: string): string {
   // e.g.: snowflake/fhir-to-csv/cx=cx-id/f2c=2025-08-08T02-18-56/pt=patient-id/
   const withoutSlash = key.endsWith("/") ? key.slice(0, -1) : key;
-  const patientId = withoutSlash.split("/").pop()?.replace("pt=", "");
-  if (!patientId) {
+  const lastSegment = withoutSlash.split("/").pop();
+  if (!lastSegment || !lastSegment.startsWith("pt=")) {
     throw new MetriportError(`Failed to parse patientId from key`, undefined, { key });
   }
-  return patientId;
+  return lastSegment.slice(3);
 }
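With that validation in place, valid and malformed prefixes behave like this (logic copied locally for the sketch; error type simplified):

```typescript
// Local copy of the parsing logic, for illustration only.
function parsePatientId(key: string): string {
  const withoutSlash = key.endsWith("/") ? key.slice(0, -1) : key;
  const lastSegment = withoutSlash.split("/").pop();
  if (!lastSegment || !lastSegment.startsWith("pt=")) {
    throw new Error(`Failed to parse patientId from key: ${key}`);
  }
  return lastSegment.slice(3);
}

console.log(parsePatientId("snowflake/fhir-to-csv/cx=c/f2c=t/pt=p-123/")); // p-123

// A key whose last segment isn't pt=... now fails loudly instead of misparsing.
let failed = false;
try {
  parsePatientId("snowflake/fhir-to-csv/cx=c/f2c=t/other-segment/");
} catch {
  failed = true;
}
console.log(failed); // true
```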
packages/infra/lib/analytics-platform/analytics-platform-stack.ts (1)

61-82: Consider making merge configuration adjustable via environment variables

The merge CSV Lambda settings are hardcoded with fixed memory (4096 MB) and concurrency (20). These values might need adjustment based on workload patterns.

Consider making these configurable through CDK context or environment configuration:

+  const mergeCsvMemory = Number(process.env.MERGE_CSV_MEMORY) || 4096;
+  const mergeCsvConcurrency = Number(process.env.MERGE_CSV_CONCURRENCY) || 20;
   const mergeCsvsLambdaTimeout = Duration.minutes(15).minus(Duration.seconds(10));
   const mergeCsvs: QueueAndLambdaSettings = {
     name: "MergeCsvs",
     entry: "analytics-platform/merge-csvs",
     lambda: {
-      memory: 4096,
+      memory: mergeCsvMemory as 4096 | 6144 | 8192 | 10240,
       timeout: mergeCsvsLambdaTimeout,
     },
     queue: {
       alarmMaxAgeOfOldestMessage: Duration.hours(2),
       maxMessageCountAlarmThreshold: 1_000,
       maxReceiveCount: 1,
       visibilityTimeout: Duration.seconds(mergeCsvsLambdaTimeout.toSeconds() * 2 + 1),
       createRetryLambda: false,
     },
     eventSource: {
       batchSize: 1,
       reportBatchItemFailures: true,
-      maxConcurrency: 20,
+      maxConcurrency: mergeCsvConcurrency,
     },
     waitTime: waitTimeFhirToCsv,
packages/core/src/command/analytics-platform/merge-csvs/index.ts (2)

19-22: Consider making concurrency limits configurable

These constants control critical performance characteristics but are hardcoded. Runtime adjustment would be beneficial for tuning based on actual workload.

As noted by @leite08, consider moving these to environment variables via CDK for runtime configurability:

-// Too much can result in continuous rate limit errors from S3 (we try up to 5 times)
-const numberOfParallelListObjectsFromS3 = 20;
-// Too much can result in an OOM error at the lambda
-const numberOfParallelMergeFileGroups = 5;
+// Too much can result in continuous rate limit errors from S3 (we try up to 5 times)
+const numberOfParallelListObjectsFromS3 = Number(process.env.PARALLEL_LIST_OBJECTS_FROM_S3) || 20;
+// Too much can result in an OOM error at the lambda
+const numberOfParallelMergeFileGroups = Number(process.env.PARALLEL_MERGE_FILE_GROUPS) || 5;
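Note that the `Number(process.env.X) || fallback` pattern also falls back on an explicit `0` and silently swallows typos; a small helper (the name is ours) makes the fallback behavior explicit:

```typescript
// Read a numeric setting from the environment, falling back only when the
// variable is unset, empty, or not a finite number.
function numberFromEnv(name: string, fallback: number): number {
  const raw = process.env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

process.env.PARALLEL_MERGE_FILE_GROUPS = "8";
console.log(numberFromEnv("PARALLEL_MERGE_FILE_GROUPS", 5)); // 8
console.log(numberFromEnv("NOT_SET_ANYWHERE", 5)); // 5
```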

224-224: Refactor table name parsing for resilience

The parseTableNameFromFhirToCsvFileKey implementation in
packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
relies on hardcoded indices (key.split("/")[5]?.split("_").slice(6)…), and there are no unit tests verifying its behavior. This will fail silently if the S3 key format ever shifts.

• packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
– Wrap each array access in a length check or switch to a regex with named capture groups.
– Return a clear error or fallback when the expected segments aren’t present.
• packages/core/src/command/analytics-platform/merge-csvs/index.ts (around line 224)
– Guard against empty/invalid tableName or propagate errors appropriately.

Next steps:

  1. Add unit tests covering valid and malformed keys for parseTableNameFromFhirToCsvFileKey.
  2. Refactor the parsing logic for clarity and safety (e.g. regex, explicit validations).
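A self-contained sketch of such table-driven cases (the local parse function mirrors the current slice(6) logic, plus extension stripping, for illustration only):

```typescript
// Local stand-in for parseTableNameFromFhirToCsvFileKey, for the test sketch.
function parseTableName(key: string): string {
  const fileName = key.split("/").pop() ?? "";
  const withoutExt = fileName.replace(/\.csv(\.gz)?$/i, "");
  return withoutExt.split("_").slice(6).join("_");
}

const cases: Array<{ key: string; expected: string }> = [
  {
    key: "snowflake/fhir-to-csv/cx=c1/f2c=t/pt=p1/_tmp_fhir-to-csv_output_c1_p1_condition.csv",
    expected: "condition",
  },
  {
    // Table names may contain underscores and files may be gzipped.
    key: "snowflake/fhir-to-csv/cx=c1/f2c=t/pt=p1/_tmp_fhir-to-csv_output_c1_p1_condition_code_coding.csv.gz",
    expected: "condition_code_coding",
  },
  // Malformed key: expect an empty result rather than garbage.
  { key: "unexpected/key", expected: "" },
];

for (const { key, expected } of cases) {
  const actual = parseTableName(key);
  if (actual !== expected) throw new Error(`for ${key}: got "${actual}", want "${expected}"`);
}
console.log("all cases pass");
```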
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled
  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 665c14b and 0743148.

⛔ Files ignored due to path filters (2)
  • package-lock.json is excluded by !**/package-lock.json
  • packages/lambdas/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (32)
  • packages/core/package.json (4 hunks)
  • packages/core/src/command/analytics-platform/fhir-to-csv/batch/fhir-to-csv.ts (0 hunks)
  • packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts (0 hunks)
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (1 hunks)
  • packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts (1 hunks)
  • packages/core/src/command/analytics-platform/merge-csvs/file-name.ts (1 hunks)
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts (1 hunks)
  • packages/core/src/external/aws/lambda-logic/README.md (1 hunks)
  • packages/core/src/external/aws/lambda.ts (7 hunks)
  • packages/core/src/external/aws/s3.ts (1 hunks)
  • packages/core/src/external/snowflake/commands.ts (1 hunks)
  • packages/core/src/util/config.ts (0 hunks)
  • packages/data-transformation/fhir-to-csv/main.py (2 hunks)
  • packages/data-transformation/fhir-to-csv/src/utils/file.py (1 hunks)
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts (8 hunks)
  • packages/infra/lib/analytics-platform/types.ts (1 hunks)
  • packages/infra/lib/api-stack.ts (1 hunks)
  • packages/infra/lib/api-stack/api-service.ts (0 hunks)
  • packages/infra/lib/shared/settings.ts (1 hunks)
  • packages/lambdas/package.json (3 hunks)
  • packages/lambdas/src/analytics-platform/fhir-to-csv.ts (0 hunks)
  • packages/lambdas/src/analytics-platform/merge-csvs.ts (1 hunks)
  • packages/utils/package.json (3 hunks)
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts (4 hunks)
  • packages/utils/src/analytics-platform/2-merge-csvs.ts (1 hunks)
  • packages/utils/src/analytics-platform/3.readme.md (1 hunks)
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (1 hunks)
  • packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts (1 hunks)
  • packages/utils/src/analytics-platform/util/check-merge-groups.ts (1 hunks)
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts (4 hunks)
  • packages/utils/src/fhir/bundle/count_resources.sh (1 hunks)
  • packages/utils/src/fhir/bundle/find_with_custom_filter.sh (1 hunks)
💤 Files with no reviewable changes (5)
  • packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv-transform.ts
  • packages/core/src/util/config.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/batch/fhir-to-csv.ts
  • packages/infra/lib/api-stack/api-service.ts
  • packages/lambdas/src/analytics-platform/fhir-to-csv.ts
🚧 Files skipped from review as they are similar to previous changes (15)
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
  • packages/infra/lib/shared/settings.ts
  • packages/utils/package.json
  • packages/infra/lib/analytics-platform/types.ts
  • packages/core/src/external/snowflake/commands.ts
  • packages/infra/lib/api-stack.ts
  • packages/core/package.json
  • packages/lambdas/src/analytics-platform/merge-csvs.ts
  • packages/utils/src/fhir/bundle/count_resources.sh
  • packages/core/src/command/analytics-platform/merge-csvs/file-name.ts
  • packages/core/src/external/aws/s3.ts
  • packages/core/src/command/analytics-platform/merge-csvs/__tests__/index.test.ts
  • packages/data-transformation/fhir-to-csv/main.py
  • packages/utils/src/analytics-platform/util/check-merge-groups.ts
  • packages/data-transformation/fhir-to-csv/src/utils/file.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

**/*.{js,jsx,ts,tsx}: Don’t use null inside the app, only on code interacting with external interfaces/services, like DB and HTTP; convert to undefined before sending inwards into the code
Use const whenever possible
Use async/await instead of .then()
Naming: classes, enums: PascalCase
Naming: constants, variables, functions: camelCase
Naming: file names: kebab-case
Naming: Don’t use negative names, like notEnabled, prefer isDisabled
If possible, use decomposing objects for function parameters
Prefer Nullish Coalesce (??) than the OR operator (||) when you want to provide a default value
Avoid creating arrow functions
Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
While handling errors, keep the stack trace around: if you create a new Error (e.g., MetriportError), make sure to pass the original error as the new one’s cause so the stack trace is available upstream.
max column length is 100 chars
multi-line comments use /** */
top-level comments go after the import (save pre-import to basic file header, like license)
move literals to constants declared after imports when possible

Files:

  • packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/core/src/external/aws/lambda.ts
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts
**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

Use types whenever possible

Files:

  • packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/core/src/external/aws/lambda.ts
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts
**/*.ts

⚙️ CodeRabbit Configuration File

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • U...

Files:

  • packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/core/src/external/aws/lambda.ts
  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts
🧠 Learnings (38)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
📚 Learning: 2025-05-20T21:26:26.804Z
Learnt from: leite08
PR: metriport/metriport#3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Applied to files:

  • packages/utils/src/analytics-platform/util/check-duplicate-patient-ids.ts
  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
📚 Learning: 2025-07-27T00:40:32.149Z
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/lambdas/package.json
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/utils/src/fhir/bundle/find_with_custom_filter.sh
📚 Learning: 2025-04-16T00:25:25.196Z
Learnt from: RamilGaripov
PR: metriport/metriport#3676
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-to-fhir-conversion/shared.ts:1-10
Timestamp: 2025-04-16T00:25:25.196Z
Learning: The circular dependency between shared.ts (importing getPatientIdsOrFail) and adt/utils.ts (using unpackPidFieldOrFail) will be addressed in a follow-up PR.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-06-17T20:03:18.931Z
Learnt from: RamilGaripov
PR: metriport/metriport#3962
File: packages/api/src/sequelize/migrations/2025-06-17_00_create-cohort-tables.ts:50-56
Timestamp: 2025-06-17T20:03:18.931Z
Learning: In the Metriport codebase, the patient table ID is defined as DataTypes.STRING in the database schema, not UUID. Foreign key references to patient.id should use DataTypes.STRING to maintain consistency.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-07-18T09:33:29.581Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the `/fhir-to-csv` and `/fhir-to-csv/transform` endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the `/fhir-to-csv/batch` endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
📚 Learning: 2025-07-16T01:38:14.080Z
Learnt from: leite08
PR: metriport/metriport#4194
File: packages/api/src/command/medical/patient/get-patient-facilities.ts:36-44
Timestamp: 2025-07-16T01:38:14.080Z
Learning: In packages/api/src/command/medical/patient/get-patient-facilities.ts, the patient.facilityIds field is strongly typed as array of strings (non-nullable), so null/undefined checks are not needed when accessing this property.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-05-30T01:11:35.300Z
Learnt from: keshavsaharia
PR: metriport/metriport#3885
File: packages/utils/src/surescripts/generate-patient-load-file.ts:92-94
Timestamp: 2025-05-30T01:11:35.300Z
Learning: In V0 of the Surescripts implementation (packages/utils/src/surescripts/generate-patient-load-file.ts), there is always only 1 facility ID, so the current facility_ids parsing logic with substring operations is sufficient and additional validation is not necessary.

Applied to files:

  • packages/utils/src/analytics-platform/2-merge-csvs.ts
📚 Learning: 2025-07-20T01:18:28.093Z
Learnt from: leite08
PR: metriport/metriport#4215
File: packages/core/src/command/analytics-platform/config.ts:4-13
Timestamp: 2025-07-20T01:18:28.093Z
Learning: User leite08 prefers that code improvement suggestions (like consolidating duplicated schemas) be made on feature PRs rather than release PRs. Such improvements should be deferred to follow-up PRs during the release cycle.

Applied to files:

  • packages/lambdas/package.json
📚 Learning: 2025-08-15T03:55:25.197Z
Learnt from: leite08
PR: metriport/metriport#4350
File: packages/core/src/command/analytics-platform/merge-csvs/index.ts:336-363
Timestamp: 2025-08-15T03:55:25.197Z
Learning: For bulk upload operations (non-transactional flows), logging error details is sufficient and detailed error aggregation in thrown exceptions is not necessary, unlike transactional flows where detailed error information in exceptions may be more important.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-04-24T02:02:14.646Z
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/core/src/command/patient-import/steps/result/patient-import-result-cloud.ts:16-18
Timestamp: 2025-04-24T02:02:14.646Z
Learning: When logging payloads, it's acceptable to include them in logs if they only contain non-sensitive identifiers (like IDs) and are small enough to not create multi-line logs.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-06-18T17:24:31.650Z
Learnt from: keshavsaharia
PR: metriport/metriport#4045
File: packages/core/src/external/surescripts/__tests__/file-names.test.ts:21-21
Timestamp: 2025-06-18T17:24:31.650Z
Learning: The team does all testing and deploying in PST local time, which helps avoid issues with date-dependent code in tests like buildRequestFileName that use dayjs().format('YYYYMMDD').

Applied to files:

  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts
📚 Learning: 2025-08-15T03:59:27.025Z
Learnt from: leite08
PR: metriport/metriport#4350
File: packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts:159-161
Timestamp: 2025-08-15T03:59:27.025Z
Learning: User leite08 prefers fail-fast behavior for bulk import scripts in Snowflake. When tables already exist, the script should fail rather than continue, as this prevents accidental duplicate data ingestion. The `IF NOT EXISTS` clause should be omitted in bulk import scenarios to ensure data integrity.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-05T16:02:22.473Z
Learnt from: leite08
PR: metriport/metriport#3611
File: packages/utils/src/carequality/download-directory.ts:68-82
Timestamp: 2025-04-05T16:02:22.473Z
Learning: Utility scripts in the packages/utils directory don't require comprehensive error handling like production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-27T23:51:45.100Z
Learnt from: leite08
PR: metriport/metriport#3857
File: packages/utils/src/consolidated/filter-consolidated.ts:39-68
Timestamp: 2025-05-27T23:51:45.100Z
Learning: Utility scripts in the packages/utils directory don't require comprehensive error handling or logging changes (like switching from console.log to out().log), as they are meant for local/developer environments, not production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-03-14T00:30:18.860Z
Learnt from: leite08
PR: metriport/metriport#3468
File: packages/utils/src/s3/list-objects.ts:17-28
Timestamp: 2025-03-14T00:30:18.860Z
Learning: Files under `./packages/utils` folder are utility scripts and don't require the same coding standards used in other packages (like error handling, structured logging, etc.).

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-24T02:17:02.661Z
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/utils/src/patient-import/send-webhook.ts:41-55
Timestamp: 2025-04-24T02:17:02.661Z
Learning: Files under `packages/utils` don't need robust error handling as they are utility scripts meant for local/developer environments, not production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-30T14:19:02.018Z
Learnt from: leite08
PR: metriport/metriport#3924
File: packages/utils/src/open-search/ingest-patients.ts:73-74
Timestamp: 2025-05-30T14:19:02.018Z
Learning: Utility scripts in the codebase don't need to strictly follow all coding guidelines, particularly around logging practices like avoiding multi-line logs or objects as second parameters.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-03-14T00:31:01.986Z
Learnt from: leite08
PR: metriport/metriport#3468
File: packages/utils/src/s3/list-objects.ts:12-13
Timestamp: 2025-03-14T00:31:01.986Z
Learning: Files under the `./packages/utils` folder are utility scripts and don't require the same coding standards as other packages. They often contain placeholder values meant to be filled in by developers before use.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-31T23:09:05.757Z
Learnt from: lucasdellabella
PR: metriport/metriport#3941
File: packages/utils/src/hl7v2-notifications/unpack-patient-roster-identifier.ts:61-64
Timestamp: 2025-05-31T23:09:05.757Z
Learning: Utility scripts in the utils package may have relaxed coding standards compared to production code, particularly for error handling. When reviewing debugging scripts or one-off utilities, apply less stringent requirements for error handling patterns.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-07-27T23:42:34.014Z
Learnt from: RamilGaripov
PR: metriport/metriport#4259
File: packages/utils/src/document/reconversion-kickoff-loader.ts:78-81
Timestamp: 2025-07-27T23:42:34.014Z
Learning: For utility scripts in packages/utils, RamilGaripov prefers to keep them simple without extensive validation or error handling, as they are local developer tools where the developer controls the input and execution environment.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-03-14T01:13:34.887Z
Learnt from: leite08
PR: metriport/metriport#3469
File: packages/utils/src/s3/list-objects.ts:23-27
Timestamp: 2025-03-14T01:13:34.887Z
Learning: Files under `./packages/utils` are permitted to use `console.log` directly and are not required to use `out().log` instead, as they're exempt from certain coding standards that apply to the rest of the codebase.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-10T16:01:21.135Z
Learnt from: leite08
PR: metriport/metriport#3648
File: packages/utils/src/csv/find-dups.ts:14-14
Timestamp: 2025-04-10T16:01:21.135Z
Learning: Files under `packages/utils` are allowed to have empty strings for input parameters as these are meant to be manually modified by developers before running scripts in their local environments.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-08-15T18:57:28.999Z
Learnt from: lucasdellabella
PR: metriport/metriport#4382
File: packages/utils/src/hl7v2-notifications/reprocess-adt-conversion-bundles/common.ts:36-48
Timestamp: 2025-08-15T18:57:28.999Z
Learning: In utility scripts like packages/utils/src/hl7v2-notifications/reprocess-adt-conversion-bundles/common.ts, user lucasdellabella prefers to let bundle parsing errors fail fast rather than adding error handling, as it provides immediate visibility into data issues during script execution.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-05T16:02:31.219Z
Learnt from: leite08
PR: metriport/metriport#3611
File: packages/utils/src/carequality/download-directory.ts:33-66
Timestamp: 2025-04-05T16:02:31.219Z
Learning: In the Metriport codebase, utility scripts in the packages/utils directory have different error handling requirements compared to production code, and do not necessarily need the same level of error handling.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-08-01T18:20:14.618Z
Learnt from: keshavsaharia
PR: metriport/metriport#4161
File: packages/utils/src/surescripts/compare/dr-first.ts:253-267
Timestamp: 2025-08-01T18:20:14.618Z
Learning: For one-off utility scripts in packages/utils/src/surescripts/, it's acceptable to ignore certain code quality issues like array modification during iteration, as these scripts prioritize functionality over perfect code patterns.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-26T16:24:13.656Z
Learnt from: leite08
PR: metriport/metriport#3859
File: packages/utils/src/consolidated/seed-patient-consolidated.ts:94-94
Timestamp: 2025-05-26T16:24:13.656Z
Learning: For utility scripts in the packages/utils directory, leite08 accepts simpler string replacement approaches (like replaceAll) even when more targeted parsing approaches would be safer, indicating different code quality standards apply to utility scripts versus production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-31T23:09:07.678Z
Learnt from: lucasdellabella
PR: metriport/metriport#3941
File: packages/utils/src/hl7v2-notifications/unpack-patient-roster-identifier.ts:66-68
Timestamp: 2025-05-31T23:09:07.678Z
Learning: Scripts and utility tools may have relaxed code standards compared to production application code. Type safety improvements and other refactoring suggestions should be applied with less rigor to debugging scripts.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-03T02:23:58.332Z
Learnt from: leite08
PR: metriport/metriport#3595
File: packages/utils/src/carequality/download-directory.ts:36-66
Timestamp: 2025-04-03T02:23:58.332Z
Learning: In the metriport codebase, utility scripts may not require the same level of error handling as production code, especially for simple or one-off tasks like in the CQ directory download script.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-06-10T13:41:50.425Z
Learnt from: lucasdellabella
PR: metriport/metriport#3987
File: packages/utils/src/local-bulk-insert-patients.ts:216-219
Timestamp: 2025-06-10T13:41:50.425Z
Learning: For utility scripts and testing scripts in the packages/utils directory, prefer simpler error handling rather than detailed contextual error messages, as these are typically used internally and don't require the same level of robustness as production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-06-26T06:27:07.109Z
Learnt from: keshavsaharia
PR: metriport/metriport#4099
File: packages/utils/src/surescripts/shared.ts:52-63
Timestamp: 2025-06-26T06:27:07.109Z
Learning: For utility scripts in packages/utils, comprehensive error handling (like try-catch blocks for S3 operations and JSON parsing) is not necessary, as immediate failures can be more helpful for developers running these one-off tools.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-24T02:17:19.333Z
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/utils/src/patient-import/send-webhook.ts:36-40
Timestamp: 2025-04-24T02:17:19.333Z
Learning: Utility scripts in the `packages/utils` directory should include proper error handling with try/catch blocks around their main functions, logging errors and exiting with non-zero status codes upon failure.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-27T16:10:48.223Z
Learnt from: lucasdellabella
PR: metriport/metriport#3866
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-roster-generator.ts:142-151
Timestamp: 2025-05-27T16:10:48.223Z
Learning: In packages/core/src/command/hl7v2-subscriptions/hl7v2-roster-generator.ts, the user prefers to let axios errors bubble up naturally rather than adding try-catch blocks that re-throw with MetriportError wrappers, especially when the calling code already has retry mechanisms in place via simpleExecuteWithRetries.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-07-16T18:48:28.251Z
Learnt from: leite08
PR: metriport/metriport#4095
File: packages/commonwell-cert-runner/src/single-commands/document-parse.ts:14-24
Timestamp: 2025-07-16T18:48:28.251Z
Learning: For utility scripts in the packages/commonwell-cert-runner directory, comprehensive error handling (like try-catch blocks for file operations and JSON parsing) is not necessary, as these are certification testing tools rather than production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-08-11T16:17:40.913Z
Learnt from: keshavsaharia
PR: metriport/metriport#4342
File: packages/core/src/external/quest/command/bundle/get-bundle.ts:0-0
Timestamp: 2025-08-11T16:17:40.913Z
Learning: In packages/core/src/external/quest/command/bundle/get-bundle.ts, the Quest lab conversion files in S3 are guaranteed to contain valid JSON after the existence check, so JSON.parse() error handling is not necessary as these files are generated by a controlled internal process.

Applied to files:

  • packages/core/src/external/aws/lambda.ts
📚 Learning: 2025-06-25T01:55:27.902Z
Learnt from: thomasyopes
PR: metriport/metriport#4090
File: packages/core/src/command/conversion-fhir/conversion-fhir-cloud.ts:15-18
Timestamp: 2025-06-25T01:55:27.902Z
Learning: In the Metriport codebase, JSON.stringify operations on Lambda payload construction with ConverterRequest objects (containing string payload and FhirConverterParams) are safe and will not throw, as they contain only simple serializable data types.

Applied to files:

  • packages/core/src/external/aws/lambda.ts
📚 Learning: 2025-07-09T20:41:08.549Z
Learnt from: thomasyopes
PR: metriport/metriport#4167
File: packages/infra/lib/analytics-platform/analytics-platform-stack.ts:56-64
Timestamp: 2025-07-09T20:41:08.549Z
Learning: In the analytics platform Snowflake integration (packages/infra/lib/analytics-platform/analytics-platform-stack.ts), the `integrationUserArn` field in the config temporarily contains an AWS account ID rather than an ARN. This is planned to be updated to an actual ARN in a second pass, at which point the IAM Role's assumedBy should be changed from AccountPrincipal to ArnPrincipal.

Applied to files:

  • packages/infra/lib/analytics-platform/analytics-platform-stack.ts
📚 Learning: 2025-06-10T22:20:21.203Z
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.

Applied to files:

  • packages/utils/src/fhir/bundle/find_with_custom_filter.sh
🧬 Code Graph Analysis (7)
packages/utils/src/analytics-platform/2-merge-csvs.ts (4)
packages/shared/src/common/date.ts (1)
  • buildDayjs (93-95)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/lambdas/src/analytics-platform/merge-csvs.ts (1)
  • GroupAndMergeCSVsParamsLambda (19-22)
packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (2)
  • buildFhirToCsvJobPrefix (7-9)
  • parsePatientIdFromFhirToCsvPatientPrefix (29-37)
packages/core/src/command/analytics-platform/merge-csvs/index.ts (6)
packages/shared/src/util/uuid-v7.ts (1)
  • uuidv4 (384-386)
packages/core/src/util/log.ts (1)
  • out (30-35)
packages/core/src/external/aws/s3.ts (2)
  • S3Utils (140-600)
  • executeWithRetriesS3 (94-111)
packages/core/src/command/analytics-platform/merge-csvs/file-name.ts (3)
  • buildMergeCsvsJobPrefix (5-7)
  • buildMergeInfoPrefix (9-11)
  • buildMergeCsvsFileGroupKey (13-24)
packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (2)
  • buildFhirToCsvPatientPrefix (11-21)
  • parseTableNameFromFhirToCsvFileKey (23-27)
packages/shared/src/index.ts (1)
  • errorToString (45-45)
packages/utils/src/consolidated/bulk-recreate-consolidated.ts (1)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (6)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/core/src/external/aws/s3.ts (1)
  • S3Utils (140-600)
packages/core/src/command/analytics-platform/merge-csvs/file-name.ts (1)
  • buildMergeCsvsJobPrefix (5-7)
packages/shared/src/common/date.ts (1)
  • buildDayjs (93-95)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
packages/core/src/external/snowflake/commands.ts (3)
  • promisifyConnect (3-15)
  • promisifyExecute (31-53)
  • promisifyDestroy (17-29)
packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (1)
packages/shared/src/index.ts (1)
  • MetriportError (43-43)
packages/core/src/external/aws/lambda.ts (2)
packages/core/src/util/log.ts (1)
  • out (30-35)
packages/shared/src/index.ts (3)
  • MetriportError (43-43)
  • BadRequestError (42-42)
  • NotFoundError (44-44)
packages/infra/lib/analytics-platform/analytics-platform-stack.ts (3)
packages/infra/lib/shared/settings.ts (1)
  • QueueAndLambdaSettings (15-49)
packages/infra/lib/analytics-platform/types.ts (1)
  • AnalyticsPlatformsAssets (4-9)
packages/infra/config/env-config.ts (1)
  • EnvConfigNonSandbox (303-316)
🪛 LanguageTool
packages/core/src/external/aws/lambda-logic/README.md

[grammar] ~3-~3: There might be a mistake here.
Context: ...in this folder, we will have a hard time maintaining it, all lambdas' logic will ...

(QB_NEW_EN)


[grammar] ~6-~6: There might be a mistake here.
Context: ...ommands/functions that are stored on the proper folder for each use case/business...

(QB_NEW_EN)


[grammar] ~7-~7: There might be a mistake here.
Context: ...folder for each use case/business logic.

(QB_NEW_EN)

packages/utils/src/analytics-platform/3.readme.md

[grammar] ~1-~1: There might be a mistake here.
Context: ...-folder for a script starting with "3-".

(QB_NEW_EN)

🪛 Shellcheck (0.10.0)
packages/utils/src/fhir/bundle/find_with_custom_filter.sh

[warning] 103-103: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (5)
packages/utils/src/consolidated/bulk-recreate-consolidated.ts (1)

128-132: LGTM: nicer final elapsed time via elapsedTimeAsStr

Using the shared elapsedTimeAsStr utility improves readability and consistency across scripts.

packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (2)

49-55: Argument validation LGTM

Good fail-fast guard with clear usage message when mergeCsvJobId is missing.


164-168: Fail-fast table creation aligns with bulk-ingest integrity goals

Omitting IF NOT EXISTS here is the right call to prevent accidental duplicate loads.

packages/lambdas/package.json (1)

31-31: Lambda v3 client addition looks good.

Bringing in @aws-sdk/client-lambda v3 aligns with the new Lambda+SQS workflow and core v3 helpers.

packages/core/src/command/analytics-platform/merge-csvs/index.ts (1)

342-369: Good approach to error logging for bulk operations

The error handling appropriately logs details for debugging while throwing a simple error for the caller, which is suitable for bulk upload operations as confirmed by previous learnings.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (2)
packages/utils/src/fhir/bundle/find_with_custom_filter.sh (2)

114-127: Use jq exit status directly; remove brittle string comparison and fix SC2155.

This was raised before and still applies: rely on jq -e exit status rather than capturing output and comparing to "true". Also avoids SC2155.

-# Function to check if a file matches the custom filter
-check_file_with_filter() {
-    local file="$1"
-    local filter="$2"
-    
-    # Apply the custom jq filter to the file
-    local result=$(jq -e "$filter" "$file" 2>/dev/null)
-    
-    if [ "$result" = "true" ]; then
-        return 0  # File matches the filter
-    fi
-    
-    return 1  # File doesn't match the filter
-}
+# Function to check if a file matches the custom filter
+check_file_with_filter() {
+    local file="$1"
+    local filter="$2"
+    # Truthy -> exit 0, false/null -> non-zero
+    if jq -e "$filter" "$file" >/dev/null 2>&1; then
+        return 0
+    fi
+    return 1
+}

139-161: Make file iteration NUL-safe (handles spaces/newlines in filenames).

Iterating with for file in $json_files breaks on spaces/newlines and can overflow argv. Prefer find -print0 with a NUL-delimited read loop, which also lets you count robustly.

-    # Find all JSON files in the directory and subdirectories up to maxdepth
-    json_files=$(find "$DIRECTORY" -maxdepth "$MAXDEPTH" -name "*.json" -type f | sort)
-    
-    if [ -z "$json_files" ]; then
-        echo "No JSON files found in directory: $DIRECTORY"
-        exit 0
-    fi
+    # Iterate JSON files (NUL-delimited for safety)
+    scanned_count=0
@@
-    # Count number of files found
-    file_count=$(echo "$json_files" | wc -l)
+    # Count number of files found (after scanning)
+    file_count="$scanned_count"
@@
-    # Process each JSON file
-    for file in $json_files; do
+    # Process each JSON file
+    while IFS= read -r -d '' file; do
+        scanned_count=$((scanned_count + 1))
         # Get relative path from the search directory for better identification
         if [ "$DIRECTORY" = "." ]; then
             filename="$file"
         else
             filename="${file#$DIRECTORY/}"
         fi
@@
-    done
+    done < <(find "$DIRECTORY" -maxdepth "$MAXDEPTH" -type f -name "*.json" -print0)
+    
+    if [ "$scanned_count" -eq 0 ]; then
+        echo "No JSON files found in directory: $DIRECTORY"
+        exit 0
+    fi

Also applies to: 147-149
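As a sanity check, the difference between word-splitting and NUL-delimited iteration can be shown with a standalone bash sketch (the file names are made up; this is not code from the PR):

```shell
#!/bin/bash
# Standalone demo: word-splitting miscounts files with spaces; NUL-delimited read does not.
dir=$(mktemp -d)
touch "$dir/plain.json" "$dir/with space.json"

# Broken: unquoted command substitution splits "with space.json" into two tokens
broken_count=0
for f in $(find "$dir" -maxdepth 1 -type f -name '*.json'); do
    broken_count=$((broken_count + 1))
done

# Safe: NUL-delimited iteration sees exactly one entry per file
safe_count=0
while IFS= read -r -d '' f; do
    safe_count=$((safe_count + 1))
done < <(find "$dir" -maxdepth 1 -type f -name '*.json' -print0)

echo "broken=$broken_count safe=$safe_count"   # broken=3 safe=2
rm -rf "$dir"
```

With two files, the broken loop reports three "files" because the space splits one path in half, while the NUL-delimited loop counts correctly.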

🧹 Nitpick comments (15)
packages/utils/src/fhir/bundle/extract_resource_ids.sh (4)

97-103: Use jq’s exit code to test filter truthiness (avoid brittle string comparisons).

jq -e already exits 0 for truthy and non-zero for false/null. Avoid capturing output and comparing to the string "false".

-        # First check if the filter matches any resources
-        local has_matches
-        has_matches=$(jq -r "$filter" "$file" 2>/dev/null || echo "false")
-        
-        if [[ "$has_matches" == "false" ]]; then
-            echo "No resources match the specified filter criteria."
-            return 0
-        fi
+        # First check if the filter matches any resources
+        if ! jq -e "$filter" "$file" >/dev/null 2>&1; then
+            echo "No resources match the specified filter criteria."
+            return 0
+        fi

119-125: CSV output: avoid sed concatenation; generate proper CSV via jq to handle quoting safely.

Using sed to prepend the resource type is fragile (quoting and commas). Let jq build the CSV rows.

-            if [[ -n "$resource_type" ]]; then
-                echo "resource_type,id"
-                jq -r "$jq_query" "$file" 2>/dev/null | grep -v "^null$" | sort -u | sed "s/^/$resource_type,/"
+            if [[ -n "$resource_type" ]]; then
+                echo "resource_type,id"
+                jq -r --arg rt "$resource_type" "$jq_query | select(. != null) | [\$rt, .] | @csv" "$file" 2>/dev/null | sort -u
             else
                 echo "resource_type,id"
                 jq -r ".entry[] | [.resource.resourceType, .resource.id] | @csv" "$file" 2>/dev/null | grep -v "null" | sort -u
             fi

61-74: Filter validation is overly restrictive; loosen contract or document expectations.

Requiring both “.entry” and “select(…)” in the filter rejects valid boolean filters such as .entry | length > 0. Consider relaxing the “must use select()” check, or document whether -j expects an entry-level predicate or a bundle-level boolean. If you keep the boolean “gate” design, consider only checking that the filter references .entry.
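For instance, jq -e evaluates a bundle-level boolean filter directly, mapping truthy output to exit 0 and false/null to exit 1 (assumes jq is installed; the sample bundle is made up):

```shell
#!/bin/bash
# jq -e: exit status 0 when the last output is truthy, non-zero for false/null.
json='{"entry":[{"resource":{"resourceType":"Patient","id":"p1"}}]}'

printf '%s' "$json" | jq -e '.entry | length > 0' >/dev/null
status_truthy=$?

printf '%s' "$json" | jq -e '.entry | length > 5' >/dev/null
status_false=$?

echo "truthy=$status_truthy false=$status_false"   # truthy=0 false=1
```

Both filters are valid booleans over the bundle, yet neither would pass a validator that demands select(…).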


86-93: Guard against query injection in jq by avoiding direct string interpolation.

Embedding $resource_type inside the jq string can break if it contains quotes or special chars (low risk in practice, but easy to harden). Prefer --arg with a jq variable.

Example pattern (for when you refactor jq_query building):

  • Use: jq -r --arg rt "$resource_type" '.entry[] | select(.resource.resourceType == $rt) | .resource.id'
  • Avoid: ... == \"$resource_type\"

Not blocking, but worth tightening when you next touch this function.
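A minimal sketch of the --arg pattern (assumes jq is installed; the sample bundle and variable names are made up):

```shell
#!/bin/bash
# Pass shell values into jq as variables instead of splicing them into the filter string.
resource_type='Patient'
json='{"entry":[
  {"resource":{"resourceType":"Patient","id":"p1"}},
  {"resource":{"resourceType":"Observation","id":"o1"}}]}'

# $rt is a jq variable, so quotes or special chars in $resource_type cannot break the filter.
ids=$(printf '%s' "$json" \
  | jq -r --arg rt "$resource_type" \
      '.entry[] | select(.resource.resourceType == $rt) | .resource.id')
echo "$ids"   # p1
```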

packages/utils/src/fhir/bundle/find_with_custom_filter.sh (2)

30-51: Help should exit 0; let show_usage accept an exit code and update call sites.

Currently -h|--help exits with code 1 because show_usage unconditionally exits 1. Return 0 for help and 1 for errors.

-# Function to show usage
-show_usage() {
+# Function to show usage
+show_usage() {
+    local exit_code="${1:-1}"
     echo "Usage: $0 (-d <directory> | -f <file>) -j <jq_filter> [--maxdepth <depth>]"
@@
-    exit 1
+    exit "$exit_code"
 }

Update call sites:

-        -h|--help)
-            show_usage
+        -h|--help)
+            show_usage 0
             ;;
@@
-            show_usage
+            show_usage 1
             ;;
@@
-    show_usage
+    show_usage 1
@@
-    show_usage
+    show_usage 1

Also applies to: 77-79, 81-83, 90-96, 100-103


1-3: Optional: add set -euo pipefail for robustness.

For small utility scripts this can prevent subtle bugs (unset vars, pipeline masking). Only if you’re comfortable with the stricter semantics.

 #!/bin/bash
 
+# Safer bash settings
+set -euo pipefail
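The pipefail flag in particular changes which exit status a pipeline reports (quick bash-specific demonstration, independent of the script above):

```shell
#!/bin/bash
# Without pipefail, a pipeline's status is the last command's; with it, any failure wins.
false | true
status_without=$?

set -o pipefail
false | true
status_with=$?

echo "without=$status_without with=$status_with"   # without=0 with=1
```

Combined with set -e, that difference decides whether a failing producer in a pipeline aborts the script.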
packages/utils/src/consolidated/bulk-recreate-consolidated.ts (3)

59-61: Align minimum delay with the OS re-ingest caution (set min ≥ 1s).

Docs advise not going under 1 second, but minimumDelayTime is set to 500 ms. This can oversaturate OS re-indexing, especially with concurrency > 1.

Apply this diff:

-const minimumDelayTime = dayjs.duration(500, "milliseconds");
+const minimumDelayTime = dayjs.duration(1, "second");
 const defaultDelayTime = dayjs.duration(2, "seconds");

118-129: Per-item sleep + high concurrency may produce higher-than-intended QPS.

With 30 workers each sleeping ~2s between items, effective throughput can still be quite high. If OS re-ingest pressure is a concern, consider a global rate limiter or reduce concurrency to complement the delay.


138-143: Exit with non-zero code when errors occurred.

Currently the script always exits with code 0, even if some patients failed. This makes automation blind to partial failures.

Apply this diff:

-  log(
-    `############# Done recreating consolidated for all ${
-      patientIds.length
-    } patients in ${elapsedTimeAsStr(startedAt)}`
-  );
-  process.exit(0);
+  log(
+    `############# Done recreating consolidated for all ${
+      patientIds.length
+    } patients in ${elapsedTimeAsStr(startedAt)}`
+  );
+  const exitCode = patientsWithErrors.length > 0 ? 1 : 0;
+  process.exit(exitCode);
packages/core/src/external/aws/s3.ts (2)

550-575: Broaden listObjects options type for consistency and flexibility.

Limiting to { maxAttempts?: number } prevents passing log or other retry controls supported by ExecuteWithRetriesOptions. Use a partial of that type.

Apply this diff:

-  async listObjects(
-    bucket: string,
-    prefix: string,
-    options: { maxAttempts?: number } = {}
-  ): Promise<AWS.S3.ObjectList> {
+  async listObjects(
+    bucket: string,
+    prefix: string,
+    options: Partial<ExecuteWithRetriesOptions<AWS.S3.ListObjectsV2Output>> = {}
+  ): Promise<AWS.S3.ObjectList> {

577-605: Expose retry options in listFirstLevelSubdirectories to match listObjects.

For parity and operational control under load, allow callers to tweak retries/logging here as well.

Apply this diff:

-  async listFirstLevelSubdirectories({
-    bucket,
-    prefix,
-    delimiter = "/",
-  }: {
-    bucket: string;
-    prefix: string;
-    delimiter?: string | undefined;
-  }): Promise<AWS.S3.CommonPrefixList> {
+  async listFirstLevelSubdirectories({
+    bucket,
+    prefix,
+    delimiter = "/",
+    options,
+  }: {
+    bucket: string;
+    prefix: string;
+    delimiter?: string | undefined;
+    options?: Partial<ExecuteWithRetriesOptions<AWS.S3.ListObjectsV2Output>>;
+  }): Promise<AWS.S3.CommonPrefixList> {
     const allObjects: AWS.S3.CommonPrefix[] = [];
     let continuationToken: string | undefined;
     do {
-      const res = await executeWithRetriesS3(() =>
-        this._s3
-          .listObjectsV2({
-            Bucket: bucket,
-            Prefix: prefix,
-            ...(delimiter ? { Delimiter: delimiter } : {}),
-            ...(continuationToken ? { ContinuationToken: continuationToken } : {}),
-          })
-          .promise()
-      );
+      const res = await executeWithRetriesS3(
+        () =>
+          this._s3
+            .listObjectsV2({
+              Bucket: bucket,
+              Prefix: prefix,
+              ...(delimiter ? { Delimiter: delimiter } : {}),
+              ...(continuationToken ? { ContinuationToken: continuationToken } : {}),
+            })
+            .promise(),
+        options
+      );

Optional: consider migrating both list functions to the v3 client for consistency with other v3 usages (e.g., presigned upload, copy, delete). This is not critical if v2 suffices operationally.

packages/utils/src/consolidated/download-consolidated.ts (4)

19-27: Fix typo and tighten wording in header comment.

“debuging” → “debugging”. Also clarify this script downloads bundles (it doesn’t recreate).

Apply this diff:

- * This script downloads consolidated bundles from S3 to the local filesystem.
- * Used for debuging purposes only, not to be used in production.
+ * This script downloads consolidated bundles from S3 to the local filesystem.
+ * Used for debugging purposes only; do not use in production.

47-51: Add a guard for empty patientIds to avoid a no-op run.

Short-circuit with a helpful message and non-zero exit to avoid confusion when the list is empty.

Apply this diff:

   await displayWarningAndConfirmation(cxId, patientIds.length, log);
 
   log(`>>> Using bucket ${medicalDocsBucketName}`);
   const outputDir = getOutputFileName(cxId);
   makeDirIfNeeded(outputDir);
+  if (patientIds.length === 0) {
+    log(">>> No patient IDs provided. Please add patient IDs to the patientIds array.");
+    process.exit(1);
+  }

61-63: Remove extra “ms” suffix; elapsedTimeAsStr already formats units.

The current log prints “… millis / … minms”. Drop the “ms”.

Apply this diff:

-  log(`>>> Done in ${elapsedTimeAsStr(startedAt)}ms`);
+  log(`>>> Done in ${elapsedTimeAsStr(startedAt)}`);

73-85: Handle S3 read failures per patient to avoid aborting the whole run.

A failure in getFileContentsAsString will reject the whole executeAsynchronously call. For a debugging tool, it’s nicer to log and continue with the next patient.

Apply this diff:

-  log(`>>> Getting contents from ${fileKey}...`);
-  const contents = await s3Utils.getFileContentsAsString(medicalDocsBucketName, fileKey);
-
-  try {
-    log(`>>> Storing bundle...`);
-    const filePath = `${outputDir}/${patientId}_CONSOLIDATED_DATA.json`;
-    makeDirIfNeeded(filePath);
-    fs.writeFileSync(filePath, contents, "utf8");
-  } catch (error) {
-    log(`>>> Error: ${error}`);
-    process.exit(1);
-  }
+  log(`>>> Getting contents from ${fileKey}...`);
+  try {
+    const contents = await s3Utils.getFileContentsAsString(medicalDocsBucketName, fileKey);
+    log(`>>> Storing bundle...`);
+    const filePath = `${outputDir}/${patientId}_CONSOLIDATED_DATA.json`;
+    makeDirIfNeeded(filePath);
+    fs.writeFileSync(filePath, contents, "utf8");
+  } catch (error) {
+    log(`>>> Error fetching or storing ${fileKey}: ${String(error)}`);
+    // Continue with the next patient
+    return;
+  }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled
  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0743148 and 2ac735a.

📒 Files selected for processing (8)
  • packages/api/src/routes/internal/analytics-platform/index.ts (0 hunks)
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts (1 hunks)
  • packages/core/src/command/consolidated/search/document-reference/ingest.ts (0 hunks)
  • packages/core/src/external/aws/s3.ts (1 hunks)
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts (5 hunks)
  • packages/utils/src/consolidated/download-consolidated.ts (1 hunks)
  • packages/utils/src/fhir/bundle/extract_resource_ids.sh (1 hunks)
  • packages/utils/src/fhir/bundle/find_with_custom_filter.sh (1 hunks)
💤 Files with no reviewable changes (2)
  • packages/core/src/command/consolidated/search/document-reference/ingest.ts
  • packages/api/src/routes/internal/analytics-platform/index.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

**/*.{js,jsx,ts,tsx}: Don’t use null inside the app, only on code interacting with external interfaces/services, like DB and HTTP; convert to undefined before sending inwards into the code
Use const whenever possible
Use async/await instead of .then()
Naming: classes, enums: PascalCase
Naming: constants, variables, functions: camelCase
Naming: file names: kebab-case
Naming: Don’t use negative names, like notEnabled, prefer isDisabled
If possible, use decomposing objects for function parameters
Prefer Nullish Coalesce (??) than the OR operator (||) when you want to provide a default value
Avoid creating arrow functions
Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
While handling errors, keep the stack trace around: if you create a new Error (e.g., MetriportError), make sure to pass the original error as the new one’s cause so the stack trace is available upstream.
max column length is 100 chars
multi-line comments use /** */
top-level comments go after the import (save pre-import to basic file header, like license)
move literals to constants declared after imports when possible

Files:

  • packages/utils/src/consolidated/download-consolidated.ts
  • packages/core/src/external/aws/s3.ts
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts
**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

Use types whenever possible

Files:

  • packages/utils/src/consolidated/download-consolidated.ts
  • packages/core/src/external/aws/s3.ts
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts
**/*.ts

⚙️ CodeRabbit Configuration File

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • U...

Files:

  • packages/utils/src/consolidated/download-consolidated.ts
  • packages/core/src/external/aws/s3.ts
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts
🧠 Learnings (5)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
📚 Learning: 2025-05-20T21:26:26.804Z
Learnt from: leite08
PR: metriport/metriport#3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Applied to files:

  • packages/utils/src/consolidated/download-consolidated.ts
  • packages/utils/src/consolidated/bulk-recreate-consolidated.ts
📚 Learning: 2025-06-18T00:18:59.144Z
Learnt from: leite08
PR: metriport/metriport#4043
File: packages/core/src/command/consolidated/consolidated-create.ts:0-0
Timestamp: 2025-06-18T00:18:59.144Z
Learning: In the consolidated bundle creation process (packages/core/src/command/consolidated/consolidated-create.ts), the team prefers strict failure behavior where if any component (including pharmacy resources) fails to be retrieved, the entire consolidation should fail rather than gracefully degrading. This ensures data consistency and prevents incomplete bundles.

Applied to files:

  • packages/utils/src/consolidated/download-consolidated.ts
📚 Learning: 2025-07-27T00:40:32.149Z
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.

Applied to files:

  • packages/utils/src/fhir/bundle/find_with_custom_filter.sh
📚 Learning: 2025-06-10T22:20:21.203Z
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.

Applied to files:

  • packages/utils/src/fhir/bundle/find_with_custom_filter.sh
🧬 Code Graph Analysis (2)
packages/utils/src/consolidated/download-consolidated.ts (6)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/core/src/external/aws/s3.ts (1)
  • S3Utils (140-606)
packages/core/src/util/log.ts (1)
  • out (30-35)
packages/shared/src/common/date.ts (1)
  • buildDayjs (93-95)
packages/core/src/util/fs.ts (1)
  • makeDirIfNeeded (71-75)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
packages/utils/src/consolidated/bulk-recreate-consolidated.ts (3)
packages/shared/src/common/date.ts (1)
  • buildDayjs (93-95)
packages/core/src/util/fs.ts (1)
  • getFileContents (51-53)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
🪛 Shellcheck (0.10.0)
packages/utils/src/fhir/bundle/find_with_custom_filter.sh

[warning] 120-120: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: Analyze (javascript)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/utils/src/lambda/execute-lambda.ts (1)

39-46: Decode the v3 Payload properly; calling .toString() on a Uint8Array yields comma-joined byte values, not the decoded text.

For error details, decode the Uint8Array. You already expose a helper from core.

-import {
-  getLambdaResultPayloadV3,
-  logResultToString,
-  makeLambdaClientV3,
-} from "@metriport/core/external/aws/lambda";
+import {
+  getLambdaResultPayloadV3,
+  logResultToString,
+  makeLambdaClientV3,
+  resultV3ToString,
+} from "@metriport/core/external/aws/lambda";
@@
   if (lambdaResult.StatusCode !== 200) {
     throw new MetriportError("Lambda invocation failed", undefined, {
       lambdaName,
       status: lambdaResult.StatusCode,
       log: logResultToString(lambdaResult.LogResult),
-      payload: lambdaResult.Payload?.toString(),
+      payload: lambdaResult.Payload ? resultV3ToString(lambdaResult.Payload) : undefined,
     });
   }
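The decoding pitfall above can be reproduced in isolation. A minimal sketch (the payload bytes are hypothetical, not taken from the PR):

```typescript
// Hypothetical Lambda v3 payload: the UTF-8 bytes of a small JSON document
const payload: Uint8Array = new Uint8Array(Buffer.from('{"ok":true}', "utf-8"));

// Calling toString() on a Uint8Array joins the raw byte values with commas;
// it does not decode the bytes back into text
const naive = payload.toString(); // "123,34,111,107,..."

// Routing through Buffer with an explicit encoding recovers the JSON string
const decoded = Buffer.from(payload).toString("utf-8"); // '{"ok":true}'
```

This is why the suggested fix routes the error payload through a decoding helper instead of stringifying the Uint8Array directly.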
♻️ Duplicate comments (2)
packages/utils/src/analytics-platform/util/check-merge-groups.ts (2)

93-103: Count duplicates by unique file references; current logic yields false positives

As written, a key repeated within the same groups file will be flagged as a duplicate. Compare unique file names and align the count accordingly.

Apply this diff:

   private findDuplicates(): void {
-    this.keyMap.forEach((files, key) => {
-      if (files.length > 1) {
-        this.duplicateKeys.push({
-          key,
-          files: [...new Set(files)], // Remove duplicate file references
-          count: files.length,
-        });
-      }
-    });
+    this.keyMap.forEach((files, key) => {
+      const uniqueFiles = [...new Set(files)];
+      if (uniqueFiles.length > 1) {
+        this.duplicateKeys.push({
+          key,
+          files: uniqueFiles,
+          count: uniqueFiles.length,
+        });
+      }
+    });
   }

190-194: Export JSON counts should match unique-file semantics

Ensure allKeys[].count reflects the number of unique files, consistent with the duplicate-detection rule.

Apply this diff:

-      allKeys: Array.from(this.keyMap.entries()).map(([key, files]) => ({
-        key,
-        files: [...new Set(files)],
-        count: files.length,
-      })),
+      allKeys: Array.from(this.keyMap.entries()).map(([key, files]) => {
+        const uniqueFiles = [...new Set(files)];
+        return {
+          key,
+          files: uniqueFiles,
+          count: uniqueFiles.length,
+        };
+      }),
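The unique-file rule behind both suggestions can be shown with a tiny standalone sketch (the keyMap contents here are made up for illustration):

```typescript
// Hypothetical keyMap: S3 key -> basenames of the groups files that reference it
const keyMap = new Map<string, string[]>([
  ["pt-1.csv", ["g1_groups.json", "g1_groups.json"]], // repeated inside one file
  ["pt-2.csv", ["g1_groups.json", "g2_groups.json"]], // referenced by two files
]);

const duplicateKeys: { key: string; files: string[]; count: number }[] = [];
keyMap.forEach(function (files, key) {
  // Compare unique file names so a key repeated within a single groups file
  // is not reported as a cross-file duplicate
  const uniqueFiles = [...new Set(files)];
  if (uniqueFiles.length > 1) {
    duplicateKeys.push({ key, files: uniqueFiles, count: uniqueFiles.length });
  }
});
// duplicateKeys now contains only pt-2.csv, with count 2
```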
🧹 Nitpick comments (7)
packages/utils/src/analytics-platform/util/check-merge-groups.ts (2)

75-83: Avoid pushing duplicate basenames per key to simplify downstream logic

De-duplicate per file while building keyMap. This reduces noise and keeps arrays small without changing behavior.

Apply this diff:

-      data.forEach(group => {
-        group.files.forEach(file => {
-          const key = file.key;
-          if (!this.keyMap.has(key)) {
-            this.keyMap.set(key, []);
-          }
-          // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
-          this.keyMap.get(key)!.push(path.basename(filePath));
-        });
-      });
+      const basename = path.basename(filePath);
+      data.forEach(group => {
+        group.files.forEach(f => {
+          const key = f.key;
+          const filesArr = this.keyMap.get(key) ?? [];
+          if (!filesArr.includes(basename)) {
+            filesArr.push(basename);
+            this.keyMap.set(key, filesArr);
+          }
+        });
+      });

22-23: Extract repeated literals to constants (suffix and output filenames)

Improves maintainability and aligns with the “move literals to constants” guideline.

Apply these diffs:

  1. Add constants after the top-level comment:
@@
  *   > ts-node check-merge-groups.ts <path-to-merge-id>/_info
  * 
  */
+const GROUP_FILE_SUFFIX = "_groups.json";
+const TXT_REPORT_FILENAME = "duplicate-key-analysis.txt";
+const JSON_REPORT_FILENAME = "duplicate-key-analysis.json";
 
 interface FileEntry {
  2. Use the suffix constant when searching for files:
-        .filter(file => file.endsWith("_groups.json"))
+        .filter(file => file.endsWith(GROUP_FILE_SUFFIX))
  3. Use filename constants for report outputs:
-    const outputFile = path.join(this.targetFolder, "duplicate-key-analysis.txt");
+    const outputFile = path.join(this.targetFolder, TXT_REPORT_FILENAME);
-    const outputFile = path.join(this.targetFolder, "duplicate-key-analysis.json");
+    const outputFile = path.join(this.targetFolder, JSON_REPORT_FILENAME);

Also applies to: 57-58, 172-175, 197-200

packages/core/src/external/aws/lambda.ts (3)

63-68: Tighten overload types and make UTF‑8 explicit for v3 payload decoding.

Use NonNullable<> instead of Required<> (which doesn’t remove undefined from a union) and be explicit with encoding.

-export function resultV3ToString(result: Required<InvokeCommandOutput["Payload"]>): string;
+export function resultV3ToString(
+  result: NonNullable<InvokeCommandOutput["Payload"]>
+): string;
 export function resultV3ToString(result: undefined): undefined;
 export function resultV3ToString(result: InvokeCommandOutput["Payload"]): string | undefined {
   if (!result) return undefined;
-  return Buffer.from(result).toString();
+  return Buffer.from(result).toString("utf-8");
 }
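The Required-vs-NonNullable distinction is easy to verify in isolation. A minimal sketch (the type names are illustrative, not from the codebase):

```typescript
type Payload = Uint8Array | undefined;

// Required<T> removes optional (?) modifiers from object properties; applied to
// a union it distributes and leaves the `undefined` member untouched
type StillOptional = Required<Payload>; // Uint8Array | undefined

// NonNullable<T> is the tool that strips null/undefined from a union
type Narrowed = NonNullable<Payload>; // Uint8Array

// These assignments type-check, demonstrating the difference at compile time
const survivesRequired: StillOptional = undefined;
const mustBeDefined: Narrowed = new Uint8Array([1, 2, 3]);
```

Hence the overload should use NonNullable<InvokeCommandOutput["Payload"]> to actually exclude undefined.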

80-84: Docstring is misleading: this checks for errors, not success.

Update the comment to avoid confusion.

-/**
- * Checks whether the lambda invocation was successful.
- */
+/**
+ * Checks whether the lambda invocation returned a FunctionError.
+ */
 export function isLambdaErrorV3(result: InvokeCommandOutput): boolean {
   return result.FunctionError !== undefined;
 }

229-229: Use a distinct log prefix for V3 variant.

Minor clarity improvement when skimming log output.

-  log = out("getLambdaResultPayload").log,
+  log = out("getLambdaResultPayloadV3").log,
packages/utils/src/lambda/execute-lambda.ts (2)

39-53: Avoid duplicate checks; rely on getLambdaResultPayloadV3 for status/payload validation.

You’re pre-validating StatusCode and Payload and then calling a helper that does the same. Consider delegating entirely to the helper for a single source of truth.

-  if (lambdaResult.StatusCode !== 200) {
-    throw new MetriportError("Lambda invocation failed", undefined, {
-      lambdaName,
-      status: lambdaResult.StatusCode,
-      log: logResultToString(lambdaResult.LogResult),
-      payload: lambdaResult.Payload ? resultV3ToString(lambdaResult.Payload) : undefined,
-    });
-  }
-  if (!lambdaResult.Payload) {
-    throw new MetriportError("Payload is undefined", undefined, {
-      lambdaName,
-      status: lambdaResult.StatusCode,
-      log: logResultToString(lambdaResult.LogResult),
-    });
-  }
-  const response = getLambdaResultPayloadV3({ result: lambdaResult, lambdaName });
+  const response = getLambdaResultPayloadV3({ result: lambdaResult, lambdaName });

If you still want the log payload included on failure, pass a logger that adds logResultToString(lambdaResult.LogResult) to the message, or catch and enrich the error in this script.


21-23: Parameterize the lambda name/payload for reusability.

Make this script usable without edits by reading from env vars when present.

-const lambdaName = "TesterLambda";
-const payload = {};
+const lambdaName = process.env.LAMBDA_NAME ?? "TesterLambda";
+const payload = process.env.LAMBDA_PAYLOAD_JSON
+  ? JSON.parse(process.env.LAMBDA_PAYLOAD_JSON)
+  : {};

Note: JSON.parse can throw; if you need friendlier errors, wrap it and surface a clear message.
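A minimal sketch of that wrapping (the env var name follows the suggestion above; the helper name is hypothetical):

```typescript
// Parse an optional JSON payload from an env var, surfacing a clear error
// message instead of a bare SyntaxError when the JSON is malformed
function parsePayloadFromEnv(raw: string | undefined): Record<string, unknown> {
  if (!raw) return {};
  try {
    return JSON.parse(raw);
  } catch (error) {
    throw new Error(`LAMBDA_PAYLOAD_JSON is not valid JSON: ${(error as Error).message}`);
  }
}

const payload = parsePayloadFromEnv(process.env.LAMBDA_PAYLOAD_JSON);
```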

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled
  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2ac735a and e40e221.

⛔ Files ignored due to path filters (2)
  • package-lock.json is excluded by !**/package-lock.json
  • packages/lambdas/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (6)
  • packages/core/package.json (4 hunks)
  • packages/core/src/external/aws/lambda.ts (7 hunks)
  • packages/infra/lib/shared/settings.ts (2 hunks)
  • packages/lambdas/package.json (3 hunks)
  • packages/utils/src/analytics-platform/util/check-merge-groups.ts (1 hunks)
  • packages/utils/src/lambda/execute-lambda.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/lambdas/package.json
  • packages/infra/lib/shared/settings.ts
  • packages/core/package.json
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

**/*.{js,jsx,ts,tsx}: Don’t use null inside the app, only on code interacting with external interfaces/services, like DB and HTTP; convert to undefined before sending inwards into the code
Use const whenever possible
Use async/await instead of .then()
Naming: classes, enums: PascalCase
Naming: constants, variables, functions: camelCase
Naming: file names: kebab-case
Naming: Don’t use negative names, like notEnabled, prefer isDisabled
If possible, use decomposing objects for function parameters
Prefer Nullish Coalesce (??) than the OR operator (||) when you want to provide a default value
Avoid creating arrow functions
Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
While handling errors, keep the stack trace around: if you create a new Error (e.g., MetriportError), make sure to pass the original error as the new one’s cause so the stack trace is available upstream.
max column length is 100 chars
multi-line comments use /** */
top-level comments go after the import (save pre-import to basic file header, like license)
move literals to constants declared after imports when possible

Files:

  • packages/utils/src/analytics-platform/util/check-merge-groups.ts
  • packages/utils/src/lambda/execute-lambda.ts
  • packages/core/src/external/aws/lambda.ts
**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

Use types whenever possible

Files:

  • packages/utils/src/analytics-platform/util/check-merge-groups.ts
  • packages/utils/src/lambda/execute-lambda.ts
  • packages/core/src/external/aws/lambda.ts
**/*.ts

⚙️ CodeRabbit Configuration File

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • U...

Files:

  • packages/utils/src/analytics-platform/util/check-merge-groups.ts
  • packages/utils/src/lambda/execute-lambda.ts
  • packages/core/src/external/aws/lambda.ts
🧠 Learnings (4)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
📚 Learning: 2025-07-27T00:40:32.149Z
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.

Applied to files:

  • packages/utils/src/analytics-platform/util/check-merge-groups.ts
📚 Learning: 2025-08-11T16:17:40.913Z
Learnt from: keshavsaharia
PR: metriport/metriport#4342
File: packages/core/src/external/quest/command/bundle/get-bundle.ts:0-0
Timestamp: 2025-08-11T16:17:40.913Z
Learning: In packages/core/src/external/quest/command/bundle/get-bundle.ts, the Quest lab conversion files in S3 are guaranteed to contain valid JSON after the existence check, so JSON.parse() error handling is not necessary as these files are generated by a controlled internal process.

Applied to files:

  • packages/core/src/external/aws/lambda.ts
📚 Learning: 2025-06-25T01:55:27.902Z
Learnt from: thomasyopes
PR: metriport/metriport#4090
File: packages/core/src/command/conversion-fhir/conversion-fhir-cloud.ts:15-18
Timestamp: 2025-06-25T01:55:27.902Z
Learning: In the Metriport codebase, JSON.stringify operations on Lambda payload construction with ConverterRequest objects (containing string payload and FhirConverterParams) are safe and will not throw, as they contain only simple serializable data types.

Applied to files:

  • packages/core/src/external/aws/lambda.ts
🧬 Code Graph Analysis (2)
packages/utils/src/lambda/execute-lambda.ts (5)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/core/src/external/aws/lambda.ts (2)
  • makeLambdaClientV3 (28-40)
  • getLambdaResultPayloadV3 (224-264)
packages/shared/src/index.ts (1)
  • sleep (13-13)
packages/shared/src/common/date.ts (1)
  • buildDayjs (93-95)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
packages/core/src/external/aws/lambda.ts (2)
packages/core/src/util/log.ts (1)
  • out (30-35)
packages/shared/src/index.ts (3)
  • MetriportError (43-43)
  • BadRequestError (42-42)
  • NotFoundError (44-44)
🔇 Additional comments (3)
packages/utils/src/analytics-platform/util/check-merge-groups.ts (1)

249-254: Usage message now matches the script name

The usage text correctly references check-merge-groups.ts. Thanks for addressing this.

packages/core/src/external/aws/lambda.ts (2)

111-120: LGTM: Properly decodes and surfaces v3 Lambda error details.

Parsing the decoded payload and returning structured error info is correct and consistent with the v2 behavior.


210-264: LGTM: V3 result handling mirrors v2 semantics with appropriate status and empty-payload checks.

The error mapping to BadRequestError/NotFoundError/MetriportError and graceful-fail behavior align well with the v2 path.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (3)

20-47: Docs are outdated/inaccurate: remove references to patientIds and “max rows” arg; fix typo (“buy” → “by”).

The script only accepts mergeCsvJobId as a CLI argument and doesn’t support a “max rows per file” parameter. Also, references to setting patientIds are stale.

 /** 
  * Script to ingest compressed CSV files into Snowflake.
  *
- * It relies on the mergeCsvJobId to be passed as a parameter. This ID is determined buy the 2nd step in
+ * It relies on mergeCsvJobId passed as a parameter. This ID is determined by the 2nd step in
  * the flow, the script `2-merge-csvs.ts`.
  *
  * It relies on the merged CSV files to be on the configured bucket, with the following structure:
  *
  * ./snowflake/merged/cxId/patientId/jobId/_tmp_fhir-to-csv_output_cxId_patientId_resourceType.csv
  *
- * As a required runtime parameter, it needs the max number of rows to be used from each file found on S3.
- *
- * Set these constants based on the output from the merge-csvs lambda:
- * - patientIds
- * - mergeCsvJobId
- *
- * Set this constant to determine whether to include deletes while updating Snowflake, to strain
- * transaction control:
- * - includeDeletes
+ * Required CLI argument:
+ * - mergeCsvJobId (e.g., 2025-08-08T04-38-04)
+ *
+ * Note: this script ingests all rows from the merged CSVs; there is no “max rows per file” option.
  *
  * Set these env vars read with `getEnvVarOrFail`
  *
  * Run it with:
  * - ts-node src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts <mergeCsvJobId>
  *
  * Example:
  * - ts-node src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts 2025-08-08T04-38-04
  */

156-159: Avoid false positives when checking for resource files (match the folder segment).

Using includes(resourceType) can mis-detect ("condition" also matches "condition_code_coding"). Match on the "/${resourceType}/" segment so the check targets the correct S3 path component.

-  if (!files.some(file => file.Key?.includes(resourceType))) {
+  if (!files.some(file => file.Key && file.Key.includes(`/${resourceType}/`))) {
     console.log(`>>> No files found for ${resourceType} - skipping`);
     return;
   }
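The substring pitfall described above is easy to demonstrate. A small sketch with made-up S3 keys:

```typescript
// Hypothetical merged-CSV keys: only condition_code_coding has output here
const keys = ["snowflake/merged/cx/pt/job/condition_code_coding/part-0.csv.gz"];
const resourceType = "condition";

// Bare substring check: "condition" matches "condition_code_coding" too
const naiveHit = keys.some(function (key) {
  return key.includes(resourceType);
}); // true — false positive

// Matching the full path segment avoids the collision
const segmentHit = keys.some(function (key) {
  return key.includes(`/${resourceType}/`);
}); // false — correctly reports no condition files
```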

169-185: Optional: use a distinct stage name and safe drop.

Not required, but using a dedicated stage name reduces cognitive collision with the table name, and IF EXISTS makes the drop idempotent if a previous attempt partially succeeded.

-  const createStageCmd =
-    `CREATE OR REPLACE TEMP STAGE ${tableName} STORAGE_INTEGRATION = ANALYTICS_BUCKET ` +
+  const stageName = `${tableName}_STAGE`;
+  const createStageCmd =
+    `CREATE OR REPLACE TEMP STAGE ${stageName} STORAGE_INTEGRATION = ANALYTICS_BUCKET ` +
     `URL = '${prefixUrl}/${resourceType}/'`;
   // console.log(`Create stage cmd: ${createStageCmd}`);
   await executeAsync(createStageCmd);

   console.log(`>>> Copying ${resourceType}...`);
   const startedAt = Date.now();
-  const copyCmd = `COPY INTO ${tableName} FROM @${tableName} FILE_FORMAT = gzip_csv_format ON_ERROR = ABORT_STATEMENT`;
+  const copyCmd = `COPY INTO ${tableName} FROM @${stageName} FILE_FORMAT = gzip_csv_format ON_ERROR = ABORT_STATEMENT`;
   // console.log(`Copy cmd: ${copyCmd}`);
   await executeAsync(copyCmd);
   console.log(`... Copied into ${resourceType} in ${elapsedTimeAsStr(startedAt)}`);

-  const dropStageCmd = `DROP STAGE ${tableName};`;
+  const dropStageCmd = `DROP STAGE IF EXISTS ${stageName};`;
   await executeAsync(dropStageCmd);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled
  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e40e221 and bba3cc1.

📒 Files selected for processing (1)
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)


Files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursorrules)

Use types whenever possible

Files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
**/*.ts

⚙️ CodeRabbit Configuration File


Files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
🧠 Learnings (26)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#4350
File: packages/utils/src/analytics-platform/2-merge-csvs.ts:58-61
Timestamp: 2025-08-20T12:40:34.053Z
Learning: In packages/utils/src/analytics-platform/2-merge-csvs.ts, the constant messageGroupId "merge-csvs" for SQS FIFO processing is intentional by user leite08, indicating they want sequential processing rather than concurrent chunk processing for the merge CSV workflow.
📚 Learning: 2025-08-20T12:40:34.053Z
Learnt from: leite08
PR: metriport/metriport#4350
File: packages/utils/src/analytics-platform/2-merge-csvs.ts:58-61
Timestamp: 2025-08-20T12:40:34.053Z
Learning: In packages/utils/src/analytics-platform/2-merge-csvs.ts, the constant messageGroupId "merge-csvs" for SQS FIFO processing is intentional by user leite08, indicating they want sequential processing rather than concurrent chunk processing for the merge CSV workflow.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-08-15T03:59:27.025Z
Learnt from: leite08
PR: metriport/metriport#4350
File: packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts:159-161
Timestamp: 2025-08-15T03:59:27.025Z
Learning: User leite08 prefers fail-fast behavior for bulk import scripts in Snowflake. When tables already exist, the script should fail rather than continue, as this prevents accidental duplicate data ingestion. The `IF NOT EXISTS` clause should be omitted in bulk import scenarios to ensure data integrity.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-05T16:02:22.473Z
Learnt from: leite08
PR: metriport/metriport#3611
File: packages/utils/src/carequality/download-directory.ts:68-82
Timestamp: 2025-04-05T16:02:22.473Z
Learning: Utility scripts in the packages/utils directory don't require comprehensive error handling like production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-27T23:51:45.100Z
Learnt from: leite08
PR: metriport/metriport#3857
File: packages/utils/src/consolidated/filter-consolidated.ts:39-68
Timestamp: 2025-05-27T23:51:45.100Z
Learning: Utility scripts in the packages/utils directory don't require comprehensive error handling or logging changes (like switching from console.log to out().log), as they are meant for local/developer environments, not production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-03-14T00:30:18.860Z
Learnt from: leite08
PR: metriport/metriport#3468
File: packages/utils/src/s3/list-objects.ts:17-28
Timestamp: 2025-03-14T00:30:18.860Z
Learning: Files under `./packages/utils` folder are utility scripts and don't require the same coding standards used in other packages (like error handling, structured logging, etc.).

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-24T02:17:02.661Z
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/utils/src/patient-import/send-webhook.ts:41-55
Timestamp: 2025-04-24T02:17:02.661Z
Learning: Files under `packages/utils` don't need robust error handling as they are utility scripts meant for local/developer environments, not production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-30T14:19:02.018Z
Learnt from: leite08
PR: metriport/metriport#3924
File: packages/utils/src/open-search/ingest-patients.ts:73-74
Timestamp: 2025-05-30T14:19:02.018Z
Learning: Utility scripts in the codebase don't need to strictly follow all coding guidelines, particularly around logging practices like avoiding multi-line logs or objects as second parameters.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-03-14T00:31:01.986Z
Learnt from: leite08
PR: metriport/metriport#3468
File: packages/utils/src/s3/list-objects.ts:12-13
Timestamp: 2025-03-14T00:31:01.986Z
Learning: Files under the `./packages/utils` folder are utility scripts and don't require the same coding standards as other packages. They often contain placeholder values meant to be filled in by developers before use.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-31T23:09:05.757Z
Learnt from: lucasdellabella
PR: metriport/metriport#3941
File: packages/utils/src/hl7v2-notifications/unpack-patient-roster-identifier.ts:61-64
Timestamp: 2025-05-31T23:09:05.757Z
Learning: Utility scripts in the utils package may have relaxed coding standards compared to production code, particularly for error handling. When reviewing debugging scripts or one-off utilities, apply less stringent requirements for error handling patterns.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-07-27T23:42:34.014Z
Learnt from: RamilGaripov
PR: metriport/metriport#4259
File: packages/utils/src/document/reconversion-kickoff-loader.ts:78-81
Timestamp: 2025-07-27T23:42:34.014Z
Learning: For utility scripts in packages/utils, RamilGaripov prefers to keep them simple without extensive validation or error handling, as they are local developer tools where the developer controls the input and execution environment.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-03-14T01:13:34.887Z
Learnt from: leite08
PR: metriport/metriport#3469
File: packages/utils/src/s3/list-objects.ts:23-27
Timestamp: 2025-03-14T01:13:34.887Z
Learning: Files under `./packages/utils` are permitted to use `console.log` directly and are not required to use `out().log` instead, as they're exempt from certain coding standards that apply to the rest of the codebase.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-10T16:01:21.135Z
Learnt from: leite08
PR: metriport/metriport#3648
File: packages/utils/src/csv/find-dups.ts:14-14
Timestamp: 2025-04-10T16:01:21.135Z
Learning: Files under `packages/utils` are allowed to have empty strings for input parameters as these are meant to be manually modified by developers before running scripts in their local environments.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-08-15T18:57:29.030Z
Learnt from: lucasdellabella
PR: metriport/metriport#4382
File: packages/utils/src/hl7v2-notifications/reprocess-adt-conversion-bundles/common.ts:36-48
Timestamp: 2025-08-15T18:57:29.030Z
Learning: In utility scripts like packages/utils/src/hl7v2-notifications/reprocess-adt-conversion-bundles/common.ts, user lucasdellabella prefers to let bundle parsing errors fail fast rather than adding error handling, as it provides immediate visibility into data issues during script execution.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-05T16:02:31.219Z
Learnt from: leite08
PR: metriport/metriport#3611
File: packages/utils/src/carequality/download-directory.ts:33-66
Timestamp: 2025-04-05T16:02:31.219Z
Learning: In the Metriport codebase, utility scripts in the packages/utils directory have different error handling requirements compared to production code, and do not necessarily need the same level of error handling.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-08-01T18:20:14.618Z
Learnt from: keshavsaharia
PR: metriport/metriport#4161
File: packages/utils/src/surescripts/compare/dr-first.ts:253-267
Timestamp: 2025-08-01T18:20:14.618Z
Learning: For one-off utility scripts in packages/utils/src/surescripts/, it's acceptable to ignore certain code quality issues like array modification during iteration, as these scripts prioritize functionality over perfect code patterns.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-07-27T00:40:32.149Z
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-26T16:24:13.656Z
Learnt from: leite08
PR: metriport/metriport#3859
File: packages/utils/src/consolidated/seed-patient-consolidated.ts:94-94
Timestamp: 2025-05-26T16:24:13.656Z
Learning: For utility scripts in the packages/utils directory, leite08 accepts simpler string replacement approaches (like replaceAll) even when more targeted parsing approaches would be safer, indicating different code quality standards apply to utility scripts versus production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-03T02:23:58.332Z
Learnt from: leite08
PR: metriport/metriport#3595
File: packages/utils/src/carequality/download-directory.ts:36-66
Timestamp: 2025-04-03T02:23:58.332Z
Learning: In the metriport codebase, utility scripts may not require the same level of error handling as production code, especially for simple or one-off tasks like in the CQ directory download script.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-31T23:09:07.678Z
Learnt from: lucasdellabella
PR: metriport/metriport#3941
File: packages/utils/src/hl7v2-notifications/unpack-patient-roster-identifier.ts:66-68
Timestamp: 2025-05-31T23:09:07.678Z
Learning: Scripts and utility tools may have relaxed code standards compared to production application code. Type safety improvements and other refactoring suggestions should be applied with less rigor to debugging scripts.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-06-10T13:41:50.425Z
Learnt from: lucasdellabella
PR: metriport/metriport#3987
File: packages/utils/src/local-bulk-insert-patients.ts:216-219
Timestamp: 2025-06-10T13:41:50.425Z
Learning: For utility scripts and testing scripts in the packages/utils directory, prefer simpler error handling rather than detailed contextual error messages, as these are typically used internally and don't require the same level of robustness as production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-06-26T06:27:07.109Z
Learnt from: keshavsaharia
PR: metriport/metriport#4099
File: packages/utils/src/surescripts/shared.ts:52-63
Timestamp: 2025-06-26T06:27:07.109Z
Learning: For utility scripts in packages/utils, comprehensive error handling (like try-catch blocks for S3 operations and JSON parsing) is not necessary, as immediate failures can be more helpful for developers running these one-off tools.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-24T02:17:19.333Z
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/utils/src/patient-import/send-webhook.ts:36-40
Timestamp: 2025-04-24T02:17:19.333Z
Learning: Utility scripts in the `packages/utils` directory should include proper error handling with try/catch blocks around their main functions, logging errors and exiting with non-zero status codes upon failure.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-27T16:10:48.223Z
Learnt from: lucasdellabella
PR: metriport/metriport#3866
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-roster-generator.ts:142-151
Timestamp: 2025-05-27T16:10:48.223Z
Learning: In packages/core/src/command/hl7v2-subscriptions/hl7v2-roster-generator.ts, the user prefers to let axios errors bubble up naturally rather than adding try-catch blocks that re-throw with MetriportError wrappers, especially when the calling code already has retry mechanisms in place via simpleExecuteWithRetries.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-07-16T18:48:28.251Z
Learnt from: leite08
PR: metriport/metriport#4095
File: packages/commonwell-cert-runner/src/single-commands/document-parse.ts:14-24
Timestamp: 2025-07-16T18:48:28.251Z
Learning: For utility scripts in the packages/commonwell-cert-runner directory, comprehensive error handling (like try-catch blocks for file operations and JSON parsing) is not necessary, as these are certification testing tools rather than production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-27T15:49:16.573Z
Learnt from: lucasdellabella
PR: metriport/metriport#3866
File: packages/core/src/command/hl7v2-subscriptions/__tests__/hl7v2-roster-generator.test.ts:190-191
Timestamp: 2025-05-27T15:49:16.573Z
Learning: The user lucasdellabella prefers to keep changes focused on the PR's scope and avoid making ancillary performance improvements (like replacing delete operators with undefined assignments) that are not directly related to the main objectives of the current PR.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
🧬 Code Graph Analysis (1)
packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (7)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/core/src/external/aws/s3.ts (1)
  • S3Utils (140-606)
packages/core/src/command/analytics-platform/merge-csvs/file-name.ts (1)
  • buildMergeCsvsJobPrefix (5-7)
packages/shared/src/index.ts (1)
  • sleep (13-13)
packages/shared/src/common/date.ts (1)
  • buildDayjs (93-95)
packages/utils/src/shared/duration.ts (1)
  • elapsedTimeAsStr (8-13)
packages/core/src/external/snowflake/commands.ts (3)
  • promisifyConnect (3-15)
  • promisifyExecute (31-53)
  • promisifyDestroy (17-29)
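The promisify helpers referenced above wrap a callback-style Snowflake connection so the ingest script can use async/await. A minimal sketch of that pattern, using an illustrative `CallbackConnection` interface rather than the repo's actual snowflake-sdk types:

```typescript
// Illustrative callback-style connection interface, standing in for the SDK's.
interface CallbackConnection {
  connect(cb: (err: Error | undefined, conn: CallbackConnection) => void): void;
  execute(opts: {
    sqlText: string;
    complete: (err: Error | undefined, stmt: unknown, rows: unknown[] | undefined) => void;
  }): void;
  destroy(cb: (err: Error | undefined) => void): void;
}

// Wrap connect() in a Promise so callers can await the established connection.
function promisifyConnect(conn: CallbackConnection): Promise<CallbackConnection> {
  return new Promise((resolve, reject) => {
    conn.connect((err, connected) => (err ? reject(err) : resolve(connected)));
  });
}

// Wrap execute() similarly, resolving with the rows returned by the statement.
function promisifyExecute(
  conn: CallbackConnection,
  sqlText: string
): Promise<{ rows: unknown[] | undefined }> {
  return new Promise((resolve, reject) => {
    conn.execute({
      sqlText,
      complete: (err, _stmt, rows) => (err ? reject(err) : resolve({ rows })),
    });
  });
}
```

The same wrapping applies to `destroy`; the real helpers live in `packages/core/src/external/snowflake/commands.ts`.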
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (3)
packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (3)

49-55: Good: basic CLI arg validation is in place.

Usage is printed and the script fails fast when mergeCsvJobId is missing. Suitable for a utils script.


164-168: Intentional fail-fast on CREATE TABLE acknowledged.

Omitting IF NOT EXISTS aligns with your bulk-ingest policy to avoid duplicate data. No change requested.


178-180: Ensure gzip_csv_format is provisioned in Snowflake

A search of the repo didn’t find any CREATE FILE FORMAT … gzip_csv_format statements, yet the COPY command (lines 178–180 in packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts) uses it. If that file format isn’t already defined in your target database/schema, the COPY will fail at runtime.

Please either:

  • Add a SQL migration under your infra (e.g. infra/sql/) to create this format, for example:

    CREATE FILE FORMAT IF NOT EXISTS gzip_csv_format
      TYPE = CSV
      COMPRESSION = GZIP
      FIELD_DELIMITER = ','
      SKIP_HEADER = 1
      NULL_IF = ('NULL');
  • Or document and verify that gzip_csv_format is already pre-provisioned in the target environment before executing this code.

@leite08 leite08 force-pushed the eng-743-bulk-ingest branch from 881fcbe to d0135db Compare August 23, 2025 14:24
Ref eng-743

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
…reate empty tables

Ref eng-743

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/utils/src/analytics-platform/1-fhir-to-csv.ts (1)

182-183: Fail fast with top-level error handling for main().

Avoid silent unhandled rejections and ensure CI/ops see a non‑zero exit code.

-main();
+main().catch(error => {
+  const { log } = out("1-fhir-to-csv");
+  log(`>>> Fatal error: ${errorToString(error)}`);
+  process.exitCode = 1;
+});
♻️ Duplicate comments (2)
packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (2)

54-60: Arg validation LGTM

Good fail-fast check with clear usage output.


186-201: Use a distinct temp stage name and drop defensively

Avoid name confusion with tables and be resilient on cleanup.

Apply this diff:

-  const createStageCmd =
-    `CREATE OR REPLACE TEMP STAGE ${tableName} STORAGE_INTEGRATION = ANALYTICS_BUCKET ` +
+  const stageName = `${tableName}_STAGE`;
+  const createStageCmd =
+    `CREATE OR REPLACE TEMP STAGE ${stageName} STORAGE_INTEGRATION = ANALYTICS_BUCKET ` +
     `URL = '${prefixUrl}/${resourceType}/'`;
@@
-  const copyCmd = `COPY INTO ${tableName} FROM @${tableName} FILE_FORMAT = gzip_csv_format ON_ERROR = ABORT_STATEMENT`;
+  const copyCmd = `COPY INTO ${tableName} FROM @${stageName} FILE_FORMAT = gzip_csv_format ON_ERROR = ABORT_STATEMENT`;
@@
-  const dropStageCmd = `DROP STAGE ${tableName};`;
+  const dropStageCmd = `DROP STAGE IF EXISTS ${stageName};`;
🧹 Nitpick comments (18)
packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (4)

28-36: Fix typo and align docs with actual CLI parameters

The comment says “buy” instead of “by” and references a non-existent “max rows” parameter.

Apply this diff:

- * It relies on the mergeCsvJobId to be passed as a parameter. This ID is determined buy the 2nd step in
+ * It relies on the mergeCsvJobId to be passed as a parameter. This ID is determined by the 2nd step in
   * the flow, the script `2-merge-csvs.ts`.
@@
- * As a required runtime parameter, it needs the max number of rows to be used from each file found on S3.
+ * No additional runtime parameters are required; all rows from the merged CSVs are ingested.

92-100: Abort early when no S3 files are found

Prevents accidental empty loads due to wrong cxId/jobId/prefix.

Apply this diff:

   console.log(`Found ${files.length} files in ${prefixUrl}`);
+  if (files.length === 0) {
+    console.error("No files found. Check CX_ID, mergeCsvJobId, bucket, or prefix. Aborting.");
+    process.exit(1);
+  }
 
   await displayWarningAndConfirmation(files, orgName, database);

180-184: Tighten file presence check to avoid false positives

Using includes() risks matching shared prefixes (e.g., condition vs condition_code_coding).

Apply this diff:

-  if (!files.some(file => file.Key?.includes(resourceType))) {
+  if (!files.some(file => file.Key?.startsWith(`${prefixName}/${resourceType}/`))) {

214-221: Add filename to INI parse error for faster debugging

Improves visibility when a config lacks the Struct section.

Apply this diff:

-  if (!structSection) {
-    throw new Error("No [Struct] section found in the INI file");
-  }
+  if (!structSection) {
+    throw new Error(`No [Struct] section found in INI file: ${path}`);
+  }
packages/shared/src/common/numbers.ts (2)

12-26: Handle negatives and confirm suffix for billions.

Current thresholds don’t account for negatives (e.g., -1,500 → “-1500.0” instead of “-1.5K”). Also, confirm if “G” (giga) is desired over the more common “B” (billion) in user-facing logs.

Apply this diff to add sign-aware formatting (keeps “G”; switch to “B” if that’s the product choice):

 export function abbreviateNumber(num: number): string {
-  if (num < 1_000) {
-    return numberToString(num, 1);
-  }
-  if (num < 1_000_000) {
-    return `${numberToString(num / 1000, 1)}K`;
-  }
-  if (num < 1_000_000_000) {
-    return `${numberToString(num / 1_000_000, 1)}M`;
-  }
-  return `${numberToString(num / 1_000_000_000, 1)}G`;
+  const sign = num < 0 ? "-" : "";
+  const abs = Math.abs(num);
+  if (abs < 1_000) {
+    return `${sign}${numberToString(abs, 1)}`;
+  }
+  if (abs < 1_000_000) {
+    return `${sign}${numberToString(abs / 1_000, 1)}K`;
+  }
+  if (abs < 1_000_000_000) {
+    return `${sign}${numberToString(abs / 1_000_000, 1)}M`;
+  }
+  return `${sign}${numberToString(abs / 1_000_000_000, 1)}G`;
 }

28-47: Fix doc mismatch, use strict equality, and simplify formatting.

  • Docs say digits only apply when thousandSeparator is true, but code applies digits always.
  • Use strict equality per guidelines.
  • Minor simplification: destructure parts; format unconditionally; prepare a reusable formatter.

Apply this diff to fix docs and code:

 /**
  * Format a number to a string, with optional decimal places and thousand separator.
  *
  * @param num the number to format
- * @param digits number of decimal places, only used if `thousandSeparator` is true
+ * @param digits number of decimal places (applied to the string output)
  * @param thousandSeparator if true (default), use Intl.NumberFormat to format the number; otherwise,
  *                          return the number without formatting.
  */
 export function numberToString(num: number, digits = 2, thousandSeparator = true): string {
-  const asString = num.toFixed(digits);
-  if (!thousandSeparator) return asString;
-  const parts = asString.split(".");
-  const integerPart = parts[0] ?? 0;
-  const decimalPart = parts[1];
-  const integerPartWithSeparator = integerPart
-    ? Intl.NumberFormat("en-US").format(Number(integerPart))
-    : integerPart;
-  if (decimalPart == undefined) return integerPartWithSeparator.toString();
-  return `${integerPartWithSeparator}.${decimalPart}`;
+  const asString = num.toFixed(digits);
+  if (!thousandSeparator) return asString;
+  const [integerPart, decimalPart] = asString.split(".");
+  const integerPartWithSeparator = usNumberFormatter.format(Number(integerPart));
+  if (decimalPart === undefined) return integerPartWithSeparator;
+  return `${integerPartWithSeparator}.${decimalPart}`;
 }

And add this formatter once at the top of the file (after imports if any):

const usNumberFormatter = new Intl.NumberFormat("en-US");
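Taken together, the two suggestions above yield sign-aware abbreviation on top of a thousands-separated base formatter. A standalone sketch of the combined behavior (independent of the shared package):

```typescript
const usNumberFormatter = new Intl.NumberFormat("en-US");

// Format with fixed decimals; optionally insert US thousand separators.
function numberToString(num: number, digits = 2, thousandSeparator = true): string {
  const asString = num.toFixed(digits);
  if (!thousandSeparator) return asString;
  const [integerPart, decimalPart] = asString.split(".");
  const integerPartWithSeparator = usNumberFormatter.format(Number(integerPart));
  if (decimalPart === undefined) return integerPartWithSeparator;
  return `${integerPartWithSeparator}.${decimalPart}`;
}

// Abbreviate with K/M/G suffixes, formatting the absolute value so that
// negatives like -1500 come out as "-1.5K" instead of "-1500.0".
function abbreviateNumber(num: number): string {
  const sign = num < 0 ? "-" : "";
  const abs = Math.abs(num);
  if (abs < 1_000) return `${sign}${numberToString(abs, 1)}`;
  if (abs < 1_000_000) return `${sign}${numberToString(abs / 1_000, 1)}K`;
  if (abs < 1_000_000_000) return `${sign}${numberToString(abs / 1_000_000, 1)}M`;
  return `${sign}${numberToString(abs / 1_000_000_000, 1)}G`;
}
```

For example, `abbreviateNumber(-1500)` returns `"-1.5K"` and `numberToString(1234567.891)` returns `"1,234,567.89"`.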
packages/utils/src/analytics-platform/snowflake/test-snowflake-connection.ts (4)

1-1: Remove stray header comment and keep top-level comments after imports.

The leading “// 3-ingest-from-merged-csvs.ts” is inaccurate and violates the scripts guideline about top-level comments placement.

Apply this diff:

-// 3-ingest-from-merged-csvs.ts

69-76: Fix misleading variable name.

Variable name suggests a DROP but it’s executing SHOW TABLES.

Apply this diff:

-    console.log("Listing tables I have access to...");
-    const dropStageCmd = `SHOW TABLES;`;
-    const { rows: rowsRaw } = await executeAsync(dropStageCmd);
+    console.log("Listing tables I have access to...");
+    const showTablesCmd = `SHOW TABLES;`;
+    const { rows: rowsRaw } = await executeAsync(showTablesCmd);

85-97: Avoid arrow functions per guidelines; use for-of.

Keeps style consistent and avoids unnecessary arrow callbacks.

Apply this diff:

-    const groupByDatabase = groupBy(rows, "database_name");
-    for (const databaseName of Object.keys(groupByDatabase)) {
-      console.log(`Database: ${databaseName}`);
-      const tables = groupByDatabase[databaseName];
-      tables.forEach(table => {
-        console.log(
-          `> ${table.name} (${table.kind}): ` +
-            `${abbreviateNumber(table.rows)} rows, ` +
-            `${abbreviateNumber(table.bytes)} bytes, ` +
-            `owner: ${table.owner}`
-        );
-      });
-    }
+    const groupByDatabase = groupBy(rows, "database_name");
+    for (const databaseName of Object.keys(groupByDatabase)) {
+      console.log(`Database: ${databaseName}`);
+      const tables = groupByDatabase[databaseName];
+      for (const table of tables) {
+        console.log(
+          `> ${table.name} (${table.kind}): ` +
+            `${abbreviateNumber(table.rows)} rows, ` +
+            `${abbreviateNumber(table.bytes)} bytes, ` +
+            `owner: ${table.owner}`
+        );
+      }
+    }

43-51: Optional: catch and surface errors in main().

Improves DX by exiting non-zero and printing a clear failure if testIt throws.

Apply this diff:

 async function main() {
   await sleep(50); // Give some time to avoid mixing logs w/ Node's
   const startedAt = Date.now();
   console.log(`############## Started at ${buildDayjs().toISOString()} ##############`);
 
-  await testIt();
+  try {
+    await testIt();
+  } catch (err) {
+    console.error("Snowflake test failed:", errorToString(err));
+    process.exitCode = 1;
+  }
 
   console.log(`>>>>>>> Done after ${elapsedTimeAsStr(startedAt)}`);
 }
packages/utils/src/analytics-platform/1-fhir-to-csv.ts (8)

70-114: Fix typo: rename filtererdPatientIds to filteredPatientIds everywhere.

Current misspelling harms readability and risks future mistakes.

-  let filtererdPatientIds: string[] = [];
+  let filteredPatientIds: string[] = [];

   if (checkConsolidatedExists) {
     const localStartedAt = Date.now();
     log(
-      `>>> Got ${uniquePatientIds.length} patients, checking consolidated data... (this may take a while)`
+      `>>> Got ${uniquePatientIds.length} patients, checking consolidated data... (this may take a while)`
     );
-    filtererdPatientIds = await getPatientsWithConsolidatedData({ patientIds: uniquePatientIds });
+    filteredPatientIds = await getPatientsWithConsolidatedData({ patientIds: uniquePatientIds });
     log(
-      `>>> Filtered down to ${filtererdPatientIds.length} patients in ${elapsedTimeAsStr(
+      `>>> Filtered down to ${filteredPatientIds.length} patients in ${elapsedTimeAsStr(
         localStartedAt
       )}`
     );
   } else {
-    filtererdPatientIds = uniquePatientIds;
+    filteredPatientIds = uniquePatientIds;
   }

-  await displayWarningAndConfirmation(filtererdPatientIds, isAllPatients, orgName, log);
+  await displayWarningAndConfirmation(filteredPatientIds, isAllPatients, orgName, log);
   log(
-    `>>> Running it... ${filtererdPatientIds.length} patients, fhirToCsvJobId: ${fhirToCsvJobId}`
+    `>>> Running it... ${filteredPatientIds.length} patients, fhirToCsvJobId: ${fhirToCsvJobId}`
   );

   let amountOfPatientsProcessed = 0;

   const failedPatientIds: string[] = [];
   await executeAsynchronously(
-    filtererdPatientIds,
+    filteredPatientIds,
     async patientId => {
       const payload = JSON.stringify({
         jobId: fhirToCsvJobId,
         cxId,
         patientId,
       });
-        if (amountOfPatientsProcessed % 100 === 0) {
+        if (amountOfPatientsProcessed % 100 === 0) {
           log(
-            `>>> Sent ${amountOfPatientsProcessed}/${filtererdPatientIds.length} patients to queue`
+            `>>> Sent ${amountOfPatientsProcessed}/${filteredPatientIds.length} patients to queue`
           );
         }

44-45: Make jobId collision-resistant.

Two invocations within the same second will generate identical ids. Add milliseconds and a short random suffix.

-const fhirToCsvJobId = "F2C_" + buildDayjs().toISOString().slice(0, 19).replace(/[:.]/g, "-");
+const fhirToCsvJobId =
+  `F2C_${buildDayjs().format("YYYY-MM-DDTHH-mm-ss-SSS")}_${Math.random().toString(36).slice(2, 8)}`;
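A self-contained sketch of that collision-resistant id scheme, using plain `Date` here for illustration (the script itself uses the repo's `buildDayjs` helper):

```typescript
// Build a job id with millisecond precision plus a short random suffix,
// so two invocations within the same second still produce distinct ids.
function buildFhirToCsvJobId(now: Date = new Date()): string {
  const timestamp = now.toISOString().slice(0, 23).replace(/[:.]/g, "-");
  const randomSuffix = Math.random().toString(36).slice(2, 8);
  return `F2C_${timestamp}_${randomSuffix}`;
}
```

For a fixed input such as `new Date("2025-01-02T03:04:05.678Z")`, the timestamp portion is `2025-01-02T03-04-05-678`, followed by the random suffix.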

171-171: Loosen logger parameter type (avoid coupling to console).

Use the project’s logger type rather than typeof console.log.

-  log: typeof console.log
+  log: ReturnType<typeof out>["log"]

155-160: Use project logger for consistency.

Stay consistent with logging guidelines and prefixes.

-        console.log(
-          `Failed to check if consolidated data exists for patient ${patientId} - reason: ${errorToString(
-            error
-          )}`
-        );
+        out("getPatientsWithConsolidatedData").log(
+          `Failed to check if consolidated data exists for patient ${patientId} - reason: ${errorToString(error)}`
+        );

58-58: Prefer a meaningful log prefix.

Improves traceability across multi-script runs.

-  const { log } = out("");
+  const { log } = out("1-fhir-to-csv");

162-162: Right-size S3 HEAD concurrency to reduce throttling risk.

100 parallel HEADs can spike on large cohorts. Consider a smaller default and make it a constant.

-    { numberOfParallelExecutions: 100, minJitterMillis: 10, maxJitterMillis: 50 }
+    { numberOfParallelExecutions: 50, minJitterMillis: 10, maxJitterMillis: 50 }

If helpful, promote to a top-level constant.


103-103: Resolve TODO or create a tracking issue.

“Should be using FhirToCsvCloud?” — either implement or link the intended follow-up to avoid TODO drift.

Want me to open an issue and propose the exact switch-over plan?


128-132: PII handling for failure file.

Patient IDs are written to a file in the working dir. Confirm this is acceptable for your environment, or write to a secured temp dir and add a short header with jobId + timestamp.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled
  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f0dcb67 and 5e22379.

📒 Files selected for processing (4)
  • packages/shared/src/common/numbers.ts (1 hunks)
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts (1 hunks)
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (1 hunks)
  • packages/utils/src/analytics-platform/snowflake/test-snowflake-connection.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit inference engine (.cursorrules)

**/*.{js,jsx,ts,tsx}: Don’t use null inside the app, only on code interacting with external interfaces/services, like DB and HTTP; convert to undefined before sending inwards into the code
Use const whenever possible
Use async/await instead of .then()
Naming: classes, enums: PascalCase
Naming: constants, variables, functions: camelCase
Naming: file names: kebab-case
Naming: Don’t use negative names, like notEnabled, prefer isDisabled
If possible, use decomposing objects for function parameters
Prefer Nullish Coalesce (??) than the OR operator (||) when you want to provide a default value
Avoid creating arrow functions
Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
While handling errors, keep the stack trace around: if you create a new Error (e.g., MetriportError), make sure to pass the original error as the new one’s cause so the stack trace is available upstream.
max column length is 100 chars
multi-line comments use /** */
top-level comments go after the import (save pre-import to basic file header, like license)
move literals to constants declared after imports when possible

Files:

  • packages/utils/src/analytics-platform/snowflake/test-snowflake-connection.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/shared/src/common/numbers.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursorrules)

Use types whenever possible

Files:

  • packages/utils/src/analytics-platform/snowflake/test-snowflake-connection.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/shared/src/common/numbers.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
**/*.ts

⚙️ CodeRabbit configuration file

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • U...

Files:

  • packages/utils/src/analytics-platform/snowflake/test-snowflake-connection.ts
  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/shared/src/common/numbers.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
🧠 Learnings (32)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.
Learnt from: leite08
PR: metriport/metriport#4350
File: packages/utils/src/analytics-platform/2-merge-csvs.ts:58-61
Timestamp: 2025-08-20T12:40:34.100Z
Learning: In packages/utils/src/analytics-platform/2-merge-csvs.ts, the constant messageGroupId "merge-csvs" for SQS FIFO processing is intentional by user leite08, indicating they want sequential processing rather than concurrent chunk processing for the merge CSV workflow.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
📚 Learning: 2025-08-20T12:40:34.100Z
Learnt from: leite08
PR: metriport/metriport#4350
File: packages/utils/src/analytics-platform/2-merge-csvs.ts:58-61
Timestamp: 2025-08-20T12:40:34.100Z
Learning: In packages/utils/src/analytics-platform/2-merge-csvs.ts, the constant messageGroupId "merge-csvs" for SQS FIFO processing is intentional by user leite08, indicating they want sequential processing rather than concurrent chunk processing for the merge CSV workflow.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
📚 Learning: 2025-08-15T03:59:27.025Z
Learnt from: leite08
PR: metriport/metriport#4350
File: packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts:159-161
Timestamp: 2025-08-15T03:59:27.025Z
Learning: User leite08 prefers fail-fast behavior for bulk import scripts in Snowflake. When tables already exist, the script should fail rather than continue, as this prevents accidental duplicate data ingestion. The `IF NOT EXISTS` clause should be omitted in bulk import scenarios to ensure data integrity.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-05T16:02:22.473Z
Learnt from: leite08
PR: metriport/metriport#3611
File: packages/utils/src/carequality/download-directory.ts:68-82
Timestamp: 2025-04-05T16:02:22.473Z
Learning: Utility scripts in the packages/utils directory don't require comprehensive error handling like production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-27T23:51:45.100Z
Learnt from: leite08
PR: metriport/metriport#3857
File: packages/utils/src/consolidated/filter-consolidated.ts:39-68
Timestamp: 2025-05-27T23:51:45.100Z
Learning: Utility scripts in the packages/utils directory don't require comprehensive error handling or logging changes (like switching from console.log to out().log), as they are meant for local/developer environments, not production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-03-14T00:30:18.860Z
Learnt from: leite08
PR: metriport/metriport#3468
File: packages/utils/src/s3/list-objects.ts:17-28
Timestamp: 2025-03-14T00:30:18.860Z
Learning: Files under `./packages/utils` folder are utility scripts and don't require the same coding standards used in other packages (like error handling, structured logging, etc.).

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-24T02:17:02.661Z
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/utils/src/patient-import/send-webhook.ts:41-55
Timestamp: 2025-04-24T02:17:02.661Z
Learning: Files under `packages/utils` don't need robust error handling as they are utility scripts meant for local/developer environments, not production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-30T14:19:02.018Z
Learnt from: leite08
PR: metriport/metriport#3924
File: packages/utils/src/open-search/ingest-patients.ts:73-74
Timestamp: 2025-05-30T14:19:02.018Z
Learning: Utility scripts in the codebase don't need to strictly follow all coding guidelines, particularly around logging practices like avoiding multi-line logs or objects as second parameters.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-03-14T00:31:01.986Z
Learnt from: leite08
PR: metriport/metriport#3468
File: packages/utils/src/s3/list-objects.ts:12-13
Timestamp: 2025-03-14T00:31:01.986Z
Learning: Files under the `./packages/utils` folder are utility scripts and don't require the same coding standards as other packages. They often contain placeholder values meant to be filled in by developers before use.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-31T23:09:05.757Z
Learnt from: lucasdellabella
PR: metriport/metriport#3941
File: packages/utils/src/hl7v2-notifications/unpack-patient-roster-identifier.ts:61-64
Timestamp: 2025-05-31T23:09:05.757Z
Learning: Utility scripts in the utils package may have relaxed coding standards compared to production code, particularly for error handling. When reviewing debugging scripts or one-off utilities, apply less stringent requirements for error handling patterns.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-07-27T23:42:34.014Z
Learnt from: RamilGaripov
PR: metriport/metriport#4259
File: packages/utils/src/document/reconversion-kickoff-loader.ts:78-81
Timestamp: 2025-07-27T23:42:34.014Z
Learning: For utility scripts in packages/utils, RamilGaripov prefers to keep them simple without extensive validation or error handling, as they are local developer tools where the developer controls the input and execution environment.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-03-14T01:13:34.887Z
Learnt from: leite08
PR: metriport/metriport#3469
File: packages/utils/src/s3/list-objects.ts:23-27
Timestamp: 2025-03-14T01:13:34.887Z
Learning: Files under `./packages/utils` are permitted to use `console.log` directly and are not required to use `out().log` instead, as they're exempt from certain coding standards that apply to the rest of the codebase.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-10T16:01:21.135Z
Learnt from: leite08
PR: metriport/metriport#3648
File: packages/utils/src/csv/find-dups.ts:14-14
Timestamp: 2025-04-10T16:01:21.135Z
Learning: Files under `packages/utils` are allowed to have empty strings for input parameters as these are meant to be manually modified by developers before running scripts in their local environments.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-08-26T02:33:26.777Z
Learnt from: leite08
PR: metriport/metriport#4469
File: packages/fhir-converter/src/lib/workers/worker.js:75-87
Timestamp: 2025-08-26T02:33:26.777Z
Learning: User leite08 confirmed that the debug logging approach in packages/fhir-converter/src/lib/workers/worker.js was already reviewed and approved in the feature PR, even though it logs both error messages and stack traces.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-08-15T18:57:29.030Z
Learnt from: lucasdellabella
PR: metriport/metriport#4382
File: packages/utils/src/hl7v2-notifications/reprocess-adt-conversion-bundles/common.ts:36-48
Timestamp: 2025-08-15T18:57:29.030Z
Learning: In utility scripts like packages/utils/src/hl7v2-notifications/reprocess-adt-conversion-bundles/common.ts, user lucasdellabella prefers to let bundle parsing errors fail fast rather than adding error handling, as it provides immediate visibility into data issues during script execution.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-05T16:02:31.219Z
Learnt from: leite08
PR: metriport/metriport#3611
File: packages/utils/src/carequality/download-directory.ts:33-66
Timestamp: 2025-04-05T16:02:31.219Z
Learning: In the Metriport codebase, utility scripts in the packages/utils directory have different error handling requirements compared to production code, and do not necessarily need the same level of error handling.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-08-01T18:20:14.618Z
Learnt from: keshavsaharia
PR: metriport/metriport#4161
File: packages/utils/src/surescripts/compare/dr-first.ts:253-267
Timestamp: 2025-08-01T18:20:14.618Z
Learning: For one-off utility scripts in packages/utils/src/surescripts/, it's acceptable to ignore certain code quality issues like array modification during iteration, as these scripts prioritize functionality over perfect code patterns.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-07-27T00:40:32.149Z
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-26T16:24:13.656Z
Learnt from: leite08
PR: metriport/metriport#3859
File: packages/utils/src/consolidated/seed-patient-consolidated.ts:94-94
Timestamp: 2025-05-26T16:24:13.656Z
Learning: For utility scripts in the packages/utils directory, leite08 accepts simpler string replacement approaches (like replaceAll) even when more targeted parsing approaches would be safer, indicating different code quality standards apply to utility scripts versus production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-31T23:09:07.678Z
Learnt from: lucasdellabella
PR: metriport/metriport#3941
File: packages/utils/src/hl7v2-notifications/unpack-patient-roster-identifier.ts:66-68
Timestamp: 2025-05-31T23:09:07.678Z
Learning: Scripts and utility tools may have relaxed code standards compared to production application code. Type safety improvements and other refactoring suggestions should be applied with less rigor to debugging scripts.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-03T02:23:58.332Z
Learnt from: leite08
PR: metriport/metriport#3595
File: packages/utils/src/carequality/download-directory.ts:36-66
Timestamp: 2025-04-03T02:23:58.332Z
Learning: In the metriport codebase, utility scripts may not require the same level of error handling as production code, especially for simple or one-off tasks like in the CQ directory download script.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-06-10T13:41:50.425Z
Learnt from: lucasdellabella
PR: metriport/metriport#3987
File: packages/utils/src/local-bulk-insert-patients.ts:216-219
Timestamp: 2025-06-10T13:41:50.425Z
Learning: For utility scripts and testing scripts in the packages/utils directory, prefer simpler error handling rather than detailed contextual error messages, as these are typically used internally and don't require the same level of robustness as production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-06-26T06:27:07.109Z
Learnt from: keshavsaharia
PR: metriport/metriport#4099
File: packages/utils/src/surescripts/shared.ts:52-63
Timestamp: 2025-06-26T06:27:07.109Z
Learning: For utility scripts in packages/utils, comprehensive error handling (like try-catch blocks for S3 operations and JSON parsing) is not necessary, as immediate failures can be more helpful for developers running these one-off tools.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-04-24T02:17:19.333Z
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/utils/src/patient-import/send-webhook.ts:36-40
Timestamp: 2025-04-24T02:17:19.333Z
Learning: Utility scripts in the `packages/utils` directory should include proper error handling with try/catch blocks around their main functions, logging errors and exiting with non-zero status codes upon failure.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-27T16:10:48.223Z
Learnt from: lucasdellabella
PR: metriport/metriport#3866
File: packages/core/src/command/hl7v2-subscriptions/hl7v2-roster-generator.ts:142-151
Timestamp: 2025-05-27T16:10:48.223Z
Learning: In packages/core/src/command/hl7v2-subscriptions/hl7v2-roster-generator.ts, the user prefers to let axios errors bubble up naturally rather than adding try-catch blocks that re-throw with MetriportError wrappers, especially when the calling code already has retry mechanisms in place via simpleExecuteWithRetries.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-07-16T18:48:28.251Z
Learnt from: leite08
PR: metriport/metriport#4095
File: packages/commonwell-cert-runner/src/single-commands/document-parse.ts:14-24
Timestamp: 2025-07-16T18:48:28.251Z
Learning: For utility scripts in the packages/commonwell-cert-runner directory, comprehensive error handling (like try-catch blocks for file operations and JSON parsing) is not necessary, as these are certification testing tools rather than production code.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-05-27T15:49:16.573Z
Learnt from: lucasdellabella
PR: metriport/metriport#3866
File: packages/core/src/command/hl7v2-subscriptions/__tests__/hl7v2-roster-generator.test.ts:190-191
Timestamp: 2025-05-27T15:49:16.573Z
Learning: The user lucasdellabella prefers to keep changes focused on the PR's scope and avoid making ancillary performance improvements (like replacing delete operators with undefined assignments) that are not directly related to the main objectives of the current PR.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-07-20T01:18:28.093Z
Learnt from: leite08
PR: metriport/metriport#4215
File: packages/core/src/command/analytics-platform/config.ts:4-13
Timestamp: 2025-07-20T01:18:28.093Z
Learning: User leite08 prefers that code improvement suggestions (like consolidating duplicated schemas) be made on feature PRs rather than release PRs. Such improvements should be deferred to follow-up PRs during the release cycle.

Applied to files:

  • packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts
📚 Learning: 2025-07-18T09:33:29.581Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the `/fhir-to-csv` and `/fhir-to-csv/transform` endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the `/fhir-to-csv/batch` endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.

Applied to files:

  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
📚 Learning: 2025-05-20T21:26:26.804Z
Learnt from: leite08
PR: metriport/metriport#3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Applied to files:

  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
📚 Learning: 2025-03-11T20:42:46.516Z
Learnt from: thomasyopes
PR: metriport/metriport#3427
File: packages/core/src/external/ehr/api/sync-patient.ts:16-55
Timestamp: 2025-03-11T20:42:46.516Z
Learning: In the patient synchronization architecture, the flow follows this pattern: (1) `ehr-sync-patient-cloud.ts` sends messages to an SQS queue, (2) the `ehr-sync-patient` Lambda consumes these messages, and (3) the Lambda uses the `syncPatient` function to make the API calls to process the patient data.

Applied to files:

  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
📚 Learning: 2025-08-21T01:50:04.471Z
Learnt from: keshavsaharia
PR: metriport/metriport#4394
File: packages/core/src/external/quest/command/fhir-converter/fhir-converter-direct.ts:5-8
Timestamp: 2025-08-21T01:50:04.471Z
Learning: In packages/core/src/external/quest/command/fhir-converter/fhir-converter-direct.ts, the QuestFhirConverterCommandDirect class is intentionally implemented as a stub/TODO for ENG-864, as confirmed by keshavsaharia. The Quest FHIR conversion feature is being developed in phases - the current PR establishes the pipeline structure, and the next PR will implement the actual conversion logic.

Applied to files:

  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
🧬 Code graph analysis (3)
packages/utils/src/analytics-platform/snowflake/test-snowflake-connection.ts (5)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/shared/src/index.ts (2)
  • sleep (13-13)
  • errorToString (45-45)
packages/shared/src/common/date.ts (1)
  • buildDayjs (100-102)
packages/core/src/external/snowflake/commands.ts (3)
  • promisifyConnect (3-15)
  • promisifyExecute (31-53)
  • promisifyDestroy (17-29)
packages/shared/src/common/numbers.ts (1)
  • abbreviateNumber (15-26)
packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (6)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/core/src/external/aws/s3.ts (1)
  • S3Utils (140-606)
packages/core/src/command/analytics-platform/merge-csvs/file-name.ts (1)
  • buildMergeCsvsJobPrefix (5-7)
packages/shared/src/index.ts (2)
  • sleep (13-13)
  • errorToString (45-45)
packages/shared/src/common/date.ts (1)
  • buildDayjs (100-102)
packages/core/src/external/snowflake/commands.ts (3)
  • promisifyConnect (3-15)
  • promisifyExecute (31-53)
  • promisifyDestroy (17-29)
packages/utils/src/analytics-platform/1-fhir-to-csv.ts (6)
packages/shared/src/common/date.ts (1)
  • buildDayjs (100-102)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/core/src/external/aws/s3.ts (2)
  • S3Utils (140-606)
  • executeWithRetriesS3 (94-111)
packages/shared/src/index.ts (2)
  • sleep (13-13)
  • errorToString (45-45)
packages/core/src/util/log.ts (2)
  • out (30-35)
  • log (10-23)
packages/utils/src/patient/get-ids.ts (1)
  • getAllPatientIds (22-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (5)
packages/utils/src/analytics-platform/snowflake/3-ingest-from-merged-csvs.ts (1)

175-179: Intentional fail-fast on table creation is consistent with bulk-load semantics

Creating without IF NOT EXISTS aligns with preventing dup loads.

Confirm that creating tables even for resource types with no files is desired (schema bootstrapping). If not, move the “no files” check above CREATE TABLE.
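The fail-fast DDL described above can be sketched as follows. This is an illustrative helper, not the script's actual code; the column list is hypothetical, not the real schema.

```typescript
// Illustrative sketch of the fail-fast DDL discussed above: omitting
// IF NOT EXISTS makes Snowflake raise an error when the table already
// exists, so re-running a bulk load cannot silently append duplicates.
// The column list is hypothetical, not the real schema.
function buildCreateTableSql(tableName: string): string {
  return `CREATE TABLE ${tableName} (ID VARCHAR, RESOURCE VARIANT);`;
}
```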

packages/utils/src/analytics-platform/snowflake/test-snowflake-connection.ts (3)

21-29: LGTM: helpful usage doc placed after imports.

Clear instructions and correct placement per scripts guideline.


37-41: LGTM: hardened Snowflake config.

ocspFailOpen=false (fail closed) and reduced logging are sensible defaults for this script.


73-85: Manual Verification Required: Confirm SHOW TABLES Column Names in Your Snowflake Account

The code in packages/utils/src/analytics-platform/snowflake/test-snowflake-connection.ts (lines 73–85) currently assumes the SHOW TABLES command returns properties named rows and bytes. However, some Snowflake environments use row_count instead of rows. If you see “undefined rows/bytes” in your logs, you’ll need to adjust the field mapping.

Please run the following snippet in your local environment (with the snowflake-sdk installed and valid Snowflake credentials) to inspect the actual column keys returned by SHOW TABLES and update the code if necessary:

#!/bin/bash
# Print keys from the first SHOW TABLES row to verify column names
node -e '
  const snowflake = require("snowflake-sdk");
  require("dotenv").config();
  const conn = snowflake.createConnection({
    account: process.env.SNOWFLAKE_ACCOUNT,
    token: process.env.SNOWFLAKE_TOKEN,
    database: process.env.SNOWFLAKE_DB,
    schema: process.env.SNOWFLAKE_SCHEMA,
    warehouse: process.env.SNOWFLAKE_WH,
    authenticator: "PROGRAMMATIC_ACCESS_TOKEN",
    clientSessionKeepAlive: true,
  });
  conn.connect((err) => {
    if (err) {
      console.error("Connection error:", err);
      process.exit(1);
    }
    conn.execute({
      sqlText: "SHOW TABLES;",
      complete: (err, _stmt, rows) => {
        if (err) {
          console.error("Query error:", err);
          process.exit(1);
        }
        console.log("SHOW TABLES columns:", Object.keys(rows?.[0] || {}));
        conn.destroy(() => {});
      }
    });
  });
'

• If the output shows "row_count" instead of "rows", update the code to:

const rows: Array<{
  name: string;
  database_name: string;
  row_count: number;   // renamed from `rows`
  bytes: number;
  owner: string;
  kind: string;
}> = rowsRaw ?? [];

const groupByDatabase = groupBy(rows, "database_name");

• Otherwise, no changes are needed.

packages/utils/src/analytics-platform/1-fhir-to-csv.ts (1)

48-54: FIFO queue and deduplication settings are correctly configured

  • In packages/infra/lib/analytics-platform/analytics-platform-stack.ts, the call to createQueue({ …, fifo: true, … }) invokes createFifoQueue, which sets
    • queueName: <name>Queue.fifo
    • fifo: true
    confirming the queue URL will end in .fifo.
  • In createFifoQueue (packages/infra/lib/shared/sqs.ts), contentBasedDeduplication defaults to false (props.contentBasedDeduplication ?? false), ensuring per-message deduplication via messageDeduplicationId, matching the application code’s assumptions.
  • No infra changes are needed.

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-743

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (6)
packages/utils/src/shared/ids.ts (6)

1-1: Decouple utils from core to avoid cross-package dependency (and potential cycles).

Use Node’s fs here instead of importing a core helper from utils.

-import { getFileContents } from "@metriport/core/util/fs";
+import { readFileSync } from "node:fs";

4-6: Read locally, strip BOM, and keep contents normalization local.

Switch to fs, strip an optional UTF-8 BOM, and normalize before splitting.

-  const fileContents = getFileContents(fileName);
-  const idsFromFile = fileContents
+  const fileContents = readFileSync(fileName, "utf8");
+  const normalizedContents = fileContents.replace(/^\uFEFF/, "");
+  const idsFromFile = normalizedContents

4-4: Wrap file read with cause-preserving error for better diagnostics.

Prevents opaque failures while preserving the original stack.

-  const fileContents = readFileSync(fileName, "utf8");
+  let fileContents: string;
+  try {
+    fileContents = readFileSync(fileName, "utf8");
+  } catch (err) {
+    throw new Error(`Failed to read IDs file: ${fileName}`, { cause: err as Error });
+  }

7-7: Only strip surrounding quotes; don’t remove embedded quotes.

Current replaceAll nukes all quotes; this narrows to leading/trailing quotes.

-    .map(id => id.replaceAll('"', "").replaceAll("'", "").trim())
+    .map(line => line.trim().replace(/^['"]|['"]$/g, ""))

9-9: Optionally dedupe IDs to avoid downstream duplication.

If callers don’t rely on duplicates, return unique IDs.

-  return idsFromFile;
+  return Array.from(new Set(idsFromFile));

3-10: Add minimal unit tests for edge cases.

Cover: CRLF vs LF, BOM, header line "id", surrounding quotes, empty lines, duplicates.

I can add a small ids.test.ts with fixtures exercising these cases—want me to open a follow-up?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled
  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7ca4b73 and 8200639.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (2)
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts (1 hunks)
  • packages/utils/src/shared/ids.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/utils/src/analytics-platform/1-fhir-to-csv.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit inference engine (.cursorrules)

**/*.{js,jsx,ts,tsx}: Don’t use null inside the app, only on code interacting with external interfaces/services, like DB and HTTP; convert to undefined before sending inwards into the code
Use const whenever possible
Use async/await instead of .then()
Naming: classes, enums: PascalCase
Naming: constants, variables, functions: camelCase
Naming: file names: kebab-case
Naming: Don’t use negative names, like notEnabled, prefer isDisabled
If possible, use decomposing objects for function parameters
Prefer Nullish Coalesce (??) than the OR operator (||) when you want to provide a default value
Avoid creating arrow functions
Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
While handling errors, keep the stack trace around: if you create a new Error (e.g., MetriportError), make sure to pass the original error as the new one’s cause so the stack trace is available upstream.
max column length is 100 chars
multi-line comments use /** */
top-level comments go after the import (save pre-import to basic file header, like license)
move literals to constants declared after imports when possible

Files:

  • packages/utils/src/shared/ids.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursorrules)

Use types whenever possible

Files:

  • packages/utils/src/shared/ids.ts
**/*.ts

⚙️ CodeRabbit configuration file

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • U...

Files:

  • packages/utils/src/shared/ids.ts
🧠 Learnings (1)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.
Learnt from: leite08
PR: metriport/metriport#4350
File: packages/utils/src/analytics-platform/2-merge-csvs.ts:58-61
Timestamp: 2025-08-20T12:40:34.100Z
Learning: In packages/utils/src/analytics-platform/2-merge-csvs.ts, the constant messageGroupId "merge-csvs" for SQS FIFO processing is intentional by user leite08, indicating they want sequential processing rather than concurrent chunk processing for the merge CSV workflow.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
🧬 Code graph analysis (1)
packages/utils/src/shared/ids.ts (2)
packages/fhir-converter/src/routes.js (1)
  • fileName (209-209)
packages/core/src/external/opensearch/file/file-ingestor-direct.ts (1)
  • getFileContents (78-88)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: Analyze (javascript)

Ref eng-743

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
@leite08 leite08 mentioned this pull request Aug 28, 2025
25 tasks
export function parseTableNameFromFhirToCsvFileKey(key: string): string {
  // e.g.: snowflake/fhir-to-csv/cx=cx-id/f2c=2025-08-08T02-18-56/pt=patient-id/_tmp_fhir-to-csv_output_cx-id_patient-id_condition.csv
  const tableName = key.split("/")[5]?.split("_").slice(6).join("_").split(".")[0];
  if (!tableName) {
Contributor

@thomasyopes thomasyopes Aug 28, 2025


may want to validate this is a real table at some point, nbd right now

Member Author


what do you mean/suggest?

Contributor


I mean the table name should be checked against a list of table names so that we catch an issue here vs. downstream.

Ref eng-743

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-743

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-743

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (1)

23-30: Make table-name parsing robust; remove magic indexes and underscore-coupling

Current logic relies on .slice(6) token counting, which breaks if cxId/patientId contain underscores or if the filename structure changes. Derive cxId/patientId from path segments and validate the filename prefix before slicing the table name.

 export function parseTableNameFromFhirToCsvFileKey(key: string): string {
-  // e.g.: snowflake/fhir-to-csv/cx=cx-id/f2c=2025-08-08T02-18-56/pt=patient-id/_tmp_fhir-to-csv_output_cx-id_patient-id_condition.csv
-  const tableName = key.split("/").pop()?.split("_").slice(6).join("_").split(".")[0];
-  if (!tableName) {
-    throw new MetriportError(`Failed to parse tableName from fhirToCsvFileKey`, undefined, { key });
-  }
-  return tableName;
+  // Example:
+  // snowflake/fhir-to-csv/cx=cx-id/f2c=2025-08-08T02-18-56/pt=patient-id/_tmp_fhir-to-csv_output_cx-id_patient-id_condition.csv
+  const parts = key.split("/");
+  const filename = parts[parts.length - 1];
+  const cxId = parts.find(p => p.startsWith("cx="))?.slice(3);
+  const patientId = parts.find(p => p.startsWith("pt="))?.slice(3);
+  if (!filename || !cxId || !patientId || !filename.endsWith(".csv")) {
+    throw new MetriportError("Failed to parse tableName from fhirToCsvFileKey", undefined, { key });
+  }
+  const base = filename.slice(0, -".csv".length);
+  const expectedPrefix = `_tmp_fhir-to-csv_output_${cxId}_${patientId}_`;
+  if (!base.startsWith(expectedPrefix)) {
+    throw new MetriportError("Failed to parse tableName from fhirToCsvFileKey", undefined, { key });
+  }
+  const tableName = base.slice(expectedPrefix.length);
+  if (!tableName) {
+    throw new MetriportError("Failed to parse tableName from fhirToCsvFileKey", undefined, { key });
+  }
+  return tableName;
 }

If you prefer, I can add focused unit tests covering ids/tableNames with underscores and malformed keys.
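For reference, here is a standalone copy of the segment-based parsing with the kind of cases those tests would cover. It is an illustration only: the function name and plain `Error` are stand-ins (the real code throws `MetriportError`):

```typescript
// Standalone illustration of the proposed segment-based parsing, which
// tolerates underscores in cxId/patientId, unlike token-count slicing.
function parseTableName(key: string): string {
  const parts = key.split("/");
  const filename = parts[parts.length - 1];
  const cxId = parts.find(p => p.startsWith("cx="))?.slice("cx=".length);
  const patientId = parts.find(p => p.startsWith("pt="))?.slice("pt=".length);
  if (!filename || !cxId || !patientId || !filename.endsWith(".csv")) {
    throw new Error(`Failed to parse tableName from key: ${key}`);
  }
  const base = filename.slice(0, -".csv".length);
  const expectedPrefix = `_tmp_fhir-to-csv_output_${cxId}_${patientId}_`;
  if (!base.startsWith(expectedPrefix)) {
    throw new Error(`Failed to parse tableName from key: ${key}`);
  }
  return base.slice(expectedPrefix.length);
}
```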

🧹 Nitpick comments (6)
packages/utils/src/s3/list-first-level-subdirectories.ts (2)

47-50: Guard against missing Prefix in CommonPrefix entries

CommonPrefix.Prefix is optional; printing undefined produces noisy output. Filter or guard when logging.

-  objs?.forEach(obj => {
-    console.log(`- ${obj.Prefix}`);
-  });
+  objs?.forEach(({ Prefix }) => {
+    if (Prefix) console.log(`- ${Prefix}`);
+  });

28-45: Unify logging via out().log for consistency

Since this is a utils script (where console logs are allowed), consider using the same out().log stream for all informational logs to keep formatting consistent with other tools.

-  console.log(
-    `Getting first level subdirectories w/ prefix ${filePrefix}, bucket ${bucketName}...`
-  );
+  log(
+    `Getting first level subdirectories w/ prefix ${filePrefix}, bucket ${bucketName}...`
+  );
packages/utils/src/s3/list-objects.ts (1)

40-41: Retry listing with higher attempts like other scripts

Align retries with your other S3 list script to improve resilience on transient S3 throttling.

-  const objs = await s3.listObjects(bucketName, filePrefix);
+  const objs = await s3.listObjects(bucketName, filePrefix, { maxAttempts: 10 });
packages/core/src/external/aws/s3.ts (1)

593-621: Add retries/options and normalize prefix for first-level listing

Two small improvements:

  • Allow callers to control retry attempts like listObjects (pass options into executeWithRetriesS3).
  • Optionally normalize prefix to ensure it ends with “/” to avoid accidental sibling matches when a caller omits the trailing slash.
-  async listFirstLevelSubdirectories({
-    bucket,
-    prefix,
-    delimiter = "/",
-  }: {
-    bucket: string;
-    prefix: string;
-    delimiter?: string | undefined;
-  }): Promise<CommonPrefix[]> {
+  async listFirstLevelSubdirectories({
+    bucket,
+    prefix,
+    delimiter = "/",
+    options = {},
+  }: {
+    bucket: string;
+    prefix: string;
+    delimiter?: string | undefined;
+    options?: { maxAttempts?: number };
+  }): Promise<CommonPrefix[]> {
-    const allObjects: CommonPrefix[] = [];
+    const allObjects: CommonPrefix[] = [];
+    const normalizedPrefix = prefix.endsWith("/") ? prefix : `${prefix}/`;
     let continuationToken: string | undefined;
     do {
-      const res = await executeWithRetriesS3(() =>
-        this._s3Client.send(
+      const res = await executeWithRetriesS3(
+        () => this._s3Client.send(
           new ListObjectsV2Command({
             Bucket: bucket,
-            Prefix: prefix,
+            Prefix: normalizedPrefix,
             ...(delimiter ? { Delimiter: delimiter } : {}),
             ...(continuationToken ? { ContinuationToken: continuationToken } : {}),
           })
-        )
-      );
+        ),
+        options
+      );
packages/core/src/command/analytics-platform/merge-csvs/index.ts (2)

200-213: Avoid re-creating S3 clients inside the loop

Instantiate S3Utils once and reuse it across patients to reduce overhead.

-  await executeAsynchronously(
-    patientIds,
-    async patientId => {
-      const s3Utils = new S3Utils(region);
+  const s3Utils = new S3Utils(region);
+  await executeAsynchronously(
+    patientIds,
+    async patientId => {
       const prefixToSearch = buildFhirToCsvPatientPrefix({
         cxId,
         jobId: fhirToCsvJobId,
         patientId,
       });
       const patientFiles = await s3Utils.listObjects(sourceBucket, prefixToSearch, {
         maxAttempts: 10,
       });
       rawFileList.push(...patientFiles);
     },

418-425: Propagate gzip errors during per-file piping

If gzip errors while a file is piping, the awaiting promise won’t catch it. Add a handler to reject on gzip error here too.

     await new Promise<void>((resolve, reject) => {
       s3ReadStream.on("end", () => resolve());
       s3ReadStream.on("error", reject);
+      gzip.on("error", reject);
     });
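An alternative to wiring `error` handlers on each stream is Node's `stream/promises` `pipeline`, which rejects if any stage fails and destroys the others. This is a generic sketch, not the repo's code; the source/sink wiring is an assumption:

```typescript
import { createGzip } from "node:zlib";
import { pipeline } from "node:stream/promises";
import { Readable, Writable } from "node:stream";

// pipeline() rejects if the source, gzip, or sink errors, so no per-stream
// "error" listeners are needed and no failure goes unobserved.
async function gzipToSink(source: Readable, sink: Writable): Promise<void> {
  await pipeline(source, createGzip(), sink);
}
```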
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled
  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 674bf4c and 244f2bb.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (8)
  • packages/core/package.json (5 hunks)
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (1 hunks)
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts (1 hunks)
  • packages/core/src/external/aws/lambda.ts (8 hunks)
  • packages/core/src/external/aws/s3.ts (2 hunks)
  • packages/utils/package.json (3 hunks)
  • packages/utils/src/s3/list-first-level-subdirectories.ts (1 hunks)
  • packages/utils/src/s3/list-objects.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/utils/package.json
  • packages/core/package.json
  • packages/core/src/external/aws/lambda.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit inference engine (.cursorrules)

**/*.{js,jsx,ts,tsx}: Don’t use null inside the app, only on code interacting with external interfaces/services, like DB and HTTP; convert to undefined before sending inwards into the code
Use const whenever possible
Use async/await instead of .then()
Naming: classes, enums: PascalCase
Naming: constants, variables, functions: camelCase
Naming: file names: kebab-case
Naming: Don’t use negative names, like notEnabled, prefer isDisabled
If possible, use decomposing objects for function parameters
Prefer Nullish Coalesce (??) than the OR operator (||) when you want to provide a default value
Avoid creating arrow functions
Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
While handling errors, keep the stack trace around: if you create a new Error (e.g., MetriportError), make sure to pass the original error as the new one’s cause so the stack trace is available upstream.
max column length is 100 chars
multi-line comments use /** */
top-level comments go after the import (save pre-import to basic file header, like license)
move literals to constants declared after imports when possible

Files:

  • packages/utils/src/s3/list-first-level-subdirectories.ts
  • packages/utils/src/s3/list-objects.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
  • packages/core/src/external/aws/s3.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursorrules)

Use types whenever possible

Files:

  • packages/utils/src/s3/list-first-level-subdirectories.ts
  • packages/utils/src/s3/list-objects.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
  • packages/core/src/external/aws/s3.ts
**/*.ts

⚙️ CodeRabbit configuration file

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • U...

Files:

  • packages/utils/src/s3/list-first-level-subdirectories.ts
  • packages/utils/src/s3/list-objects.ts
  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
  • packages/core/src/external/aws/s3.ts
🧠 Learnings (24)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.
Learnt from: leite08
PR: metriport/metriport#4350
File: packages/utils/src/analytics-platform/2-merge-csvs.ts:58-61
Timestamp: 2025-08-20T12:40:34.100Z
Learning: In packages/utils/src/analytics-platform/2-merge-csvs.ts, the constant messageGroupId "merge-csvs" for SQS FIFO processing is intentional by user leite08, indicating they want sequential processing rather than concurrent chunk processing for the merge CSV workflow.
Learnt from: leite08
PR: metriport/metriport#4013
File: packages/core/src/fhir-to-cda/cda-templates/components/procedures.ts:104-105
Timestamp: 2025-06-12T22:53:09.606Z
Learning: User leite08 prefers responses in English only.
Learnt from: leite08
PR: metriport/metriport#3991
File: packages/core/src/external/aws/s3.ts:621-630
Timestamp: 2025-06-10T22:20:21.203Z
Learning: User leite08 prefers that all review comments are written in English.
📚 Learning: 2025-08-27T21:24:19.963Z
Learnt from: leite08
PR: metriport/metriport#4350
File: packages/utils/src/analytics-platform/1-fhir-to-csv.ts:129-133
Timestamp: 2025-08-27T21:24:19.963Z
Learning: In packages/utils/src/analytics-platform/1-fhir-to-csv.ts, the SQS FIFO messageDeduplicationId is intentionally set to just patientId (not scoped to jobId) to prevent any duplicate patient processing across job runs and retries, as confirmed by leite08. This ensures strict patient-level idempotency in the analytics pipeline.

Applied to files:

  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-05-20T21:26:26.804Z
Learnt from: leite08
PR: metriport/metriport#3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.

Applied to files:

  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
📚 Learning: 2025-07-27T00:40:32.149Z
Learnt from: leite08
PR: metriport/metriport#4255
File: packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts:19-20
Timestamp: 2025-07-27T00:40:32.149Z
Learning: User leite08 acknowledged the duplicate inputBundle spread issue in packages/core/src/command/analytics-platform/fhir-to-csv/command/fhir-to-csv/fhir-to-csv-direct.ts and will fix it in a follow-up PR rather than in the current patch release PR #4255.

Applied to files:

  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-08-26T02:33:26.777Z
Learnt from: leite08
PR: metriport/metriport#4469
File: packages/fhir-converter/src/lib/workers/worker.js:75-87
Timestamp: 2025-08-26T02:33:26.777Z
Learning: User leite08 confirmed that the debug logging approach in packages/fhir-converter/src/lib/workers/worker.js was already reviewed and approved in the feature PR, even though it logs both error messages and stack traces.

Applied to files:

  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-07-18T09:33:29.581Z
Learnt from: thomasyopes
PR: metriport/metriport#4165
File: packages/api/src/routes/internal/analytics-platform/index.ts:28-28
Timestamp: 2025-07-18T09:33:29.581Z
Learning: In packages/api/src/routes/internal/analytics-platform/index.ts, the patientId parameter handling is intentionally inconsistent: the `/fhir-to-csv` and `/fhir-to-csv/transform` endpoints require patientId (using getFromQueryOrFail) for single patient processing, while the `/fhir-to-csv/batch` endpoint treats patientId as optional (using getFromQuery) because it's designed to run over all patients when no specific patient ID is provided.

Applied to files:

  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
📚 Learning: 2025-07-16T00:54:56.156Z
Learnt from: leite08
PR: metriport/metriport#4193
File: packages/carequality-sdk/src/client/carequality-fhir.ts:198-208
Timestamp: 2025-07-16T00:54:56.156Z
Learning: User leite08 accepted the risk of parameter mutation in packages/carequality-sdk/src/client/carequality-fhir.ts when modifying org.active = false directly on input parameters, choosing to proceed despite the violation of the "Avoid modifying objects received as parameter" coding guideline.

Applied to files:

  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
📚 Learning: 2025-08-25T23:28:41.385Z
Learnt from: keshavsaharia
PR: metriport/metriport#4459
File: packages/core/src/external/quest/fhir/patient.ts:22-24
Timestamp: 2025-08-25T23:28:41.385Z
Learning: FHIR resources should have their ID field determined by `uuidv7()` generated UUIDs. The import should be: `import { uuidv7 } from "metriport/shared/util/uuid-v7";`. External system IDs should not be used directly as FHIR resource IDs, even when sanitized, but should instead be preserved in the identifier field for reference mapping.

Applied to files:

  • packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts
📚 Learning: 2025-08-20T12:40:34.100Z
Learnt from: leite08
PR: metriport/metriport#4350
File: packages/utils/src/analytics-platform/2-merge-csvs.ts:58-61
Timestamp: 2025-08-20T12:40:34.100Z
Learning: In packages/utils/src/analytics-platform/2-merge-csvs.ts, the constant messageGroupId "merge-csvs" for SQS FIFO processing is intentional by user leite08, indicating they want sequential processing rather than concurrent chunk processing for the merge CSV workflow.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-08-15T03:55:25.197Z
Learnt from: leite08
PR: metriport/metriport#4350
File: packages/core/src/command/analytics-platform/merge-csvs/index.ts:336-363
Timestamp: 2025-08-15T03:55:25.197Z
Learning: For bulk upload operations (non-transactional flows), logging error details is sufficient and detailed error aggregation in thrown exceptions is not necessary, unlike transactional flows where detailed error information in exceptions may be more important.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-04-24T02:02:14.646Z
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/core/src/command/patient-import/steps/result/patient-import-result-cloud.ts:16-18
Timestamp: 2025-04-24T02:02:14.646Z
Learning: When logging payloads, it's acceptable to include them in logs if they only contain non-sensitive identifiers (like IDs) and are small enough to not create multi-line logs.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-08-25T15:52:07.834Z
Learnt from: leite08
PR: metriport/metriport#4451
File: packages/core/src/external/ehr/api/job/start-contribute-bundles-job.ts:44-45
Timestamp: 2025-08-25T15:52:07.834Z
Learning: User leite08 emphasizes the importance of logging 4xx errors for operational visibility and filtering in log insights, even when treating them as no-ops.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-08-15T18:57:29.030Z
Learnt from: lucasdellabella
PR: metriport/metriport#4382
File: packages/utils/src/hl7v2-notifications/reprocess-adt-conversion-bundles/common.ts:36-48
Timestamp: 2025-08-15T18:57:29.030Z
Learning: In utility scripts like packages/utils/src/hl7v2-notifications/reprocess-adt-conversion-bundles/common.ts, user lucasdellabella prefers to let bundle parsing errors fail fast rather than adding error handling, as it provides immediate visibility into data issues during script execution.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-04-10T21:17:35.036Z
Learnt from: husainsyed
PR: metriport/metriport#3655
File: packages/shared/src/validators/driver-license/index.ts:1-89
Timestamp: 2025-04-10T21:17:35.036Z
Learning: MetriportError constructor takes three separate parameters: message (string), cause (optional unknown), and additionalInfo (optional object), not a single object with these properties. The correct instantiation is: new MetriportError(message, cause, additionalInfo).

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-03-29T00:01:59.998Z
Learnt from: thomasyopes
PR: metriport/metriport#3545
File: packages/core/src/command/write-to-storage/s3/write-to-s3-cloud.ts:12-20
Timestamp: 2025-03-29T00:01:59.998Z
Learning: In the Metriport codebase, error handling is often managed at higher levels in the call stack ("further up the chain") rather than within individual methods.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-04-12T23:51:31.884Z
Learnt from: lucasdellabella
PR: metriport/metriport#3576
File: packages/core/src/command/hl7-notification/hl7-notification-router-cloud.ts:10-33
Timestamp: 2025-04-12T23:51:31.884Z
Learning: In the Metriport codebase, the preferred approach for error handling is to let errors bubble up to be caught by higher-level components rather than handling them within lower-level methods, especially in the HL7 notification router implementations.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-04-24T02:03:04.234Z
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/api/src/command/medical/patient/patient-import/mapping/create.ts:0-0
Timestamp: 2025-04-24T02:03:04.234Z
Learning: For the Metriport codebase, errors should be allowed to bubble up instead of being caught and wrapped at each function level. This is their established error handling pattern.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-05-08T19:41:36.533Z
Learnt from: thomasyopes
PR: metriport/metriport#3788
File: packages/api/src/external/ehr/shared/utils/bundle.ts:83-93
Timestamp: 2025-05-08T19:41:36.533Z
Learning: In the Metriport codebase, the team prefers to let errors bubble up naturally in some cases rather than adding explicit error handling at every layer, as demonstrated in the refreshEhrBundle function in the bundle.ts file.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-04-23T21:31:01.551Z
Learnt from: leite08
PR: metriport/metriport#3709
File: packages/api/src/external/ehr/canvas/command/bundle/fetch-ehr-bundle.ts:63-72
Timestamp: 2025-04-23T21:31:01.551Z
Learning: While the general pattern in the metriport codebase is to let errors bubble up from factories to callers, non-idempotent operations might require special, individual error handling at the factory implementation level to ensure data consistency and prevent unintended side effects.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-06-04T00:13:25.034Z
Learnt from: leite08
PR: metriport/metriport#3912
File: packages/core/src/external/opensearch/lexical/fhir-searcher.ts:160-173
Timestamp: 2025-06-04T00:13:25.034Z
Learning: In the Metriport codebase, when application code fails, there's no need to worry about publishing metrics. Metrics reporting is not a priority when the main application logic encounters errors.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-03-20T22:39:22.897Z
Learnt from: thomasyopes
PR: metriport/metriport#3489
File: packages/api/src/routes/ehr/elation/auth/middleware.ts:0-0
Timestamp: 2025-03-20T22:39:22.897Z
Learning: For Metriport's authentication/authorization code, keep error messages generic (like using a plain ForbiddenError without details) to avoid exposing internal details to clients.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-04-28T23:10:42.561Z
Learnt from: leite08
PR: metriport/metriport#3749
File: packages/api/src/command/medical/patient/patient-import/get.ts:70-74
Timestamp: 2025-04-28T23:10:42.561Z
Learning: Error messages should have static messages. Dynamic data like IDs should be added to MetriportError's `additionalInfo` property rather than interpolated into the error message string.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-05-08T19:41:19.510Z
Learnt from: thomasyopes
PR: metriport/metriport#3788
File: packages/api/src/external/ehr/shared/utils/bundle.ts:73-82
Timestamp: 2025-05-08T19:41:19.510Z
Learning: For the Metriport codebase, the team prefers letting errors bubble up the call stack naturally rather than adding explicit try-catch blocks at lower levels of abstraction.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
📚 Learning: 2025-04-16T15:45:45.042Z
Learnt from: leite08
PR: metriport/metriport#3590
File: packages/core/src/command/consolidated/contribution-bundle-create.ts:215-215
Timestamp: 2025-04-16T15:45:45.042Z
Learning: In TypeScript code, all functions should include explicit return types, even when they return `void` or `Promise<void>`. For example, in `uploadFullContributionBundle`, the return type should be explicitly defined (e.g., `Promise<void>`). Explicit return types improve code readability, type safety, and maintainability.

Applied to files:

  • packages/core/src/command/analytics-platform/merge-csvs/index.ts
🧬 Code graph analysis (4)
packages/utils/src/s3/list-first-level-subdirectories.ts (4)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/core/src/util/log.ts (2)
  • out (30-35)
  • log (10-23)
packages/shared/src/common/date.ts (1)
  • buildDayjs (100-102)
packages/core/src/external/aws/s3.ts (2)
  • s3 (156-158)
  • S3Utils (141-622)
packages/utils/src/s3/list-objects.ts (5)
packages/shared/src/common/env-var.ts (1)
  • getEnvVarOrFail (14-20)
packages/shared/src/index.ts (1)
  • sleep (13-13)
packages/core/src/util/log.ts (2)
  • out (30-35)
  • log (10-23)
packages/shared/src/common/date.ts (1)
  • buildDayjs (100-102)
packages/core/src/external/aws/s3.ts (1)
  • s3 (156-158)
packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (1)
packages/shared/src/index.ts (1)
  • MetriportError (43-43)
packages/core/src/command/analytics-platform/merge-csvs/index.ts (6)
packages/shared/src/util/uuid-v7.ts (1)
  • uuidv4 (384-386)
packages/core/src/util/log.ts (2)
  • out (30-35)
  • log (10-23)
packages/core/src/external/aws/s3.ts (2)
  • S3Utils (141-622)
  • executeWithRetriesS3 (95-112)
packages/core/src/command/analytics-platform/merge-csvs/file-name.ts (3)
  • buildMergeCsvsJobPrefix (5-7)
  • buildMergeInfoPrefix (9-11)
  • buildMergeCsvsFileGroupKey (13-24)
packages/core/src/command/analytics-platform/fhir-to-csv/file-name.ts (2)
  • buildFhirToCsvPatientPrefix (11-21)
  • parseTableNameFromFhirToCsvFileKey (23-30)
packages/shared/src/index.ts (2)
  • errorToString (45-45)
  • MetriportError (43-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (3)
packages/utils/src/s3/list-objects.ts (1)

31-33: Good UX: clear usage + exit on bad args

The early usage message and exit(1) look good.

packages/core/src/external/aws/s3.ts (1)

551-576: listObjects: options passthrough with retries LGTM

Adding the options param (e.g., maxAttempts) and wiring it to executeWithRetriesS3 is solid and backward compatible.

packages/core/src/command/analytics-platform/merge-csvs/index.ts (1)

343-371: Inner throw won’t abort remaining tasks—no change needed
executeAsynchronously continues running all callbacks even if one rejects, so rethrowing inside the worker does not short-circuit the loop.

Likely an incorrect or invalid review comment.
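That settle-all behavior can be illustrated with `Promise.allSettled` (a simplified stand-in; `executeAsynchronously` is the repo's own helper and adds concurrency control this sketch omits):

```typescript
// Simplified stand-in for a "run everything, don't short-circuit" executor:
// every worker runs to completion; rejections are counted, not rethrown mid-loop.
async function runAllSettled<T>(
  inputs: T[],
  worker: (input: T) => Promise<void>
): Promise<number> {
  const results = await Promise.allSettled(inputs.map(input => worker(input)));
  return results.filter(r => r.status === "rejected").length;
}
```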

Comment on lines +198 to +213
const rawFileList: AWS.S3.ObjectList = [];
await executeAsynchronously(
patientIds,
async patientId => {
const s3Utils = new S3Utils(region);
const prefixToSearch = buildFhirToCsvPatientPrefix({
cxId,
jobId: fhirToCsvJobId,
patientId,
});
const patientFiles = await s3Utils.listObjects(sourceBucket, prefixToSearch, {
maxAttempts: 10,
});
rawFileList.push(...patientFiles);
},
{

⚠️ Potential issue

Type error: AWS types used without import

AWS.S3.ObjectList is referenced but AWS isn’t imported in this module, which will fail type-checking unless you rely on non-standard global types.

Two fixes (pick one):

  • Import types only:
+import type * as AWS from "aws-sdk";
  • Or avoid SDK types here:
-  const rawFileList: AWS.S3.ObjectList = [];
+  const rawFileList: { Key?: string; Size?: number }[] = [];
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const rawFileList: AWS.S3.ObjectList = [];
await executeAsynchronously(
patientIds,
async patientId => {
const s3Utils = new S3Utils(region);
const prefixToSearch = buildFhirToCsvPatientPrefix({
cxId,
jobId: fhirToCsvJobId,
patientId,
});
const patientFiles = await s3Utils.listObjects(sourceBucket, prefixToSearch, {
maxAttempts: 10,
});
rawFileList.push(...patientFiles);
},
{
// At the top of packages/core/src/command/analytics-platform/merge-csvs/index.ts, add:
import type * as AWS from "aws-sdk";
const rawFileList: AWS.S3.ObjectList = [];
await executeAsynchronously(
patientIds,
async patientId => {
const s3Utils = new S3Utils(region);
const prefixToSearch = buildFhirToCsvPatientPrefix({
cxId,
jobId: fhirToCsvJobId,
patientId,
});
const patientFiles = await s3Utils.listObjects(sourceBucket, prefixToSearch, {
maxAttempts: 10,
});
rawFileList.push(...patientFiles);
},
{
// …
}
);
🤖 Prompt for AI Agents
packages/core/src/command/analytics-platform/merge-csvs/index.ts around lines
198-213: the code references AWS.S3.ObjectList but never imports AWS types,
causing a TypeScript error; fix by either adding a types-only import (e.g.
import type { S3 } from 'aws-sdk'; then use S3.ObjectList for rawFileList) or
remove SDK-specific typing and use a local/primitive type (e.g. const
rawFileList: unknown[] or Array<Record<string, any>>) so no AWS types are
required; apply one of these two options consistently where AWS types are
referenced in this file.

Comment on lines +442 to +453
const gzippedContent = Buffer.concat(compressedChunks);
const totalSize = gzippedContent.length;
compressedChunks = [];

log(`Compressed, uploading to S3...`);
await s3Utils.uploadFile({
bucket: destinationBucket,
key: outputKey,
file: gzippedContent,
contentType: "application/gzip",
});


🛠️ Refactor suggestion

Avoid buffering entire gzip into memory; stream-upload to S3

Buffering gzipped content for large merges risks OOM in Lambda. Stream gzip output directly to S3 using a PassThrough and multipart upload.

-  const gzippedContent = Buffer.concat(compressedChunks);
-  const totalSize = gzippedContent.length;
-  compressedChunks = [];
-
-  log(`Compressed, uploading to S3...`);
-  await s3Utils.uploadFile({
-    bucket: destinationBucket,
-    key: outputKey,
-    file: gzippedContent,
-    contentType: "application/gzip",
-  });
+  const totalSize = Buffer.concat(compressedChunks).length; // if you still want an approximate size
+  compressedChunks = []; // release memory ASAP
+
+  log(`Compressed, uploading to S3 (streaming)...`);
+  const { PassThrough } = await import("stream");
+  const pass = new PassThrough();
+  // start upload before piping
+  const uploadPromise = executeWithRetriesS3(() =>
+    s3Utils.s3
+      .upload({
+        Bucket: destinationBucket,
+        Key: outputKey,
+        Body: pass,
+        ContentType: "application/gzip",
+      })
+      .promise(),
+    { maxAttempts: 5 }
+  );
+  // pipe gzip output to S3
+  gzip.pipe(pass);
+  // gzip was already ended above; wait for upload to complete
+  await uploadPromise;

If you need exact compressed size, HEAD the object after upload or compute while streaming (e.g., wrap a counting stream).
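A counting stream for the exact-size option could look like this. It is a sketch: placing it between gzip and the upload body is an assumption about where it would sit in the merge flow:

```typescript
import { Transform, TransformCallback } from "node:stream";

// Pass-through Transform that tallies bytes; read `bytes` after the stream
// finishes to get the exact compressed size without buffering the payload.
class ByteCountStream extends Transform {
  bytes = 0;
  _transform(chunk: Buffer, _encoding: BufferEncoding, callback: TransformCallback): void {
    this.bytes += chunk.length;
    callback(null, chunk);
  }
}
```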

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In packages/core/src/command/analytics-platform/merge-csvs/index.ts around lines
442 to 453, the code currently concatenates compressedChunks into a single
Buffer and uploads it, which can OOM for large files; replace the buffer-based
approach by piping the gzip output into a streamable upload: create a
PassThrough (or similar Readable) and stream the gzip compressor output into it,
then call S3 multipart upload / s3.upload with the PassThrough as the Body to
avoid buffering the whole payload in memory. If you need the final compressed
size, either wrap the stream with a counting transform to record bytes as you
stream, or perform a HEAD request after upload to read Content-Length; clear
compressedChunks and avoid Buffer.concat altogether. Ensure error/finish events
propagate so failed uploads are handled and the stream is closed on error.
