2330 Bulk import v1 #3412
Walkthrough

This update introduces a comprehensive redesign and expansion of the bulk patient import feature across the system. It adds new database tables and models for tracking bulk import jobs and patient mappings, implements robust S3 storage and retrieval utilities, and refactors the entire import pipeline to support concurrency-safe status updates, granular error handling, and flexible dry-run operations. The API, Lambda handlers, and CLI utilities are updated to use new domain types, schemas, and endpoint structures. Webhook support is extended to cover bulk imports, and new endpoints and internal routes are provided for job lifecycle management. Supporting libraries and shared utilities are enhanced with new types, error handling, and configuration mechanisms. Numerous tests, documentation updates, and infrastructure changes accompany the core logic revisions.
Sequence Diagram(s)

Bulk Patient Import Job Lifecycle

```mermaid
sequenceDiagram
    participant Client
    participant API
    participant DB
    participant S3
    participant LambdaParse
    participant LambdaResult
    participant Webhook
    Client->>API: POST /patient/bulk (create job)
    API->>DB: Insert patient_import job
    API->>S3: Create job record & presigned upload URL
    API-->>Client: Return job info + upload URL
    Client->>S3: Upload CSV via presigned URL
    S3-->>LambdaParse: S3 event triggers parse Lambda
    LambdaParse->>S3: Read uploaded CSV
    LambdaParse->>API: Update job status to 'processing'
    LambdaParse->>S3: Store parsed patient records
    LambdaParse->>API: Update job status, counters
    alt Dry run
        LambdaParse->>LambdaResult: Trigger result Lambda (dry run)
    else Not dry run
        LambdaParse->>API: For each patient, create & map
        API->>S3: Update patient record status
        API->>API: Update job counters
        API->>LambdaResult: Trigger result Lambda when done
    end
    LambdaResult->>S3: Consolidate results, store result CSV
    LambdaResult->>API: Update job status to 'completed'
    LambdaResult->>Webhook: Send bulk patient import webhook (with result URL)
```
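The presigned-upload step above can be sketched with the AWS SDK v3. This is a minimal illustration only; the bucket name, key layout, and expiration below are assumptions, not the values the import pipeline actually uses.

```typescript
import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({ region: "us-east-1" });

/**
 * Returns a time-limited URL the customer can use to upload the raw CSV.
 * Bucket name and key layout are hypothetical, for illustration only.
 */
async function createUploadUrl(cxId: string, jobId: string): Promise<string> {
  const command = new PutObjectCommand({
    Bucket: "patient-import-uploads", // hypothetical bucket
    Key: `${cxId}/${jobId}/raw.csv`, // hypothetical key layout
    ContentType: "text/csv",
  });
  return getSignedUrl(s3, command, { expiresIn: 600 }); // URL valid for 10 minutes
}
```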
Bulk Patient Import Webhook Flow

```mermaid
sequenceDiagram
    participant API
    participant Webhook
    participant S3
    API->>S3: Generate presigned URL for result CSV
    API->>Webhook: POST webhook with jobId, status, result URL
    Webhook-->>API: 200 OK
```
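A rough sketch of what that webhook POST might carry. The field names mirror the diagram (jobId, status, result URL), but the exact payload schema, status values, and delivery code are assumptions, not the actual implementation.

```typescript
// Hypothetical payload shape for the bulk patient import webhook
type BulkPatientImportWebhookPayload = {
  jobId: string;
  status: string; // e.g. "processing" or "completed" (assumed values)
  resultUrl?: string; // presigned URL to download result.csv, when available
};

async function sendBulkImportWebhook(
  webhookUrl: string,
  payload: BulkPatientImportWebhookPayload
): Promise<void> {
  const response = await fetch(webhookUrl, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!response.ok) throw new Error("Webhook delivery failed");
}
```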
Patient Record Status Update (Concurrency-Safe)

```mermaid
sequenceDiagram
    participant Lambda
    participant API
    participant DB
    Lambda->>API: Update patient entry status (successful/failed)
    API->>DB: Atomic increment of counters
    API->>DB: Fetch updated job status
    API-->>Lambda: Return updated counters/status
```
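The "atomic increment of counters" step is what keeps concurrent Lambda callbacks from overwriting each other's counts. A minimal sketch of the idea, assuming the patient_import table and counter columns described in the migration review below; the actual API code may go through models rather than raw SQL.

```typescript
import { QueryTypes, Sequelize } from "sequelize";

// Assumed connection; in the API this would be the shared Sequelize instance.
const sequelize = new Sequelize(process.env.DB_URL ?? "");

/**
 * Bump one counter in a single atomic UPDATE and return the fresh totals,
 * so concurrent callbacks never read-modify-write stale values.
 */
async function incrementJobCounter(jobId: string, outcome: "successful" | "failed") {
  const column = outcome === "successful" ? "successful" : "failed";
  const rows = await sequelize.query(
    `UPDATE patient_import
        SET ${column} = ${column} + 1
      WHERE id = :jobId
      RETURNING successful, failed, total`,
    { replacements: { jobId }, type: QueryTypes.SELECT }
  );
  return rows[0] as { successful: number; failed: number; total: number } | undefined;
}
```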
Note: Additional diagrams could be produced for mapping creation, error flows, or detailed S3 interactions if needed.
Moved to packages/shared/src/domain/patient/patient-import/schemas.ts
Ref. metriport/metriport-internal#2330 Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Actionable comments posted: 0
🧹 Nitpick comments (10)
packages/api/src/sequelize/migrations/2025-04-28_02_create-patient_import_mapping-table.ts (2)
11-13: Comment should be more descriptive.

The comment "Use 'Promise.all' when changes are independent of each other" doesn't match what the code is actually doing. This migration is using sequential operations within a transaction, not parallel Promise.all execution.

```diff
-// Use 'Promise.all' when changes are independent of each other
+// Create patient import mapping table to associate patient records with bulk import jobs
```
51-57: Use dynamic constraint name from PostgreSQL.

Rather than hardcoding the constraint name, let PostgreSQL dynamically generate it to avoid potential naming conflicts when the migration is applied to different environments.

```diff
-  await queryInterface.addConstraint(tableName, {
-    name: constraintName,
-    fields: constraintFieldNames,
-    type: "unique",
-    transaction,
-  });
+  await queryInterface.addConstraint(tableName, {
+    fields: constraintFieldNames,
+    type: "unique",
+    transaction,
+  });
```

packages/api/src/sequelize/migrations/2025-04-28_01_alter-patient_import-add-change-log.ts (3)
15-16: Comment doesn't match implementation.

The comment suggests using Promise.all for independent changes, but the implementation doesn't use Promise.all since there's only a single operation.

```diff
-// Use 'Promise.all' when changes are independent of each other
+// Add change logging trigger to patient_import table
```
16-20: Consider validating trigger existence before creation.

The migration could fail if the trigger already exists. Consider adding a condition to check if the trigger already exists before trying to create it.

```diff
 export const up: Migration = async ({ context: queryInterface }) => {
   await queryInterface.sequelize.transaction(async transaction => {
+    // Check if trigger exists first
+    const [results] = await queryInterface.sequelize.query(
+      `SELECT 1 FROM pg_trigger WHERE tgname = '${triggerName}'`,
+      { transaction }
+    );
+    if ((results as any[]).length === 0) {
     await queryInterface.sequelize.query(getChangeTriggerCreateCmd(), { transaction });
+    }
   });
 };
```
22-26: Consider using a transaction-aware query for down migration.

For consistency with the approach in `up`, consider using the raw query approach with `sequelize.query` instead of the `dropTrigger` method, as this ensures consistent transaction handling.

```diff
 export const down: Migration = ({ context: queryInterface }) => {
   return queryInterface.sequelize.transaction(async transaction => {
-    await queryInterface.dropTrigger(tableName, triggerName, { transaction });
+    await queryInterface.sequelize.query(
+      `DROP TRIGGER IF EXISTS ${triggerName} ON ${tableName}`,
+      { transaction }
+    );
   });
 };
```

packages/api/src/sequelize/migrations/2025-04-28_00_create-patient_import-table.ts (5)
7-8: Comment doesn't match implementation.

The comment suggests using Promise.all for independent changes, but the implementation doesn't use Promise.all since there's only a single createTable operation.

```diff
-// Use 'Promise.all' when changes are independent of each other
+// Create patient_import table to store bulk import job status and metadata
```
33-36: Consider adding length limit to reason field.

The `reason` field is defined as a string without length constraints. Consider adding a maximum length to prevent excessively large values.

```diff
 reason: {
-  type: DataTypes.STRING,
+  // Specify reasonable length limit for error messages
+  type: DataTypes.STRING(1024),
   allowNull: true,
 },
```
47-61: Add comments explaining counter fields.

The counter fields (`total`, `successful`, `failed`) would benefit from brief comments explaining their purpose in the context of patient imports.

```diff
 total: {
   type: DataTypes.INTEGER,
   allowNull: false,
   defaultValue: 0,
+  // Total number of patient records in the import job
 },
 successful: {
   type: DataTypes.INTEGER,
   allowNull: false,
   defaultValue: 0,
+  // Count of successfully imported patient records
 },
 failed: {
   type: DataTypes.INTEGER,
   allowNull: false,
   defaultValue: 0,
+  // Count of failed patient record imports
 },
```
62-71: Consider adding JSONB validation or schema documentation.

The JSONB fields `paramsCx` and `paramsOps` don't have any documentation about their expected structure. Consider adding comments documenting the expected schema.

```diff
 paramsCx: {
   type: DataTypes.JSONB,
   field: "params_cx",
   allowNull: false,
+  // Customer-specific parameters for the import job
+  // Expected schema: { ... }
 },
 paramsOps: {
   type: DataTypes.JSONB,
   field: "params_ops",
   allowNull: false,
+  // Operational parameters for the import job
+  // Expected schema: { ... }
 },
```
13-72: Consider adding indexes for frequent query patterns.

The table might benefit from additional indexes on frequently queried columns like `cx_id`, `facility_id`, or `status` to improve query performance, especially for listing/filtering operations.

```diff
 // After creating the table
 await shared.createTable(
   // ... existing code
 );
+// Add indexes for frequently queried columns
+await queryInterface.addIndex(jobTableName, {
+  fields: ["cx_id"],
+  transaction,
+});
+await queryInterface.addIndex(jobTableName, {
+  fields: ["facility_id"],
+  transaction,
+});
+await queryInterface.addIndex(jobTableName, {
+  fields: ["status"],
+  transaction,
+});
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite
📒 Files selected for processing (3)
- packages/api/src/sequelize/migrations/2025-04-28_00_create-patient_import-table.ts (1 hunks)
- packages/api/src/sequelize/migrations/2025-04-28_01_alter-patient_import-add-change-log.ts (1 hunks)
- packages/api/src/sequelize/migrations/2025-04-28_02_create-patient_import_mapping-table.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.ts`:

- Use the Onion Pattern to organize a package's code in layers
- Try to use immutable code and avoid sharing state across different functions, objects, and systems
- Try to build code that's idempotent whenever possible
- Prefer functional programming style functions: small, deterministic, 1 input, 1 output
- Minimize coupling / dependencies
- Avoid modifying objects received as parameter
- Only add comments to code to explain why something was done, not how it works
- Naming
  - classes, enums: PascalCase
  - constants, variables, functions: camelCase
  - file names: kebab-case
  - table and column names: snake_case
  - Use meaningful names, so whoever is reading the code understands what it means
  - Don't use negative names, like notEnabled; prefer isDisabled
  - For numeric values, if the type doesn't convey the unit, add the unit to the name
- Typescript
  - Use types
  - Prefer const instead of let
  - Avoid any and casting from any to other types
  - Type predicates: only applicable to narrow down the type, not to force a complete type conversion
  - Prefer deconstructing parameters for functions instead of multiple parameters that might be of the same type
  - Don't use null inside the app, only on code interacting with external interfaces/services, like DB and HTTP; convert to undefined before sending inwards into the code
  - Use async/await instead of .then()
  - Use the strict equality operator ===, don't use the abstract equality operator ==
  - When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to handle errors (see processAsyncError and emptyFunction depending on the case)
- Date and Time
  - Always use buildDayjs() to create dayjs instances
  - Prefer dayjs.duration(...) to create duration consts and keep them as duration
- Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
- Avoid creating arrow functions
- Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
- Error handling
  - Pass the original error as the new one's cause so the stack trace is persisted
  - Error messages should have a static message - add dynamic data to MetriportError's additionalInfo prop
  - Avoid sending multiple events to Sentry for a single error
- Global constants and variables
  - Move literals to constants declared after imports when possible (avoid magic numbers)
  - Avoid shared, global objects
- Avoid using console.log and console.error in packages other than utils, infra and shared, and try to use out().log instead
- Avoid multi-line logs
  - don't send objects as a second parameter to console.log() or out().log()
  - don't create multi-line strings when using JSON.stringify()
- Use eslint to enforce code style
- Use prettier to format code
- max column length is 100 chars
- multi-line comments use /** */
- scripts: top-level comments go after the import

(A hedged sketch illustrating a few of these conventions appears after the file list below.)
packages/api/src/sequelize/migrations/2025-04-28_01_alter-patient_import-add-change-log.ts
packages/api/src/sequelize/migrations/2025-04-28_00_create-patient_import-table.ts
packages/api/src/sequelize/migrations/2025-04-28_02_create-patient_import_mapping-table.ts
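As referenced above, here is a small, hedged sketch of a few of these conventions in action (static error message with additionalInfo and cause, nullish coalescing for defaults, out().log for logging). The import paths and the MetriportError constructor signature are assumptions for illustration, not verified against the codebase.

```typescript
// Import paths and signatures below are assumed, for illustration only.
import { MetriportError } from "@metriport/shared";
import { out } from "@metriport/core/util/log";

const defaultBatchSizeRows = 100; // unit in the name; literal kept as a named constant

async function importPatientRow({
  jobId,
  rowNumber,
  batchSize,
}: {
  jobId: string;
  rowNumber: number;
  batchSize?: number;
}): Promise<void> {
  const { log } = out(`patient-import job ${jobId}`);
  const effectiveBatchSize = batchSize ?? defaultBatchSizeRows; // nullish coalescing, not ||
  try {
    log(`Importing row ${rowNumber} with batch size ${effectiveBatchSize}`);
    // ...create the patient and the job mapping here...
  } catch (error) {
    // Static message; dynamic data goes in additionalInfo, original error as cause
    throw new MetriportError("Failed to import patient row", error, {
      jobId,
      rowNumber: String(rowNumber),
    });
  }
}
```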
🧠 Learnings (1)
packages/api/src/sequelize/migrations/2025-04-28_02_create-patient_import_mapping-table.ts (1)
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/api/src/models/medical/patient-import-mapping.ts:16-41
Timestamp: 2025-04-26T22:17:45.228Z
Learning: The patient_import_mapping table already has a unique constraint on the fields "job_id", "row_number", and "data_pipeline_request_id" defined in the migration file 2025-04-24_02_create-patient_import_mapping-table.ts, which ensures data integrity.
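For reference, a minimal sketch of how a composite unique constraint like the one described in this learning could be declared in a Sequelize migration; the field names come from the learning itself, while the constraint name is hypothetical and the real migration code may differ.

```typescript
import { QueryInterface } from "sequelize";

// Sketch only: enforce one mapping row per (job, row number, data pipeline request)
export async function addMappingUniqueConstraint(queryInterface: QueryInterface): Promise<void> {
  await queryInterface.addConstraint("patient_import_mapping", {
    name: "patient_import_mapping_unique", // hypothetical constraint name
    fields: ["job_id", "row_number", "data_pipeline_request_id"],
    type: "unique",
  });
}
```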
🧬 Code Graph Analysis (1)
packages/api/src/sequelize/migrations/2025-04-28_02_create-patient_import_mapping-table.ts (1)
packages/api/src/sequelize/migrations/2025-04-28_00_create-patient_import-table.ts (2)
- up (8-76)
- down (78-82)
🔇 Additional comments (4)
packages/api/src/sequelize/migrations/2025-04-28_02_create-patient_import_mapping-table.ts (2)
65-71: LGTM: Down migration handles teardown properly.

The down migration correctly reverses the operations in the proper order: first removing the index, then the constraint, and finally dropping the table. This ensures clean rollback without foreign key or dependency errors.
8-9: ❌ Incorrect review comment

Potential conflict with existing migration.

Based on the retrieved learning, it appears this table and constraint were already defined in a migration from 2025-04-24 (2025-04-24_02_create-patient_import_mapping-table.ts). Verify that this isn't a duplicate migration to avoid conflicts.

🏁 Script executed:

```bash
#!/bin/bash
# Check for potential duplicate migrations
fd ".*_create-patient_import_mapping-table.ts" packages/api/src/sequelize/migrations
```

Length of output: 177

No duplicate migration found

The fd search in packages/api/src/sequelize/migrations only returns the 2025-04-28_02_create-patient_import_mapping-table.ts file; there is no earlier 2025-04-24_02_create-patient_import_mapping-table.ts. You can safely ignore the potential conflict raised.

Likely an incorrect or invalid review comment.
packages/api/src/sequelize/migrations/2025-04-28_01_alter-patient_import-add-change-log.ts (1)
6-13: LGTM: Well-structured function for trigger creation.

The function getChangeTriggerCreateCmd() provides a clean way to generate the SQL command for creating the trigger, making the code more maintainable.

packages/api/src/sequelize/migrations/2025-04-28_00_create-patient_import-table.ts (1)
78-82: LGTM: Down migration properly drops the table.

The down migration correctly drops the table within a transaction, ensuring proper cleanup.
Ticket: https://github.com/metriport/metriport-internal/issues/2330
Dependencies
Description
Testing
- processing status
- completed status and link to download result.csv file
- processing status
- completed status and link to download result.csv file
- processing status
- completed status and link to download result.csv file
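As a rough illustration of those checks, this sketch kicks off a bulk import and polls until the job reports the processing and then completed status. The POST /patient/bulk route comes from the sequence diagram above; the query params, status route, auth header, CSV columns, and response fields are all assumptions and likely differ from the real API.

```typescript
// Hypothetical smoke test; endpoint shapes are assumptions based on the diagrams above.
const baseUrl = process.env.API_URL ?? "http://localhost:8080";
const apiKey = process.env.API_KEY ?? "";

async function runBulkImportSmokeTest(facilityId: string): Promise<void> {
  // 1. Create the job and get the presigned upload URL
  const createResp = await fetch(`${baseUrl}/patient/bulk?facilityId=${facilityId}`, {
    method: "POST",
    headers: { "x-api-key": apiKey }, // assumed auth header
  });
  const { jobId, uploadUrl } = await createResp.json();

  // 2. Upload a tiny CSV via the presigned URL (columns are illustrative)
  await fetch(uploadUrl, { method: "PUT", body: "firstName,lastName,dob\nJane,Doe,1990-01-01\n" });

  // 3. Poll the (hypothetical) status route until the job completes
  let status = "";
  while (status !== "completed") {
    await new Promise(resolve => setTimeout(resolve, 2_000));
    const statusResp = await fetch(`${baseUrl}/patient/bulk/${jobId}`, {
      headers: { "x-api-key": apiKey },
    });
    ({ status } = await statusResp.json());
  }
}
```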
Release Plan
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Chores
Tests
Documentation