@leite08 leite08 commented May 10, 2025

Dependencies

Description

Consolidated search by keyword.

Downstream PRs have:

  • code to improve the export of FHIR resources to text (the text that gets ingested into OpenSearch for the search)
  • the remaining code to run ingestion as part of the data pipeline and to run ingestion and search in a lambda.

Testing

  • staging
    • Get consolidated returns bundle
    • Triggering a consolidated query regenerates the bundle
    • AI Summary is included in consolidated bundle

Release Plan

This has to be merged along with downstream. It won't break anything if it is shipped without downstream, but we shouldn't use the ingestion/search in production until downstream is merged.

  • Merge upstream
  • Merge this
  • Merge downstream

Summary by CodeRabbit

  • New Features

    • Added internal API endpoints for managing consolidated patient queries and ingesting data into search indexes.
    • Introduced lexical search capability for consolidated patient data with optimized OpenSearch integration.
    • Added utilities for extracting and cleaning text from FHIR DiagnosticReport resources and HTML content.
    • Introduced standardized ID generation and bulk request creation for OpenSearch ingestion.
    • Added support for deleting patient data from OpenSearch indexes.
  • Improvements

    • Unified and modularized search and ingestion logic with enhanced logging and error handling.
    • Enhanced consolidated patient data retrieval with synchronous and asynchronous options.
    • Improved AI summarization and FHIR bundle slimming processes.
    • Updated configuration management and environment variables for search services.
    • Refined codeable concept formatting and deny-list filtering for medical coding systems.
    • Enhanced consolidated search endpoint with unified search logic and removed environment restrictions.
    • Improved ingestion error handling and added ingestion for lexical search index initialization.
  • Bug Fixes

    • Fixed issues in DiagnosticReport text extraction and note cleanup.
    • Corrected FHIR bundle total count handling and improved test coverage.
  • Refactor

    • Restructured OpenSearch modules, removing deprecated code and consolidating types.
    • Renamed and reorganized functions, classes, and imports for consistency and clarity.
    • Simplified content cleanup logic by replacing multiple regex steps with a dedicated HTML tag removal utility.
    • Reorganized import paths to deeper relative structures reflecting module reorganization.
    • Replaced local implementations with external core package imports for consolidated data handling.
  • Chores

    • Added documentation comments and new tests for formatting utilities.
    • Updated infrastructure scripts to use fully qualified HTTPS endpoints and added lexical search environment variables.
    • Adjusted import paths across packages to reflect new module organization.
    • Deprecated internal error processing function with guidance to use shared utilities.
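The summary above mentions replacing several regex cleanup steps with a dedicated HTML tag removal utility. As a rough illustration of that idea only, here is a minimal sketch. The function name `removeHtmlTagsSketch` and its `stopwords` parameter are hypothetical; the PR's real utility in `packages/core/src/external/html/remove-tags.ts` may behave differently.

```typescript
// Hypothetical helper, not the PR's implementation.
// Strips anything that looks like an HTML tag, drops optional stopwords,
// and collapses whitespace so the result is a single line of text.
function removeHtmlTagsSketch(input: string, stopwords: readonly string[] = []): string {
  // Replace each tag with a space so words on either side of a tag do not merge.
  const withoutTags = input.replace(/<[^>]*>/g, " ");
  const stop = new Set(stopwords.map(word => word.toLowerCase()));
  return withoutTags
    .split(/\s+/)
    .filter(word => word.length > 0 && !stop.has(word.toLowerCase()))
    .join(" ");
}

const cleaned = removeHtmlTagsSketch("<p>Hello <b>world</b></p>");
const filtered = removeHtmlTagsSketch("<td>Result:</td><td>the value</td>", ["the"]);
```

A regex like this does not handle `>` inside attribute values, comments, or script blocks, so a real utility would likely need more care than this sketch shows.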

linear bot commented May 10, 2025

coderabbitai bot commented May 10, 2025

Walkthrough

This update introduces a comprehensive refactor and modularization of the OpenSearch-based search and ingestion infrastructure for consolidated FHIR resources. Key changes include the introduction of new modules for lexical search, resource indexing, and ingestion, as well as the removal or relocation of legacy and redundant files. The API is updated to delegate consolidated patient operations to a new internal router, and environment/configuration handling is improved for search endpoints and index names. Additional enhancements include improved logging, new utility functions for text processing, and expanded test coverage for codeable concept formatting.
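To make the configuration change concrete, here is a minimal sketch of reading a search endpoint and a lexical index name from the environment. The environment variable names `SEARCH_ENDPOINT` and `LEXICAL_SEARCH_INDEX` are placeholders invented for this example, and the validation rule is an assumption based on the PR's note that endpoints are now passed as fully qualified HTTPS URLs.

```typescript
type SearchConfigSketch = { endpoint: string; lexicalIndexName: string };

// Env var names are hypothetical placeholders, not the ones used by the PR.
function getLexicalSearchConfigSketch(
  env: Record<string, string | undefined>
): SearchConfigSketch {
  const endpoint = env["SEARCH_ENDPOINT"];
  const lexicalIndexName = env["LEXICAL_SEARCH_INDEX"];
  if (!endpoint) throw new Error("Missing SEARCH_ENDPOINT");
  if (!lexicalIndexName) throw new Error("Missing LEXICAL_SEARCH_INDEX");
  // Assumption: the endpoint arrives fully qualified, so a bare hostname is
  // rejected here instead of having "https://" silently prepended.
  if (!endpoint.startsWith("https://")) {
    throw new Error(`Expected a fully qualified HTTPS endpoint, got: ${endpoint}`);
  }
  return { endpoint, lexicalIndexName };
}

const exampleConfig = getLexicalSearchConfigSketch({
  SEARCH_ENDPOINT: "https://search.example.com",
  LEXICAL_SEARCH_INDEX: "lexical-index",
});
```

Taking the environment as a parameter keeps the function easy to test; in a service it would typically be called with `process.env`.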

Changes

| File(s) | Change Summary |
| --- | --- |
| packages/api/src/command/hie/unlink-patient-from-organization.ts, packages/api/src/command/medical/patient/consolidated-delete.ts, packages/api/src/external/carequality/document/process-outbound-document-retrieval-resps.ts, packages/api/src/external/cda/generate-ccd.ts, packages/api/src/external/commonwell/document/document-query.ts, packages/api/src/routes/medical/document.ts, packages/api/src/routes/medical/patient.ts, packages/utils/src/open-search/cleanup.ts, packages/utils/src/open-search/re-populate.ts, packages/utils/src/open-search/semantic-ingest.ts, packages/utils/src/open-search/semantic-search.ts | Updated import paths for OpenSearch and consolidated search/ingest modules to reflect new structure. Some logic changes to parameter passing in ingestion functions. |
| packages/api/src/command/medical/patient/consolidated-get.ts | Removed local implementations of consolidated patient data functions, now imported from core module. Removed related types and unused imports. |
| packages/api/src/errors/index.ts | Added deprecation JSDoc to processAsyncError. |
| packages/api/src/routes/internal/medical/patient-consolidated.ts | Added new router with endpoints for consolidated patient query management and OpenSearch ingestion. |
| packages/api/src/routes/internal/medical/patient.ts | Removed /internal/patient/consolidated/query route and related logic; now delegates to new consolidated router. |
| packages/api/src/shared/config.ts | Removed static methods for search-related environment variables. |
| packages/core/src/command/ai-brief/filter.ts | Improved function naming, reduced redundant serialization, exported buildSlimmerPayload, and updated internal helper function names for clarity. |
| packages/core/src/command/consolidated/consolidated-filter.ts | Switched to new consolidated file getter, improved logging with timing, and refactored bundle entry construction. |
| packages/core/src/command/consolidated/consolidated-get.ts | Added new types and functions for consolidated patient data retrieval (sync/async), renamed legacy function, and enhanced error handling. |
| packages/core/src/command/consolidated/get-snapshot.ts, packages/core/src/command/consolidated/snapshot-on-s3.ts, packages/core/src/external/fhir/consolidated/consolidated.ts | Explicitly annotated optional requestId parameters as `string |
| packages/core/src/command/consolidated/search/document-reference/ingest.ts | Changed ingestion function to accept entryId instead of full document, updated logging and parameter usage. |
| packages/core/src/command/consolidated/search/document-reference/search.ts | Updated import paths, added JSDoc, and introduced local doc status helper functions. |
| packages/core/src/command/consolidated/search/fhir-resource/ingest-lexical.ts, packages/core/src/command/consolidated/search/fhir-resource/ingest-semantic.ts, packages/core/src/command/consolidated/search/fhir-resource/search-consolidated-direct.ts, packages/core/src/command/consolidated/search/fhir-resource/search-consolidated-factory.ts, packages/core/src/command/consolidated/search/fhir-resource/search-consolidated.ts, packages/core/src/command/consolidated/search/fhir-resource/search-lexical.ts | Added new modules for lexical/semantic ingestion and search, including direct search implementation and factory. |
| packages/core/src/external/aws/document-signing/bulk-sign.ts | Changed import path for searchDocuments. |
| packages/core/src/external/fhir/export/string/resources/diagnostic-report.ts | Enhanced presented forms extraction, improved formatting, and added helper for single-line conversion. |
| packages/core/src/external/fhir/export/string/shared/__tests__/codeable-concept.test.ts | Added comprehensive tests for codeable concept formatting. |
| packages/core/src/external/fhir/export/string/shared/codeable-concept.ts | Refactored formatting logic for codeable concepts, improved handling of codings and text. |
| packages/core/src/external/fhir/export/string/shared/deny.ts | Expanded deny list, added allowed system check, and exported new utility function. |
| packages/core/src/external/fhir/resources/diagnostic-report.ts | New module for extracting and cleaning text from DiagnosticReport presented forms. |
| packages/core/src/external/html/remove-tags.ts | New utility function for removing HTML tags and stopwords from text. |
| packages/core/src/external/opensearch/file/file-ingestor-direct.ts | Refactored content cleanup to use new HTML tag remover, simplified logic, made index existence check public. |
| packages/core/src/external/opensearch/file/file-ingestor-sqs.ts, packages/core/src/external/opensearch/file/file-remover-direct.ts, packages/core/src/external/opensearch/file/file-search-connector-factory.ts | Updated import paths, removed "https://" prefix from endpoint assignment. |
| packages/core/src/external/opensearch/file/file-ingestor.ts, packages/core/src/external/opensearch/file/file-searcher-direct.ts, packages/core/src/external/opensearch/file/file-searcher.ts | Updated types, imports, and type annotations for searcher/ingestor classes and interfaces. |
| packages/core/src/external/opensearch/index-based-on-file.ts, packages/core/src/external/opensearch/index-based-on-resource.ts | New modules defining index field types and search result types for file/resource-based indexing. |
| packages/core/src/external/opensearch/index.ts | Removed document status/type helpers, refactored config and response types, added generics for OpenSearch responses. |
| packages/core/src/external/opensearch/lexical/lexical-search.ts, packages/core/src/external/opensearch/lexical/lexical-searcher.ts | New modules for lexical search query construction and searcher class. |
| packages/core/src/external/opensearch/semantic/hybrid-create.ts | New (commented) template for hybrid index creation. |
| packages/core/src/external/opensearch/semantic/semantic-searcher.ts | Refactored to use shared types, updated class and type names, improved logging. |
| packages/core/src/external/opensearch/shared/bulk.ts | Added utility for building OpenSearch bulk requests from resources. |
| packages/core/src/external/opensearch/shared/delete.ts | New module for building OpenSearch delete queries. |
| packages/core/src/external/opensearch/shared/filters.ts | New utility for generating patient-based filters. |
| packages/core/src/external/opensearch/shared/id.ts | New utility for generating OpenSearch entry IDs. |
| packages/core/src/external/opensearch/shared/log.ts | New logging utility for OpenSearch modules. |
| packages/core/src/external/opensearch/text-ingestor.ts | Refactored ingestion class, improved logging, added delete support, updated types, and modularized helpers. |
| packages/core/src/util/config.ts | Added method for lexical search index name retrieval. |
| packages/core/src/util/error/shared.ts | Updated processAsyncError to support using message as error title. |
| packages/infra/lib/api-stack.ts, packages/infra/lib/api-stack/api-service.ts, packages/infra/lib/api-stack/ccda-search-connector.ts | Updated search endpoint handling to use fully qualified URLs, added lexical index env var. |
| packages/lambdas/src/sqs-to-opensearch-xml.ts | Updated import paths, removed "https://" prefix from endpoint. |
| packages/utils/src/fhir/ai-brief/build-slim-bundle.ts | New script for building slim FHIR bundles from files. |
| packages/utils/src/open-search/semantic-text-from-consolidated.ts | Enhanced script to support both local file and API-based modes for extracting consolidated text. |
| packages/core/src/external/ehr/bundle/job/create-resource-diff-bundles/steps/compute/ehr-compute-resource-diff-bundles-local.ts | Updated to use new consolidated file getter. |
| packages/core/src/external/opensearch/cda.ts, packages/core/src/external/opensearch/file-searcher.ts, packages/core/src/external/opensearch/semantic/ingest.ts, packages/core/src/external/opensearch/semantic/shared.ts | Deleted legacy modules for stopwords, file searcher types, semantic ingest, and shared semantic helpers. |
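The entries for `shared/id.ts` and `shared/bulk.ts` describe entry ID generation and bulk request building. As an illustration of how such helpers can fit together, here is a minimal sketch. The ID scheme (`cxId_patientId_resourceId`), the type names, and the document field names are all assumptions made for this example, not taken from the PR. The only part grounded in OpenSearch itself is the bulk body format: newline-delimited JSON with an action line followed by a document line, ending in a newline.

```typescript
// Hypothetical shapes; the PR's real types may differ.
type ResourceTextSketch = { resourceId: string; resourceType: string; content: string };
type BulkParamsSketch = {
  indexName: string;
  cxId: string;
  patientId: string;
  resources: ResourceTextSketch[];
};

// Assumed ID scheme: deterministic, so re-ingesting a resource overwrites its entry.
function buildEntryIdSketch(cxId: string, patientId: string, resourceId: string): string {
  return `${cxId}_${patientId}_${resourceId}`;
}

// Builds the NDJSON body for the _bulk endpoint: one "index" action line
// followed by one document line per resource.
function buildBulkBodySketch(params: BulkParamsSketch): string {
  const { indexName, cxId, patientId, resources } = params;
  const lines: string[] = [];
  for (const resource of resources) {
    const entryId = buildEntryIdSketch(cxId, patientId, resource.resourceId);
    lines.push(JSON.stringify({ index: { _index: indexName, _id: entryId } }));
    lines.push(
      JSON.stringify({
        cxId,
        patientId,
        resourceType: resource.resourceType,
        resourceId: resource.resourceId,
        content: resource.content,
      })
    );
  }
  // The bulk API expects the body to end with a newline.
  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}

const bulkBody = buildBulkBodySketch({
  indexName: "lexical-index",
  cxId: "cx1",
  patientId: "pat1",
  resources: [
    { resourceId: "r1", resourceType: "Condition", content: "hypertension" },
    { resourceId: "r2", resourceType: "Medication", content: "lisinopril" },
  ],
});
```

A deterministic ID is one way to keep repeated ingestion idempotent; whether the PR relies on that or on the delete step before ingestion is not stated in the summary.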

Sequence Diagram(s)

Lexical Ingestion Flow

sequenceDiagram
    participant API
    participant IngestLexical
    participant OpenSearchTextIngestor
    participant S3
    participant Config

    API->>IngestLexical: ingestLexical({ patient, onItemError? })
    IngestLexical->>Config: get OpenSearch config
    IngestLexical->>OpenSearchTextIngestor: new OpenSearchTextIngestor(config)
    IngestLexical->>S3: getConsolidatedAsText(patient)
    IngestLexical->>OpenSearchTextIngestor: delete({ cxId, patientId })
    IngestLexical->>OpenSearchTextIngestor: ingestBulk({ cxId, patientId, resources })
    OpenSearchTextIngestor-->>IngestLexical: Map of error counts
    IngestLexical-->>API: Log result
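The call order in the diagram above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the interface `TextIngestorSketch`, the function `ingestLexicalSketch`, and the resource shape are hypothetical, and the ingestor and text loader are injected so the flow can be exercised without OpenSearch or S3.

```typescript
// All names are illustrative; only the order of calls mirrors the diagram.
type PatientRefSketch = { cxId: string; id: string };
type TextResourceSketch = { id: string; type: string; text: string };

interface TextIngestorSketch {
  delete(params: { cxId: string; patientId: string }): Promise<void>;
  // Resolves to a map of error type to number of failed items.
  ingestBulk(params: {
    cxId: string;
    patientId: string;
    resources: TextResourceSketch[];
  }): Promise<Map<string, number>>;
}

async function ingestLexicalSketch(params: {
  patient: PatientRefSketch;
  ingestor: TextIngestorSketch;
  getConsolidatedAsText: (patient: PatientRefSketch) => Promise<TextResourceSketch[]>;
}): Promise<Map<string, number>> {
  const { patient, ingestor, getConsolidatedAsText } = params;
  const ids = { cxId: patient.cxId, patientId: patient.id };
  // 1. Load the patient's consolidated data as text (from S3 in the diagram).
  const resources = await getConsolidatedAsText(patient);
  // 2. Remove the patient's existing entries from the index.
  await ingestor.delete(ids);
  // 3. Ingest the fresh resources in bulk and return the per-error counts.
  return ingestor.ingestBulk({ ...ids, resources });
}

// Usage with in-memory fakes that record the order of calls.
const recordedCalls: string[] = [];
const fakeIngestor: TextIngestorSketch = {
  delete: async () => {
    recordedCalls.push("delete");
  },
  ingestBulk: async ({ resources }) => {
    recordedCalls.push(`ingestBulk:${resources.length}`);
    return new Map<string, number>();
  },
};
const ingestionResult = ingestLexicalSketch({
  patient: { cxId: "cx1", id: "pat1" },
  ingestor: fakeIngestor,
  getConsolidatedAsText: async () => {
    recordedCalls.push("read");
    return [{ id: "r1", type: "Condition", text: "hypertension" }];
  },
});
```

The optional `onItemError` callback shown in the diagram is left out of this sketch to keep it short.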

Lexical Search Flow

sequenceDiagram
    participant API
    participant SearchConsolidatedDirect
    participant OpenSearchLexicalSearcher
    participant S3

    API->>SearchConsolidatedDirect: search({ patient, query })
    alt query present
        SearchConsolidatedDirect->>OpenSearchLexicalSearcher: search({ cxId, patientId, query })
        OpenSearchLexicalSearcher-->>SearchConsolidatedDirect: SearchResult[]
        SearchConsolidatedDirect->>S3: Upload search result and query
        S3-->>SearchConsolidatedDirect: Signed URL
        SearchConsolidatedDirect-->>API: { url, resourceCount }
    else no query
        SearchConsolidatedDirect->>getConsolidatedPatientData: getConsolidatedPatientData({ patient })
        getConsolidatedPatientData-->>SearchConsolidatedDirect: Bundle
        SearchConsolidatedDirect->>S3: Upload result
        S3-->>SearchConsolidatedDirect: Signed URL
        SearchConsolidatedDirect-->>API: { url, resourceCount }
    end
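In the "query present" branch above, the searcher needs an OpenSearch query scoped to one customer and patient. A minimal sketch of such a query follows. The field names `content`, `cxId`, and `patientId`, the default of 100 hits, and the function names are assumptions for illustration; `bool`, `term`, and `simple_query_string` are standard OpenSearch query DSL clauses. The `term` filters assume the ID fields are mapped as keywords.

```typescript
type LexicalSearchParamsSketch = {
  cxId: string;
  patientId: string;
  query: string;
  maxHits?: number;
};

// Builds a request body for the _search endpoint.
function buildLexicalQuerySketch({
  cxId,
  patientId,
  query,
  maxHits = 100,
}: LexicalSearchParamsSketch) {
  return {
    size: maxHits,
    query: {
      bool: {
        // Keyword match against the ingested text (assumed field name: "content").
        must: [{ simple_query_string: { query, fields: ["content"], default_operator: "and" } }],
        // Filters restrict hits to one patient without affecting relevance scoring.
        filter: [{ term: { cxId } }, { term: { patientId } }],
      },
    },
  };
}

// Mirrors the diagram's "alt query present / else no query" decision.
function hasQuerySketch(query: string | undefined): query is string {
  return query !== undefined && query.trim().length > 0;
}

const lexicalQuery = buildLexicalQuerySketch({ cxId: "cx1", patientId: "pat1", query: "diabetes" });
```

When `hasQuerySketch` returns false, the flow in the diagram skips OpenSearch entirely and falls back to the full consolidated bundle.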

API Router Delegation for Consolidated Patient Operations

sequenceDiagram
    participant API_Router
    participant PatientConsolidatedRouter
    participant DB
    participant OpenSearch

    API_Router->>PatientConsolidatedRouter: POST /internal/patient/consolidated/query or /search/ingest
    PatientConsolidatedRouter->>DB: Query/update consolidated queries, patients
    PatientConsolidatedRouter->>OpenSearch: Ingest consolidated resources (if /search/ingest)
    PatientConsolidatedRouter-->>API_Router: JSON response

Possibly related PRs

  • ENG-191 Only load the bundle when converting #3692: The main PR and the retrieved PR both modify the handling and import of the getConsolidatedPatientData function and related consolidated patient data retrieval logic, including changes to how bundles are loaded and used during consolidation processes, indicating a direct relation in consolidating patient data retrieval and processing.
  • eng-41 patient in bundle as a bundle entry #3796: Both PRs modify how consolidated patient data is retrieved and bundled, with this PR refactoring imports and implementations and the related PR adjusting patient resource bundling in FHIR bundles.
  • feat(api): adding internal route to recreate consolidated bundle #3841: Related through consolidated patient data handling, this PR replaces local implementations with external imports while the related PR adds exports and internal API routes for consolidated bundle recreation.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

npm warn config production Use --omit=dev instead.
npm error code ERR_SSL_WRONG_VERSION_NUMBER
npm error errno ERR_SSL_WRONG_VERSION_NUMBER
npm error request to https://10.0.0.28:4873/punycode/-/punycode-2.3.1.tgz failed, reason: C05CE922747F0000:error:0A00010B:SSL routines:ssl3_get_record:wrong version number:../deps/openssl/openssl/ssl/record/ssl3_record.c:354:
npm error
npm error A complete log of this run can be found in: /.npm/_logs/2025-05-23T19_53_49_634Z-debug-0.log


@leite08 leite08 changed the base branch from develop to eng-41-improve-bulk-ingest May 10, 2025 17:14
@leite08 leite08 force-pushed the eng-41-improve-bulk-ingest branch from 7eca2da to 6eef068 Compare May 10, 2025 17:15
@leite08 leite08 force-pushed the eng-268-consolidated-search-by-keyword branch 8 times, most recently from 91b6c81 to 876006a Compare May 13, 2025 21:26
@leite08 leite08 force-pushed the eng-41-improve-bulk-ingest branch from 6eef068 to cc24edb Compare May 13, 2025 21:27
@leite08 leite08 force-pushed the eng-268-consolidated-search-by-keyword branch from 876006a to 207a162 Compare May 13, 2025 21:31
@leite08 leite08 force-pushed the eng-268-consolidated-search-by-keyword branch from 207a162 to 2710ec3 Compare May 13, 2025 21:44
@leite08 leite08 force-pushed the eng-41-improve-bulk-ingest branch from cc24edb to 4af4911 Compare May 14, 2025 15:41
@leite08 leite08 force-pushed the eng-268-consolidated-search-by-keyword branch from 2e93ca9 to a350d94 Compare May 14, 2025 15:43
@leite08 leite08 force-pushed the eng-41-improve-bulk-ingest branch from 4af4911 to c131be2 Compare May 16, 2025 12:52
@leite08 leite08 force-pushed the eng-268-consolidated-search-by-keyword branch 3 times, most recently from 83ca5fe to 1e795c2 Compare May 16, 2025 13:52
This was referenced Jul 24, 2025