Azure/osdu-data-load-tno


OSDU Data Load TNO - C# Implementation

An improved C# application for loading TNO (Netherlands Organisation for Applied Scientific Research) data into the OSDU platform.

Key Features

  • Simple CLI Interface - Three intuitive commands to get you started
  • Automatic Processing - Handles all TNO data types in the correct dependency order
  • File Upload Support - Complete 4-step OSDU file upload workflow
  • Secure Authentication - Uses Azure Identity for passwordless authentication
  • Progress Tracking - Real-time progress updates and detailed logging
  • Error Resilience - Comprehensive retry policies and error handling
  • Clean Architecture - CQRS pattern with proper separation of concerns

Data Loading Process Overview

The application follows a comprehensive 6-step process to load TNO data into OSDU:

  1. Downloads TNO Dataset Files - Retrieves official TNO test data from GitLab repository
  2. Creates Legal Tag - Establishes required legal compliance tags for data governance
  3. Uploads Files to OSDU - Executes 4-step file upload workflow:
    • Requests file upload URL from File API
    • Uploads file content to storage
    • Submits metadata to File Service
    • Maintains registry of uploaded files with IDs and versions
  4. Generates Non-Work Product Manifests - Creates manifests for master data:
    • Uses CSV templates to generate individual manifests for each data row
    • Processes reference data, wells, wellbores, and related entities
  5. Generates Work Product Manifests - Creates work product metadata:
    • Iterates through uploaded files registry
    • Retrieves JSON metadata from work product folders
    • Updates manifests with legal tags, ACL permissions, and data partition IDs
  6. Uploads Manifests - Submits all manifests to OSDU in correct dependency order

For detailed information about each step, see Data Load Process Documentation.

Quick Start

1. Prerequisites

Before you begin, ensure you have:

  • .NET 9.0 or later installed
  • Azure CLI for authentication: az login --tenant your-tenant-id
  • Azure Developer CLI (azd) for deployments
  • OSDU Platform Access with the users.datalake.ops and users@<data partition>.dataservices.energy roles
  • Visual Studio or VS Code (optional, for development)

2. Configure the Application

Update appsettings.json in the src/OSDU.DataLoad.Console/ directory with your OSDU instance details:

{
  "Osdu": {
    "BaseUrl": "https://your-osdu-instance.com",
    "TenantId": "your-tenant-id",
    "ClientId": "your-client-id", 
    "DataPartition": "your-data-partition",
    "LegalTag": "{DataPartition}-your-legal-tag",
    "AclViewer": "data.default.viewers@{DataPartition}.dataservices.energy",
    "AclOwner": "data.default.owners@{DataPartition}.dataservices.energy"
  }
}

Note: You can provide environment variables instead. See: Configuration Guide
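
The LegalTag, AclViewer, and AclOwner values above embed the data partition name. The following is a minimal sketch for previewing the expanded strings, assuming the {DataPartition} token stands for the DataPartition value. The expand_partition helper is hypothetical and is not part of the application.

```shell
# Hypothetical helper (not part of the application): replaces every
# {DataPartition} token in a value with the given partition name.
# Assumes the partition name contains no "/" or "&" characters.
expand_partition() {
  printf '%s\n' "$1" | sed "s/{DataPartition}/$2/g"
}

# Example:
#   expand_partition '{DataPartition}-your-legal-tag' opendes
#   prints: opendes-your-legal-tag
```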

3. Build and Run

# Navigate to the console project
cd src/OSDU.DataLoad.Console

# Build the solution
dotnet build

# Run commands directly
dotnet run -- help
dotnet run -- download --destination "~/osdu-data/tno"
dotnet run -- load --source "~/osdu-data/tno"

Available Commands

Default Behavior (No Arguments)

# Run without any arguments - downloads data if needed, then loads it
dotnet run

When run without arguments, the application will:

  1. Check for TNO data in ~/osdu-data/tno/ (user home directory)
  2. Download the test data if not present (~2.2GB)
  3. Load all data types into OSDU platform automatically

This is the easiest way to get started - just configure your OSDU settings and run!
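
The same flow can be approximated with the explicit download and load commands. This is only a sketch: how the application decides that data is "not present" is not documented here, so the empty-directory check below is an assumption, and both function names are hypothetical.

```shell
# Succeeds (exit 0) when the directory is missing or empty.
# Assumption: an empty or absent directory means the data must be downloaded.
needs_download() {
  [ ! -d "$1" ] || [ -z "$(ls -A "$1" 2>/dev/null)" ]
}

# Downloads only when needed, then loads.
# Run from the src/OSDU.DataLoad.Console directory.
download_then_load() {
  data_dir="${1:-$HOME/osdu-data/tno}"
  if needs_download "$data_dir"; then
    dotnet run -- download --destination "$data_dir" || return 1
  fi
  dotnet run -- load --source "$data_dir"
}

# Usage: download_then_load            (uses $HOME/osdu-data/tno)
#        download_then_load /data/tno  (custom location)
```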

Help Command

# From console project directory (recommended)
dotnet run -- help

# Or from src directory
dotnet run --project OSDU.DataLoad.Console --working-directory OSDU.DataLoad.Console -- help

Shows available commands, usage examples, and current configuration status.

Download TNO Test Data

# Download ~2.2GB of official test data (from console project directory)
dotnet run -- download --destination "~/osdu-data/tno"

# Overwrite existing data
dotnet run -- download --destination "~/osdu-data/tno" --overwrite

Load Data

# Load all TNO data types in dependency order (from console project directory)
dotnet run -- load --source "~/osdu-data/tno"

Azure Deployments

Configure Environment

  1. Create an azd environment

    # Navigate to the project root
    azd init -e dev
  2. Configure the environment variables

    azd env set OSDU_TenantId $(az account show --query tenantId -o tsv)
    azd env set AZURE_SUBSCRIPTION_ID <Azure subscription id>
    azd env set AZURE_LOCATION <Azure Region>
    azd env set OSDU_BaseUrl <https://your-osdu-instance.com>
    azd env set OSDU_ClientId <your-client-ID>
    azd env set OSDU_DataPartition <your-data-partition>
    azd env set OSDU_LegalTag <{DataPartition}-your-legal-tag>
    azd env set OSDU_AclViewer <data.default.viewers@{DataPartition}.dataservices.energy>
    azd env set OSDU_AclOwner <data.default.owners@{DataPartition}.dataservices.energy>
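
Before provisioning, it can help to confirm that none of the variables listed above was missed. The sketch below reads the KEY=value lines printed by azd env get-values and reports any absent key. The missing_azd_keys function is a hypothetical helper, and a key set to an empty value still counts as present.

```shell
# Prints the name of each required key that does not appear in the
# KEY=value lines read from standard input.
missing_azd_keys() {
  input="$(cat)"
  for key in OSDU_TenantId AZURE_SUBSCRIPTION_ID AZURE_LOCATION OSDU_BaseUrl \
             OSDU_ClientId OSDU_DataPartition OSDU_LegalTag OSDU_AclViewer \
             OSDU_AclOwner; do
    printf '%s\n' "$input" | grep -q "^${key}=" || printf '%s\n' "$key"
  done
}

# Usage: azd env get-values | missing_azd_keys
```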

Deploy the Infrastructure

azd provision

Assign Roles to the Managed Identity

Important: Get the object ID of the managed identity and assign it the users.datalake.ops and users@<data partition>.dataservices.energy roles on your data partition.
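
One possible way to script the assignment is through the OSDU Entitlements API. The sketch below only builds the group names and the request body. It assumes that users.datalake.ops uses the same @<data partition>.dataservices.energy suffix as the other group, and that the identity's object ID is passed as the member "email". Verify both against your OSDU instance. The function names, BASE_URL, TOKEN, PARTITION, and OBJECT_ID are placeholders.

```shell
# Prints the two group emails for a data partition.
# Assumption: both groups share the <partition>.dataservices.energy domain.
required_groups() {
  printf 'users.datalake.ops@%s.dataservices.energy\n' "$1"
  printf 'users@%s.dataservices.energy\n' "$1"
}

# Prints a JSON body that adds a member (here, an object ID) to a group.
member_payload() {
  printf '{"email":"%s","role":"MEMBER"}\n' "$1"
}

# Usage sketch (not executed here; all variables are placeholders):
# for group in $(required_groups "$PARTITION"); do
#   curl -X POST "$BASE_URL/api/entitlements/v2/groups/$group/members" \
#     -H "Authorization: Bearer $TOKEN" \
#     -H "data-partition-id: $PARTITION" \
#     -H "Content-Type: application/json" \
#     -d "$(member_payload "$OBJECT_ID")"
# done
```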

Deploy the Application and monitor the container's console output

azd deploy

Additional Resources

For detailed information on specific topics, see our documentation:


Common Issues and Solutions

1. Authentication Failures

Symptoms: HTTP 401 errors, "Failed to authenticate" messages

Solutions:

  • Azure CLI: Ensure you're logged in: az login --tenant your-tenant-id
  • Permissions: Verify you have the users.datalake.ops and users@<data partition>.dataservices.energy roles in OSDU
  • Configuration: Check TenantId and ClientId in configuration
  • Managed Identity: Verify Managed Identity is configured (when running on Azure)
  • Scope: Ensure the scope is correctly set to {ClientId}/.default
  • Environment Variables: Verify AZURE_CLIENT_ID, AZURE_TENANT_ID are set correctly
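
To check the scope and login independently of the application, you can request a token for the same scope with the Azure CLI. This is a sketch rather than the application's own authentication code, and the osdu_scope and get_osdu_token names are hypothetical.

```shell
# Builds the token scope from the configured ClientId.
osdu_scope() {
  printf '%s/.default\n' "$1"
}

# Requests an access token for that scope using the current az login
# session and prints it on standard output.
get_osdu_token() {
  az account get-access-token --scope "$(osdu_scope "$1")" \
    --query accessToken -o tsv
}

# Usage: get_osdu_token your-client-id
```

If the token request fails here as well, the problem is likely in the login session, tenant, or ClientId rather than in the data loader.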

2. Performance Issues

Symptoms: Slow upload speeds, timeouts

Solutions:

  • Run upload in Azure: See Azure Deployments
  • Adjust batch size: Increase the MasterDataManifestSubmissionBatchSize value to submit more manifests in a single workflow request.
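
As a rough illustration of the batch size effect: if each workflow request carries up to MasterDataManifestSubmissionBatchSize manifests (an assumption based on the description above), the request count is the manifest count divided by the batch size, rounded up. The workflow_requests helper and the manifest counts below are made up for the example.

```shell
# Ceiling division: number of requests needed to submit $1 manifests
# when each request carries at most $2 manifests.
workflow_requests() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# workflow_requests 1000 25   prints 40
# workflow_requests 1000 100  prints 10
```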

3. File Upload - Metadata Issues

Symptoms: The file is uploaded and metadata is created, but a GET request to the Storage API's /v2/records/{id} endpoint returns 404

fail: OSDU.DataLoad.Infrastructure.Services.OsduHttpClient[0]
      [2e82ab6a] GET https://pm44a0805b33bc4.oep.ppe.azure-int.net/api/storage/v2/records/opendes:dataset--File.Generic:e4f2b1ee-2732-4259-ab47-d30ff4c2a095 failed with status NotFound
fail: OSDU.DataLoad.Infrastructure.Services.OsduHttpClient[0]
      [2e82ab6a] Step 4 Failed: Could not retrieve record version for FileID: opendes:dataset--File.Generic:e4f2b1ee-2732-4259-ab47-d30ff4c2a095

Solutions:

  • Restart the OSDU-Storage pods
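
To confirm whether a record has become retrievable after the restart, you can query the same Storage endpoint that appears in the log. This sketch assumes the standard OSDU Authorization and data-partition-id request headers. The function names are hypothetical, and the arguments are placeholders for your own values.

```shell
# Builds the record URL from the base URL ($1) and record ID ($2),
# following the path shown in the log output above.
record_url() {
  printf '%s/api/storage/v2/records/%s\n' "${1%/}" "$2"
}

# Prints only the HTTP status code for the record lookup.
# Arguments: base URL, record ID, bearer token, data partition.
record_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $3" \
    -H "data-partition-id: $4" \
    "$(record_url "$1" "$2")"
}

# Usage: record_status "$BASE_URL" "$RECORD_ID" "$TOKEN" "$PARTITION"
```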

4. No container app logs

Symptoms: No logs appear in the container app. You may see a Kubernetes error.

Solutions:

  • Redeploy: Redeploy the container with azd deploy

Contributing

This solution follows Clean Architecture and CQRS principles. When contributing:

  • Review the existing code patterns and structure
  • Follow established naming conventions
  • Add appropriate unit tests for new features
  • Update documentation as needed

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

OSDU is a trademark of The Open Group.

About

Data loading process for OSDU on Azure
