An orchestration platform for the development, production, and observation of data assets.
Updated Aug 7, 2025 - Python
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
The open source ELT framework powered by Apache Arrow
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Declarative, text-based tool for data analysts and engineers to extract, load, transform, and orchestrate their data pipelines.
Flink CDC is a streaming data integration tool
AtroCore is an open-source Data Platform, Data Management and Master Data Management (MDM) software, which can be used to quickly create any business application.
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
Privacy- and security-focused Segment alternative, in Golang and React
Upserts, Deletes And Incremental Processing on Big Data.
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
Insightful Tutorials and Papers about Knowledge Graphs
Production-grade OAuth token manager for Instacart Ads API. Automate refresh token generation for analytics platforms, data warehouses & custom apps.
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.
ingestr is a CLI tool to seamlessly copy data between any databases with a single command.
Creating a FAIR Linked Data corpus for the BELTRANS research project about Belgian book translations NL-FR and FR-NL between 1970 and 2020
🎓 NAHS Student Transition Management System | Production-ready Google Apps Script solution with intelligent duplicate detection, multi-source data integration, and automated sheet processing. Features enhanced teacher input processing, smart data precedence, and comprehensive validation for Alternative High School student transitions.
🦀 event stream processing for developers to collect and transform data in motion to power responsive data intensive applications.
The Common Core Ontology Repository holds the current released version of the Common Core Ontology suite.
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources