Moonlink is an Iceberg-native ingestion engine bringing streaming inserts and upserts to your lakehouse.
Ingest Postgres CDC, event streams (Kafka), and OTEL into Iceberg without complex maintenance and compaction.
Moonlink buffers, caches, and indexes data so Iceberg tables stay read-optimized.
โโโโโโโโโโโmoonlinkโโโโโโโโโโโโ
โ โโโโโโโโโโโโโโโโโโโโโโโโโ โ โโโโโโโโIcebergโโโโโโโโ
โ โ โ โ โ obj. store โ
Postgres โโโโบโ โโ โ โ โ โ โ โ โ โ โ โ โโ โ โโโโโโโโโโ โโโโโโโโโโโโ
โ โ โ โ โโ โ โ โโ
Kafka โโโโบโ โโ index โ โ cache โโ โโโโบโ index โ โ parquet โโ
โ โ โ โ โโ โ โ โโ
Events โโโโบโ โโ โ โ โ โ โ โ โ โ โ โ โโ โ โโโโโโโโโโ โโโโโโโโโโโโ
โ โ nvme โ โ โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโ โ โโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Note: Moonlink is in preview. Expect changes. Join our Community to stay updated!
Traditional ingestion tools write data and metadata files per update into Iceberg. That's fine for slow-changing data, but on real-time streams it causes:
- Tiny data files โ frequent commits create thousands of small Parquet files
- Metadata explosion โ equality-deletes compound this problem
which leads to:
- Slow read performance โ query planning overhead scales with file count
- Manual maintenance โ periodic Spark jobs for compaction/cleanup
Moonlink minimizes write amplification and metadata churn by buffering incoming data, building indexes and caches on NVMe, and committing read-optimized files and deletion vectors to Iceberg.
Inserts are buffered and flushed as size-tuned Parquet
โโโโmoonlinkโโโโ โโโโโicebergโโโโ
โโโ โ โ โ โ โ โโ โโโ โ โ โ โ โ โโ
raw insertโ โ โ โ
โโโโโโโโโบ โโ Arrow โโโโบโโ Parquet โโ
โ โ โ โ
โโโ โ โ โ โ โ โโ โโโ โ โ โ โ โ โโ
โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ
Deletes are mapped to deletion vectors using an index built on row positions
โโโโmoonlinkโโโโ โโโโโicebergโโโโ
โโโ โ โโโ โ โ โโ โโโ โ โ โ โ โ โโ
raw deletesโ โ โ โ
โโโโโโโโโบโโ index โโโโโบโโ deletion โโ
โ โ โ vectors โ
โโโ โ โโโ โ โ โโ โโโ โ โ โ โ โ โโ
โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ
Moonlink supports multiple input sources for ingest:
- PostgreSQL CDC โ ingest via logical replication with millisecond-level latency
- REST API โ simple HTTP endpoint for direct event ingestion
- Kafka โ sink support coming soon
- OTEL โ sink support on the roadmap
Moonlink commits data as Iceberg v3 tables with deletion vectors. These tables can be queried from any Iceberg-compatible engine.
Engines
- DuckDB
- Apache Spark
- Postgres with
pg_duckdb
orpg_mooncake
Catalogs
- AWS Glue โ coming soon
- Unity Catalog โ coming soon
For workloads requiring sub-second visibility into new data, Moonlink supports real-time querying:
- DuckDB โ with the
duckb_mooncake
extension. - Postgres โ with the
pg_mooncake
extension. - DataFusion โ with
Moonlink Datafusion
Clone the repository and build the service binary:
git clone https://github.com/Mooncake-Labs/moonlink.git
cd moonlink
cargo build --release --bin moonlink_service
Start the Moonlink service, which will store data in the ./data
directory:
./target/release/moonlink_service ./data
Check that the service is running properly:
curl http://localhost:3030/health
Create a table with a defined schema. Here's an example creating a users
table:
curl -X POST http://localhost:3030/tables/users \
-H "Content-Type: application/json" \
-d '{
"database": "my_database",
"table": "users",
"schema": [
{"name": "id", "data_type": "int32", "nullable": false},
{"name": "name", "data_type": "string", "nullable": false},
{"name": "email", "data_type": "string", "nullable": true},
{"name": "age", "data_type": "int32", "nullable": true},
{"name": "created_at", "data_type": "date32", "nullable": true}
],
"table_config": {}
}'
Insert data into the created table:
curl -X POST http://localhost:3030/ingest/users \
-H "Content-Type: application/json" \
-d '{
"operation": "insert",
"request_mode": "async",
"data": {
"id": 1,
"name": "Alice Johnson",
"email": "alice@example.com",
"age": 30,
"created_at": "2024-01-01"
}
}'
Roadmap (nearโterm):
- Kafka sink preview
- Schema evolution from Postgres and Kafka
- Catalog integrations (AWS Glue, Unity Catalog)
- REST API stabilization (Insert, Upsert into Iceberg directly)
Weโre grateful for our contributors. If you'd like to help improve Moonlink, join our community.
๐ฅฎ