Moonlink 🥮

A managed ingestion engine for Apache Iceberg

Overview

Moonlink is an Iceberg-native ingestion engine bringing streaming inserts and upserts to your lakehouse.

Ingest Postgres CDC, event streams (Kafka), and OpenTelemetry (OTEL) data into Iceberg without complex maintenance and compaction.

Moonlink buffers, caches, and indexes data so Iceberg tables stay read-optimized.


             ┌──────────moonlink───────────┐                         
             │  ┌───────────────────────┐  │  ┌───────Iceberg───────┐
             │  │                       │  │  │      obj. store     │
Postgres ───►│  │┌ ─ ─ ─ ─ ┐ ┌ ─ ─ ─ ─ ┐│  │  │┌───────┐ ┌─────────┐│
             │  │                       │  │  ││       │ │         ││
Kafka    ───►│  ││  index  │ │  cache  ││  ├──►│ index │ │ parquet ││
             │  │                       │  │  ││       │ │         ││
Events   ───►│  │└ ─ ─ ─ ─ ┘ └ ─ ─ ─ ─ ┘│  │  │└───────┘ └─────────┘│
             │  │                  nvme │  │  │                     │
             │  └───────────────────────┘  │  └─────────────────────┘
             └─────────────────────────────┘

Note: Moonlink is in preview. Expect changes. Join our Community to stay updated!

Why Moonlink?

Traditional ingestion tools write new data and metadata files to Iceberg on every update. That's fine for slow-changing data, but on real-time streams it causes:

Tiny data files — frequent commits create thousands of small Parquet files
Metadata explosion — equality-deletes compound this problem

which leads to:

Slow read performance — query planning overhead scales with file count
Manual maintenance — periodic Spark jobs for compaction/cleanup
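
To make the file-count problem concrete, here is a back-of-the-envelope sketch. The commit intervals and the one-file-per-commit assumption are illustrative numbers, not measurements of Moonlink or of any particular ingestion tool.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def data_files_per_day(commit_interval_s: float, files_per_commit: int = 1) -> int:
    """Estimate how many data files one table accumulates per day."""
    commits = int(SECONDS_PER_DAY / commit_interval_s)
    return commits * files_per_commit


# Hypothetical writer that commits every second, one small file per commit.
per_update = data_files_per_day(commit_interval_s=1)
# Hypothetical writer that buffers and commits every five minutes.
buffered = data_files_per_day(commit_interval_s=300)

print(f"commit per update: {per_update} files/day")
print(f"buffered commits:  {buffered} files/day")
```

Each of those files also adds manifest entries, so the metadata a query planner has to read grows with the same count.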

Moonlink minimizes write amplification and metadata churn by buffering incoming data, building indexes and caches on NVMe, and committing read-optimized files and deletion vectors to Iceberg.

Inserts are buffered and flushed as size-tuned Parquet

          ┌───moonlink───┐  ┌────iceberg───┐
          │┌─ ─ ─ ─ ─ ─ ┐│  │┌─ ─ ─ ─ ─ ─ ┐│
raw insert│              │  │              │
────────► ││   Arrow    │├─►││   Parquet  ││
          │              │  │              │
          │└─ ─ ─ ─ ─ ─ ┘│  │└─ ─ ─ ─ ─ ─ ┘│
          └──────────────┘  └──────────────┘
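
As a rough illustration of the buffering idea (not Moonlink's actual implementation), the sketch below accumulates rows in memory and emits a batch only once it reaches a target size. `InsertBuffer` and `target_bytes` are hypothetical names, the JSON length of a row stands in for its encoded size, and each flushed batch stands in for one Parquet file.

```python
import json
from typing import Any


class InsertBuffer:
    """Accumulates rows and emits them in batches of roughly target_bytes."""

    def __init__(self, target_bytes: int) -> None:
        self.target_bytes = target_bytes
        self.rows: list[dict[str, Any]] = []
        self.buffered_bytes = 0
        # Stand-in for the files a real writer would produce.
        self.flushed: list[list[dict[str, Any]]] = []

    def insert(self, row: dict[str, Any]) -> None:
        self.rows.append(row)
        # JSON length is only a rough proxy for the encoded size of a row.
        self.buffered_bytes += len(json.dumps(row))
        if self.buffered_bytes >= self.target_bytes:
            self.flush()

    def flush(self) -> None:
        """Emit the buffered rows as one batch, if there are any."""
        if self.rows:
            self.flushed.append(self.rows)
            self.rows = []
            self.buffered_bytes = 0


buf = InsertBuffer(target_bytes=200)
for i in range(100):
    buf.insert({"id": i, "name": f"user-{i}"})
buf.flush()  # emit whatever is left at shutdown

print(f"{len(buf.flushed)} batches for 100 rows")
```

The point of the design is that the number of output files depends on the data volume and the target size, not on how many individual inserts arrived.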

Deletes are mapped to deletion vectors using an index built on row positions

           ┌───moonlink───┐   ┌────iceberg───┐
           │┌─ ─ ─── ─ ─ ┐│   │┌─ ─ ─ ─ ─ ─ ┐│
raw deletes│              │   │              │
  ────────►││   index    │├──►││  deletion  ││
           │              │   │   vectors    │
           │└─ ─ ─── ─ ─ ┘│   │└─ ─ ─ ─ ─ ─ ┘│
           └──────────────┘   └──────────────┘
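
The mapping from a delete to a row position can be sketched as follows. This is a simplified model under stated assumptions, not Moonlink's index: `PositionIndex` and its methods are hypothetical, keys are plain integers, and a Python set stands in for the compressed bitmap a real deletion vector would use.

```python
from collections import defaultdict


class PositionIndex:
    """Maps each primary key to the data file and row position holding it."""

    def __init__(self) -> None:
        self.locations: dict[int, tuple[str, int]] = {}
        # One deletion vector (set of deleted row positions) per data file.
        self.deletion_vectors: dict[str, set[int]] = defaultdict(set)

    def record_insert(self, key: int, data_file: str, position: int) -> None:
        self.locations[key] = (data_file, position)

    def delete(self, key: int) -> bool:
        """Resolve a delete-by-key into a position in a deletion vector."""
        location = self.locations.pop(key, None)
        if location is None:
            return False  # unknown key, nothing to mark
        data_file, position = location
        self.deletion_vectors[data_file].add(position)
        return True


index = PositionIndex()
for position, key in enumerate([1, 2, 3]):
    index.record_insert(key, "file-a.parquet", position)

index.delete(2)
print(dict(index.deletion_vectors))
```

Because the lookup happens at write time, a reader only needs to skip the listed positions. With equality deletes, by contrast, the reader has to match the deleted key values against rows at query time.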

Write Paths

Moonlink supports multiple input sources for ingest:

PostgreSQL CDC — ingest via logical replication with millisecond-level latency
REST API — simple HTTP endpoint for direct event ingestion
Kafka — sink support coming soon
OTEL — sink support on the roadmap

Read Path

Moonlink commits data as Iceberg v3 tables with deletion vectors. Any engine that can read Iceberg v3 tables can query them.

Engines

DuckDB
Apache Spark
Postgres with pg_duckdb or pg_mooncake

Catalogs

AWS Glue — coming soon
Unity Catalog — coming soon

Real-Time Reads (sub-second freshness)

For workloads requiring sub-second visibility into new data, Moonlink supports real-time querying:

DuckDB: with the duckdb_mooncake extension
Postgres: with the pg_mooncake extension
DataFusion: with Moonlink DataFusion

Quick Start

1. Clone & Build

Clone the repository and build the service binary:

git clone https://github.com/Mooncake-Labs/moonlink.git
cd moonlink
cargo build --release --bin moonlink_service

2. Start the Moonlink Service

Start the Moonlink service, which will store data in the ./data directory:

./target/release/moonlink_service ./data

3. Verify Service Health

Check that the service is running properly:

curl http://localhost:3030/health
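
If you start the service from a script, you may prefer to wait until it is ready rather than check once. The sketch below assumes that the /health endpoint answers with a 2xx status when the service is up; the source only shows the curl call, so treat that as an assumption. `http_ok` and `wait_until_healthy` are hypothetical helper names, and the demo at the bottom uses a stub probe so it runs without a live service.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:3030/health"


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_healthy(
    url: str = HEALTH_URL,
    attempts: int = 10,
    delay_s: float = 0.5,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll the health endpoint until it responds or attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False


# Demo with a stub probe that succeeds on the third call.
calls = []


def flaky_probe(url: str) -> bool:
    calls.append(url)
    return len(calls) >= 3


print(wait_until_healthy(probe=flaky_probe, delay_s=0))
```

Against a running service, call `wait_until_healthy()` with no arguments to use the real HTTP probe.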

4. Create a Table

Create a table with a defined schema. Here's an example creating a users table:

curl -X POST http://localhost:3030/tables/users \
  -H "Content-Type: application/json" \
  -d '{
    "database": "my_database",
    "table": "users",
    "schema": [
      {"name": "id", "data_type": "int32", "nullable": false},
      {"name": "name", "data_type": "string", "nullable": false},
      {"name": "email", "data_type": "string", "nullable": true},
      {"name": "age", "data_type": "int32", "nullable": true},
      {"name": "created_at", "data_type": "date32", "nullable": true}
    ],
    "table_config": {}
  }'
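
The same request can be built from Python using only the standard library. This is a sketch: the endpoint path and JSON field names are copied from the curl example above, while `column` and `create_table_request` are hypothetical helpers. The request is constructed but not sent, so the snippet runs without a live service.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3030"  # address used throughout this Quick Start


def column(name: str, data_type: str, nullable: bool) -> dict:
    """Build one schema entry in the shape used by the curl example."""
    return {"name": name, "data_type": data_type, "nullable": nullable}


def create_table_request(
    database: str, table: str, schema: list[dict]
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a table."""
    body = {
        "database": database,
        "table": table,
        "schema": schema,
        "table_config": {},
    }
    return urllib.request.Request(
        f"{BASE_URL}/tables/{table}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


users_schema = [
    column("id", "int32", nullable=False),
    column("name", "string", nullable=False),
    column("email", "string", nullable=True),
    column("age", "int32", nullable=True),
    column("created_at", "date32", nullable=True),
]

request = create_table_request("my_database", "users", users_schema)
print(request.get_method(), request.full_url)
```

With the service running, `urllib.request.urlopen(request)` sends it.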

5. Insert Data

Insert data into the created table:

curl -X POST http://localhost:3030/ingest/users \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "insert",
    "request_mode": "async",
    "data": {
      "id": 1,
      "name": "Alice Johnson",
      "email": "alice@example.com",
      "age": 30,
      "created_at": "2024-01-01"
    }
  }'
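
When inserting from application code, it can help to catch missing required fields before the request leaves the client. The sketch below checks a row against the non-nullable columns of the users schema from step 4 and then builds the ingest request. The endpoint and field names come from the curl example; `missing_required` and `ingest_request` are hypothetical helpers, and only the `insert` operation and `async` request mode shown above are used.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3030"

# Column names and nullability from the users table created in step 4.
USERS_SCHEMA = [
    {"name": "id", "nullable": False},
    {"name": "name", "nullable": False},
    {"name": "email", "nullable": True},
    {"name": "age", "nullable": True},
    {"name": "created_at", "nullable": True},
]


def missing_required(row: dict, schema: list[dict]) -> list[str]:
    """List non-nullable columns that are absent or null in the row."""
    return [
        col["name"]
        for col in schema
        if not col["nullable"] and row.get(col["name"]) is None
    ]


def ingest_request(table: str, row: dict, schema: list[dict]) -> urllib.request.Request:
    """Validate the row, then build (but do not send) the ingest request."""
    missing = missing_required(row, schema)
    if missing:
        raise ValueError(f"missing required columns: {', '.join(missing)}")
    body = {"operation": "insert", "request_mode": "async", "data": row}
    return urllib.request.Request(
        f"{BASE_URL}/ingest/{table}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


alice = {
    "id": 1,
    "name": "Alice Johnson",
    "email": "alice@example.com",
    "age": 30,
    "created_at": "2024-01-01",
}
request = ingest_request("users", alice, USERS_SCHEMA)
print(request.get_method(), request.full_url)
```

A row without `id` or `name` raises `ValueError` locally instead of producing a failed request. Whether the service performs the same check is not stated in the source.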

Roadmap and Contributing

Roadmap (near‑term):

Kafka sink preview
Schema evolution from Postgres and Kafka
Catalog integrations (AWS Glue, Unity Catalog)
REST API stabilization (Insert, Upsert into Iceberg directly)

We’re grateful for our contributors. If you'd like to help improve Moonlink, join our community.

🥮

