Admin: misc improvements on admin server and workers. EC now works. #7055
Conversation
…nto their own dedicated package
Current Status
- ✅ Master: healthy and running (port 9333)
- ✅ Filer: healthy and running (port 8888)
- ✅ Volume servers: all 6 servers running (ports 8080-8085)
- 🔄 Admin/Workers: will start when dependencies are ready
But: 1) a volume should not be repeatedly worked on, and 2) EC shards need to be distributed and the source data deleted.
- distribution of EC shards needs improvement
- need configuration to enable EC
- 🎯 Planning: ActiveTopology creates a DestinationPlan with a specific TargetDisk
- 📦 Task creation: maintenance_integration.go creates an ECDestination with DiskId
- 🚀 Task execution: the EC task passes DiskId in VolumeEcShardsCopyRequest
- 💾 Volume server: receives disk_id and stores shards on the specific disk (vs.store.Locations[req.DiskId])
- 📂 File system: EC shards and metadata land in the exact disk directory planned
Summary of Changes
Hello @chrislusf, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've implemented a series of improvements to the admin server and its interaction with workers, with a strong focus on Erasure Coding (EC) capabilities. This work includes a significant overhaul of the configuration management system, moving to a more robust protobuf-based, schema-driven approach. I've also added extensive Docker-based testing infrastructure to ensure the reliability of new EC worker implementations. Furthermore, the user interface has been enhanced to provide more detailed insights into EC volumes and shards, as well as comprehensive collection-level information.
Highlights
- Modular Configuration System: The core configuration system has been refactored to use Protocol Buffers and a new schema-driven approach. This makes configurations more robust, type-safe, and easier to validate and extend in the future. The UI now dynamically renders configuration forms based on these schemas.
- Enhanced Erasure Coding (EC) Support: The pull request significantly advances Erasure Coding (EC) functionality. This includes new task detection for EC needs, a refined assignment process for EC workers, and detailed UI pages to monitor individual EC shards and their distribution across the cluster, as well as aggregated views of EC volumes.
- Comprehensive Docker Testing Environment: A comprehensive Docker-based testing environment has been introduced, allowing for easy setup and testing of EC and vacuum worker operations. This environment includes all necessary SeaweedFS components (master, volume servers, filer, admin, workers) and tools for load generation and monitoring.
- Improved Maintenance Task Management: The maintenance manager now integrates with an 'active topology' and 'pending operations' tracker. This allows the system to make more intelligent decisions about task planning, considering the current cluster state and ongoing operations to avoid conflicts and optimize resource utilization.
- Refined UI for Cluster Management: The UI has been updated with new pages for EC shards, EC volumes, and detailed collection views. These enhancements provide administrators with a much clearer and more granular understanding of their data distribution and the status of maintenance operations.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
Feature | Command | Description
---|---|---
Code Review | `/gemini review` | Performs a code review for the current pull request in its current state.
Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state.
Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help | `/gemini help` | Displays a list of available commands.
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution. ↩
Code Review
This pull request introduces a significant set of features for admin server and worker management, particularly for Erasure Coding (EC) tasks. The changes include a new design document, extensive testing infrastructure via Docker and Makefiles, a schema-driven configuration system, and new UI pages for managing EC volumes and collections. The overall design is robust and the move to protobuf-based, schema-driven configuration is a major improvement. I've identified a few areas for improvement, including a potential bug in a test script, a correctness issue in a reflection helper, and some inconsistencies in documentation and naming that could cause confusion. Addressing these points will enhance the quality and maintainability of this new functionality.
weed/admin/config/schema.go (outdated)
```go
func isZeroValue(v reflect.Value) bool {
	switch v.Kind() {
	case reflect.Bool:
		return !v.Bool()
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return v.Int() == 0
	case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
		return v.Uint() == 0
	case reflect.Float32, reflect.Float64:
		return v.Float() == 0
	case reflect.String:
		return v.String() == ""
	case reflect.Slice, reflect.Map, reflect.Array:
		return v.IsNil() || v.Len() == 0
	case reflect.Interface, reflect.Ptr:
		return v.IsNil()
	}
	return false
}
```
The custom `isZeroValue` implementation is not fully correct and reinvents functionality already provided by the Go standard library. For example, `v.IsNil()` will panic on array types, which are not nilable. The `reflect.Value.IsZero()` method, available since Go 1.13, correctly checks whether a value is the zero value for its type across all kinds and is more robust and maintainable.
```go
func isZeroValue(v reflect.Value) bool {
	return v.IsZero()
}
```
```dockerfile
FROM alpine:latest
```
```makefile
clean: ## Stop and remove all containers, networks, and volumes
	@echo "🧹 Cleaning up SeaweedFS test environment..."
	@docker-compose -f $(COMPOSE_FILE) down -v --remove-orphans
	@docker system prune -f
```
The `docker system prune -f` command is very aggressive and can remove any stopped containers, networks, and dangling images on the user's system, not just those related to this project. This could lead to unexpected data loss. The `docker-compose -f $(COMPOSE_FILE) down -v --remove-orphans` command is sufficient for cleaning up the environment. Consider removing this line to make the `clean` target safer.
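A safer version of the target (illustrative; it simply drops the prune step, keeping only the project-scoped teardown) would look like:

```makefile
clean: ## Stop and remove this project's containers, networks, and volumes
	@echo "🧹 Cleaning up SeaweedFS test environment..."
	@docker-compose -f $(COMPOSE_FILE) down -v --remove-orphans
```

Users who want a system-wide prune can still run `docker system prune` manually, with full awareness of its scope.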
```makefile
	@echo "Creating large files with random data to trigger EC (targeting ~60MB total to exceed 50MB limit)..."
	@for i in {1..12}; do \
		echo "Creating 5MB random file $$i..."; \
		docker run --rm --network admin_integration_seaweed_net -v /tmp:/tmp --entrypoint sh chrislusf/seaweedfs:local -c "dd if=/dev/urandom of=/tmp/largefile$$i.dat bs=1M count=5 2>/dev/null && weed upload -master=master:9333 /tmp/largefile$$i.dat && rm /tmp/largefile$$i.dat"; \
```
Redirecting stderr of the `dd` command to `/dev/null` suppresses all error messages, which can hide problems like disk-space or permission errors during test-data generation. It's better to let errors surface for easier debugging:

```makefile
		docker run --rm --network admin_integration_seaweed_net -v /tmp:/tmp --entrypoint sh chrislusf/seaweedfs:local -c "dd if=/dev/urandom of=/tmp/largefile$$i.dat bs=1M count=5 && weed upload -master=master:9333 /tmp/largefile$$i.dat && rm /tmp/largefile$$i.dat"; \
```
```shell
if docker-compose -f docker-compose-ec-test.yml config | grep -q "workingDir=/work"; then
	echo -e "${GREEN}✅ Worker working directory option is properly configured${NC}"
else
	echo -e "${RED}❌ Worker working directory option is missing${NC}"
	exit 1
```
This check for `workingDir=/work` will fail because the `docker-compose-ec-test.yml` file does not define a `workingDir` for the worker services, nor is the `-workingDir` flag passed to the `worker` command. To align with the test's expectation and the feature described in the README, the worker command in the compose file should be updated to include this setting.
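A hypothetical compose-file fragment showing the fix; the service name and image tag here are illustrative, and the only detail taken from the review comment is the `-workingDir` flag the test greps for:

```yaml
# Sketch: pass -workingDir to the worker command so that
# `docker-compose config | grep "workingDir=/work"` succeeds.
worker:
  image: chrislusf/seaweedfs:local
  command: 'worker -workingDir=/work'
```

The exact worker command line and any other flags should match what the rest of `docker-compose-ec-test.yml` already uses.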
```go
}

// ShowClusterEcShards handles the cluster EC shards page (individual shards view)
func (h *ClusterHandlers) ShowClusterEcShards(c *gin.Context) {
```
The function name `ShowClusterEcShards` is misleading, as it fetches and renders a list of EC volumes, not individual shards. It calls `GetClusterEcVolumes` and uses the `app.ClusterEcVolumes` template. To improve code clarity and maintainability, consider renaming this function to `ShowClusterEcVolumes` to accurately reflect its purpose.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>