Admin: misc improvements on admin server and workers. EC now works. #7055
Conversation
…nto their own dedicated package
Current Status
- ✅ Master: healthy and running (port 9333)
- ✅ Filer: healthy and running (port 8888)
- ✅ Volume servers: all 6 servers running (ports 8080-8085)
- 🔄 Admin/Workers: will start when dependencies are ready
But: 1) a volume should not be repeatedly worked on, and 2) EC shards need to be distributed and the source data deleted.
- distribution of EC shards needs improvement
- need configuration to enable EC
- 🎯 Planning: ActiveTopology creates a DestinationPlan with a specific TargetDisk
- 📦 Task creation: maintenance_integration.go creates an ECDestination with DiskId
- 🚀 Task execution: the EC task passes DiskId in VolumeEcShardsCopyRequest
- 💾 Volume server: receives disk_id and stores shards on the specific disk (vs.store.Locations[req.DiskId])
- 📂 File system: EC shards and metadata land in the exact disk directory planned
Summary of Changes
Hello @chrislusf, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've implemented a series of improvements to the admin server and its interaction with workers, with a strong focus on Erasure Coding (EC) capabilities. This work includes a significant overhaul of the configuration management system, moving to a more robust protobuf-based, schema-driven approach. I've also added extensive Docker-based testing infrastructure to ensure the reliability of new EC worker implementations. Furthermore, the user interface has been enhanced to provide more detailed insights into EC volumes and shards, as well as comprehensive collection-level information.
Highlights
- Modular Configuration System: The core configuration system has been refactored to use Protocol Buffers and a new schema-driven approach. This makes configurations more robust, type-safe, and easier to validate and extend in the future. The UI now dynamically renders configuration forms based on these schemas.
- Enhanced Erasure Coding (EC) Support: The pull request significantly advances Erasure Coding (EC) functionality. This includes new task detection for EC needs, a refined assignment process for EC workers, and detailed UI pages to monitor individual EC shards and their distribution across the cluster, as well as aggregated views of EC volumes.
- Comprehensive Docker Testing Environment: A comprehensive Docker-based testing environment has been introduced, allowing for easy setup and testing of EC and vacuum worker operations. This environment includes all necessary SeaweedFS components (master, volume servers, filer, admin, workers) and tools for load generation and monitoring.
- Improved Maintenance Task Management: The maintenance manager now integrates with an 'active topology' and 'pending operations' tracker. This allows the system to make more intelligent decisions about task planning, considering the current cluster state and ongoing operations to avoid conflicts and optimize resource utilization.
- Refined UI for Cluster Management: The UI has been updated with new pages for EC shards, EC volumes, and detailed collection views. These enhancements provide administrators with a much clearer and more granular understanding of their data distribution and the status of maintenance operations.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
Feature | Command | Description
---|---|---
Code Review | `/gemini review` | Performs a code review for the current pull request in its current state.
Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state.
Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help | `/gemini help` | Displays a list of available commands.
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution. ↩
Code Review
This pull request introduces a significant set of features for admin server and worker management, particularly for Erasure Coding (EC) tasks. The changes include a new design document, extensive testing infrastructure via Docker and Makefiles, a schema-driven configuration system, and new UI pages for managing EC volumes and collections. The overall design is robust and the move to protobuf-based, schema-driven configuration is a major improvement. I've identified a few areas for improvement, including a potential bug in a test script, a correctness issue in a reflection helper, and some inconsistencies in documentation and naming that could cause confusion. Addressing these points will enhance the quality and maintainability of this new functionality.
weed/admin/config/schema.go (outdated)
```go
func isZeroValue(v reflect.Value) bool {
	switch v.Kind() {
	case reflect.Bool:
		return !v.Bool()
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return v.Int() == 0
	case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
		return v.Uint() == 0
	case reflect.Float32, reflect.Float64:
		return v.Float() == 0
	case reflect.String:
		return v.String() == ""
	case reflect.Slice, reflect.Map, reflect.Array:
		return v.IsNil() || v.Len() == 0
	case reflect.Interface, reflect.Ptr:
		return v.IsNil()
	}
	return false
}
```
The custom `isZeroValue` implementation is not fully correct and reinvents functionality already provided by the Go standard library. For example, `v.IsNil()` will panic on array types, which are not nilable. The `reflect.Value.IsZero()` method, available since Go 1.13, correctly checks whether a value is the zero value for its type across all kinds and is more robust and maintainable.
```go
func isZeroValue(v reflect.Value) bool {
	return v.IsZero()
}
```
```dockerfile
FROM alpine:latest
```
```makefile
clean: ## Stop and remove all containers, networks, and volumes
	@echo "🧹 Cleaning up SeaweedFS test environment..."
	@docker-compose -f $(COMPOSE_FILE) down -v --remove-orphans
	@docker system prune -f
```
The `docker system prune -f` command is very aggressive and can remove any stopped containers, networks, and dangling images on the user's system, not just those related to this project. This could lead to unexpected data loss. The `docker-compose -f $(COMPOSE_FILE) down -v --remove-orphans` command is sufficient for cleaning up the environment. Consider removing this line to make the `clean` target safer.
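A safer version of the target (illustrative; it simply drops the prune step, keeping only the project-scoped teardown) would look like:

```makefile
clean: ## Stop and remove this project's containers, networks, and volumes
	@echo "🧹 Cleaning up SeaweedFS test environment..."
	@docker-compose -f $(COMPOSE_FILE) down -v --remove-orphans
```

Users who want a system-wide prune can still run `docker system prune` manually, with full awareness of its scope.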
```makefile
	@echo "Creating large files with random data to trigger EC (targeting ~60MB total to exceed 50MB limit)..."
	@for i in {1..12}; do \
		echo "Creating 5MB random file $$i..."; \
		docker run --rm --network admin_integration_seaweed_net -v /tmp:/tmp --entrypoint sh chrislusf/seaweedfs:local -c "dd if=/dev/urandom of=/tmp/largefile$$i.dat bs=1M count=5 2>/dev/null && weed upload -master=master:9333 /tmp/largefile$$i.dat && rm /tmp/largefile$$i.dat"; \
```
Redirecting stderr of the `dd` command to `/dev/null` suppresses all error messages, which can hide problems like disk-space or permission errors during test-data generation. It's better to let errors surface for easier debugging:

```makefile
		docker run --rm --network admin_integration_seaweed_net -v /tmp:/tmp --entrypoint sh chrislusf/seaweedfs:local -c "dd if=/dev/urandom of=/tmp/largefile$$i.dat bs=1M count=5 && weed upload -master=master:9333 /tmp/largefile$$i.dat && rm /tmp/largefile$$i.dat"; \
```
```shell
if docker-compose -f docker-compose-ec-test.yml config | grep -q "workingDir=/work"; then
	echo -e "${GREEN}✅ Worker working directory option is properly configured${NC}"
else
	echo -e "${RED}❌ Worker working directory option is missing${NC}"
	exit 1
```
This check for `workingDir=/work` will fail because the `docker-compose-ec-test.yml` file does not define a `workingDir` for the worker services, nor is the `-workingDir` flag passed to the `worker` command. To align with the test's expectation and the feature described in the README, the worker command in the compose file should be updated to include this setting.
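A hypothetical compose-file fragment showing the fix; the service name and image tag here are illustrative, and the only detail taken from the review comment is the `-workingDir` flag the test greps for:

```yaml
# Sketch: pass -workingDir to the worker command so that
# `docker-compose config | grep "workingDir=/work"` succeeds.
worker:
  image: chrislusf/seaweedfs:local
  command: 'worker -workingDir=/work'
```

The exact worker command line and any other flags should match what the rest of `docker-compose-ec-test.yml` already uses.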
```go
}

// ShowClusterEcShards handles the cluster EC shards page (individual shards view)
func (h *ClusterHandlers) ShowClusterEcShards(c *gin.Context) {
```
The function name `ShowClusterEcShards` is misleading, as it fetches and renders a list of EC volumes, not individual shards. It calls `GetClusterEcVolumes` and uses the `app.ClusterEcVolumes` template. To improve code clarity and maintainability, consider renaming this function to `ShowClusterEcVolumes` to accurately reflect its purpose.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>