This repository was archived by the owner on Jun 6, 2024. It is now read-only.
2021 March Release Plan #5346
Open
Release Manager
Endgame
Feature freeze: TBD
Code freeze: 4.6
Scrum demo date: TBD
Bug Bash date: 4.16
Release date & retrospective date: 4.26
Test Plan: TBD
Top-level themes (work-item breakdown needed)
- Marketplace v1 backlog: @debuggy / @yiyione / @TobeyQin, 2 weeks, P0 (may need to slip by one week). Test Owner: @suiguoxin, @debuggy, @hzy46
- x-plan: @Binyang2014, TBD. Test Owner: @suiguoxin. Done
Alert-Manager (Test Owner: @Binyang2014): test done
- Send an alert to the user when a job fails: Inform the user when jobs status change #5337, @suiguoxin, P1 (deferred)
- Add alert & auto-fix for GPU performance issue: Add alert for GPU perf issue #5342; [alert-handler] auto-fix Nvidia GPU low performance issue #5383. P0
Test cases: [alert-handler] auto-fix Nvidia GPU low performance issue #5383 (comment)
- Add kill-long-running-job email templates: refine kill low-efficiency-job-alert email templates #5384
Test cases: refine kill low-efficiency-job-alert email templates #5384 (comment)
REST Server (Test Owner: @yiyione): done
- Support sorting by completionTime in the list-jobs API: support sort by completedTime in get the list jobs API #5347, @suiguoxin
- API change: Support sorting by completionTime #5375, P0. Test cases: Support sorting by completionTime #5375 (comment)
- Use this API in cluster utilization: [cluster utilization] get the full list of jobs in 7 days #5376, P0
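The REST Server items pair sorting by completionTime with cluster utilization's need for the full list of jobs from the last 7 days. One way the two could fit together is sketched below: page through finished jobs sorted newest-first and stop at the first job older than the cutoff, instead of fetching the whole job history. This is a minimal sketch under assumptions: `Job`, `fetch_page`, and `jobs_completed_since` are hypothetical names, and the in-memory `fetch_page` stands in for the real list-jobs API, whose parameters are not described in this plan.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Job:
    name: str
    completion_time: datetime


def fetch_page(all_jobs: List[Job], offset: int, limit: int) -> List[Job]:
    """Hypothetical stand-in for one list-jobs call that returns finished
    jobs sorted by completion time, newest first."""
    ordered = sorted(all_jobs, key=lambda j: j.completion_time, reverse=True)
    return ordered[offset:offset + limit]


def jobs_completed_since(all_jobs: List[Job], cutoff: datetime,
                         page_size: int = 2) -> List[Job]:
    """Page through the sorted list and stop at the first job older than cutoff."""
    result: List[Job] = []
    offset = 0
    while True:
        page = fetch_page(all_jobs, offset, page_size)
        if not page:
            return result  # no more jobs to read
        for job in page:
            if job.completion_time < cutoff:
                # Newest-first order: every remaining job is older, so stop early.
                return result
            result.append(job)
        offset += page_size


if __name__ == "__main__":
    now = datetime(2021, 4, 26)
    jobs = [
        Job("a", now - timedelta(days=1)),
        Job("b", now - timedelta(days=10)),
        Job("c", now - timedelta(days=3)),
        Job("d", now - timedelta(days=6, hours=23)),
    ]
    recent = jobs_completed_since(jobs, now - timedelta(days=7))
    print([j.name for j in recent])  # ['a', 'c', 'd']
```

The early exit is what makes the sort order useful: without it, the caller would have to page through every job ever submitted to find the ones in the 7-day window.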
Deployment (Test Owner: @Binyang2014)
- Add / remove nodes with layout.yaml: Installation Issue List #5321; Add / Remove Nodes with layout.yaml #5167. @Starmys, P0, test done
- Webportal package build issue: fix version change issue in webportal build #5378, @suiguoxin, P0. Test cases: fix version change issue in webportal build #5378 (comment)
- The K8s API server's certificate must be renewed every year: K8s API server's cert need renew each year #5334, P0. Test cases: K8s API server's cert need renew each year #5334 (comment)
Documents
- Doc for renewing the API server certificate, @yiyione
- Document for config.yaml, @Starmys
- Document for the new submission page (user manual), @debuggy
- Add / remove nodes
- Doc for Nvidia driver version mismatch
Other backlogs
Use case & best practice
- Use case and best practice summary: @hzy46 / @TobeyQin / @suiguoxin
- OpenPAI Advantage
- OpenPAI Best Practice, P0 topics:
  - Cluster setup and onboarding
  - Utilization weekly report
  - Storage
  - How to debug
  - Leverage low-priority resources
  - AutoML
- Job profiling: @hzy46, P1 (not in PAI release)
- HiveD user experience support: TBD, @yangou1988. VC view page (design review in this release), P1
- HiveD: convert old test cases and propose new test cases, @hzy46, P1
- Dataset: integrate data prerequisite into marketplace and job submission page, @hzy46 ([User Story] Dataset: integrate data prerequisite into marketplace and job submission page #5345), TBD, P1