This repository was archived by the owner on Jun 6, 2024. It is now read-only.
2021 Feb Release Plan #5253
Closed
Release Manager
Endgame
Feature freeze: TBD
Code freeze: 2.24
Scrum demo date: 2.25
Bug Bash date: TBD
Release date & retrospective date: 2.28
Test Plan:
TBD
Work Items
Job submission page new UI
- P0 side bar refine
- P0 basic info + task role
- P0 More info (advanced mode)
- P1 SKU (hived scheduler logic)
- P1 secrets function (including image auth)
- P1 ssh function
- P2 data (team storage)
- P2 save as template
Job protocol update
- P0 support data via extending prerequisite in job protocol extend prerequisite field in job protocol #5145
  - cmd runtime plugin modification
    - let runtime plugin parse prerequisite
  - change validation in webportal and rest-server
    - make uri optional
  - change openpai protocol
- P1 Host prerequisite in marketplace
Marketplace 2021 Feb. Release Plan
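The validation change above ("make uri optional") could look roughly like the sketch below. This is only an illustration, not the webportal or rest-server code: the function name `validate_prerequisite` is hypothetical, and the choice to require `name` and `type` while accepting `uri` as either a string or a list of strings is an assumption.

```python
from typing import Any, Dict, List


def validate_prerequisite(item: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors for one prerequisite entry.

    Assumption: `name` and `type` stay required, `uri` becomes optional.
    """
    errors = []
    for key in ("name", "type"):
        value = item.get(key)
        if not isinstance(value, str) or not value:
            errors.append("'%s' is required and must be a non-empty string" % key)
    # `uri` may be omitted entirely; if present it must be well-typed.
    if "uri" in item:
        uri = item["uri"]
        is_str = isinstance(uri, str)
        is_str_list = isinstance(uri, list) and all(isinstance(u, str) for u in uri)
        if not (is_str or is_str_list):
            errors.append("'uri' must be a string or a list of strings")
    return errors


if __name__ == "__main__":
    # A data prerequisite without a uri passes under this sketch.
    print(validate_prerequisite({"name": "team-data", "type": "data"}))
```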
Dockerhub pull policy #5219
- Docker image pull frequency limit in dockerhub
- P0 use ansible playbook to change docker daemon config (solve job pull and service start problems)
- P0 start a new cache registry service to cache images from dockerhub
- P1 change service yaml config when starting service to use mirror registry
- leave an interface for external registry
- Test:
- submit a job using dockerhub image
- cache registry log shows that it receives a pull image request from the job container
- batch submit 100+ simple jobs
- cache registry log records all the pull image requests
- check the worker node rate limit and it does not reach the limit
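As a rough illustration of the docker daemon config change, the sketch below adds a mirror to the `registry-mirrors` key of a `daemon.json`-style dictionary. The helper name `with_registry_mirror` and the mirror URL are hypothetical, and this is not the ansible task planned above; it only shows the shape of the resulting config.

```python
import json
from typing import Any, Dict


def with_registry_mirror(daemon_config: Dict[str, Any], mirror_url: str) -> Dict[str, Any]:
    """Return a copy of a Docker daemon config with mirror_url first in registry-mirrors."""
    updated = dict(daemon_config)
    # Keep any mirrors already configured, without duplicating the new one.
    existing = [m for m in updated.get("registry-mirrors", []) if m != mirror_url]
    updated["registry-mirrors"] = [mirror_url] + existing
    return updated


if __name__ == "__main__":
    # Hypothetical cache registry address; replace with the real service endpoint.
    current = {"default-runtime": "nvidia"}
    print(json.dumps(with_registry_mirror(current, "http://cache-registry.example:5000"), indent=2))
```

The docker daemon has to be restarted or reloaded before a changed `daemon.json` takes effect.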
DB controller
- make DBController tolerant to wrong framework request #4889 When user's job config is too large, OpenPAI job will be in Waiting or Stopping forever #5093
- Remove sensitive Info in database
GPU Utilization Statistics
- create CronJob to send GPU utilization report @suiguoxin send regular GPU utilization report with CronJob #5281
- add gpuhours info and sort by gpuhours in the report @suiguoxin [alert-manager] add gpu*hours info in cluster-utilization cronjob #5294
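The "sort by gpuhours" item can be pictured with the sketch below. It assumes gpu hours means GPU count multiplied by running time in hours; the `JobUsage` record and its field names are hypothetical and are not taken from the alert-manager code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobUsage:
    """Hypothetical per-job record for the utilization report."""
    name: str
    gpu_count: int
    duration_hours: float

    @property
    def gpu_hours(self) -> float:
        # Assumption: gpu hours = number of GPUs x hours the job ran.
        return self.gpu_count * self.duration_hours


def sort_by_gpu_hours(jobs: List[JobUsage]) -> List[JobUsage]:
    """Return jobs ordered from most to least gpu hours."""
    return sorted(jobs, key=lambda job: job.gpu_hours, reverse=True)


if __name__ == "__main__":
    report = sort_by_gpu_hours([
        JobUsage("job-a", 1, 10.0),
        JobUsage("job-b", 4, 5.0),
    ])
    for job in report:
        print(job.name, job.gpu_hours)
```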
Let runtime plugin access a "job application token" @suiguoxin
- generate a job application token for every job and mount it to initContainers generate an application token for every job #5270
- save job specific token in separate namespace same job tokens in a seperated namespace #5292
- Disable it by default in this release disable job specific token generation in rest-server #5312
save SSH public keys on user profile page #5274
- webportal @Lijiaoa Add ssh public keys on user-profile page #5223
- openpai-runtime @suiguoxin
- Mount user's extension in runtime container create a user secret for every job to save user extension #5310
openpai runtime
- design discuss: [Proposal] openpai-runtime interface openpai-runtime#12 @Binyang2014
log experience
- Add ".log" extension to downloaded log file @Binyang2014 Add ".log" suffix for user-stdout/user-stderr/user-all log files #5272
- Support reading log files behind the customized gateway @Binyang2014 remove log url WEBPORTAL_URL prefix #5271
Deployment
- P2 verbose mode for deployment scripts @Starmys
Agile CI and nightly-build
P1 Agile CI and split heavy tests with nightly-deployment #5173 @yiyione
- Setup the nightly-deployment and test
- Build & publish nightly tag image
Fault Document
Document
- update add&remove node doc @suiguoxin
- update aad doc @suiguoxin
Bug Fix
- Memory leak in framework watcher of database controller #5300
- Set correct launchTime in rest-server @suiguoxin fix get jobs api launchedTime field issue #5307
Backlog
HiveD scheduler
- P2 Cell as sku in hived scheduler. @abuccts (backbone support (config) for "Cell", submission form supports for Cell) @hzy46 ETA: design: in progress
- P2 UX for HiveD @yangou1988 @hzhua
Show cluster-level info #5254
dependabot alert
- P1 add / remove node with layout.yaml Add / Remove Nodes with layout.yaml #5167 @suiguoxin
Need triage
- Rest-Server Job API Perf Tests and Improvements #5027
- Python 2/3.5 Deprecation Upgrade Python 2 to Python 3 #5042
- P2 dev-box can be inside master node: doc, testing
- Refine Alert System [Monitor] Alert System Refine #4810 P2 items
- P2 too many tokens created Too many identical tokens have been created. #5123
- P2 Group management & VC request management for user and admin @yiyione Group & VC request management for user and admin #4949
- P2 vscode bug @yiyione
- P2 code coverage concern when developing feature
- P1 (design) Submit job ui should show protocol prerequisites parts 2020 Oct. ~ Nov. Release Plan openpaimarketplace#73 @debuggy @yiyione @TobeyQin
Technical Investigations
- Cluster API - blog @suiguoxin
- KubeVela @SwordFaith
- Open Application Model
- Crossplane