[GSoC] Add e2e test for tune API with LLM hyperparameter optimization (#2420)
Conversation
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
/area gsoc
Ref: #2339
@andreyvelich Thank you for catching up! I'm working on this, but the e2e test failed due to some problem inside the
I checked the logs of the master pod, and it only has two containers:
I'm not sure if it has something to do with the update of the training operator. Do you have any ideas? By the way, I've installed the Training Operator control plane.
Update (2025-03-27): see katib/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py, lines 690 to 697 at commit 54764d6.
@mahdikhashan Hmmm, that's strange. It seems the problem is that the type should be
Yes, I'll do so and share the full testing environment for it, so then we can work on it.
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
@andreyvelich @mahdikhashan Thank you for the review! I've incorporated your suggestions, and this PR is now ready for review.
Note: I'm also currently testing the example provided in this user guide, but I've encountered an issue related to downloading the model in the
I suspect this error is due to package version compatibility. Updating
I'm actively working on fixing this new issue, but it may take some additional time. How about we proceed to review and merge this PR first, and handle the example issue separately in this follow-up issue? Please let me know what you think.
Updated 2025-03-28: To fix the above errors, I created a PR here. Please review when you have time @andreyvelich @mahdikhashan. Thanks!
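Since the comment above attributes the failure to package version compatibility, one common pattern for an e2e setup is to fail fast on version drift before the expensive tuning step runs. This is a minimal, hypothetical sketch — the package names and minimum versions used in any real test would come from the project's own pins, not from here:

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple:
    """Keep only the leading numeric dotted parts, e.g. '4.38.0.dev0' -> (4, 38, 0)."""
    parts = []
    for piece in v.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break
    return tuple(parts)


def check_minimums(minimums: dict) -> list:
    """Return a list of human-readable problems; an empty list means all pins are satisfied."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (need >= {minimum})")
            continue
        if parse_version(installed) < parse_version(minimum):
            problems.append(f"{name}: {installed} < {minimum}")
    return problems
```

In an e2e test this check could run in the setup phase, turning a confusing mid-run download or import error into an immediate, readable failure message.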
I think once we merge kubeflow/trainer#2576, we can review and merge this one.
@mahdikhashan @helenxie-bit Are we ready to merge this?
/lgtm
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: andreyvelich. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
What this PR does / why we need it:
This PR adds an e2e test for the tune API, specifically for the scenario of importing external models and datasets for LLM hyperparameter optimization.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #
Checklist:
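At its core, an e2e test like the one this PR describes creates an Experiment via the tune API and then polls until the Experiment reaches a terminal state or a timeout expires. The following is a generic, hypothetical polling helper illustrating that wait loop — it is not the Katib SDK's actual API; in the real test, `check` would query Katib for the Experiment's status:

```python
import time


def wait_for(check, timeout_s=600, interval_s=5, clock=time.monotonic, sleep=time.sleep):
    """Poll `check()` until it returns a truthy value or `timeout_s` elapses.

    `check` is expected to return the terminal condition (e.g. "Succeeded")
    once reached, and a falsy value while the Experiment is still running.
    Raises TimeoutError if no terminal state is observed in time.
    """
    deadline = clock() + timeout_s
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        sleep(interval_s)
```

Injecting `clock` and `sleep` keeps the helper unit-testable without real delays, which matters in CI where the surrounding e2e job already has a hard time budget.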