-
Notifications
You must be signed in to change notification settings - Fork 117
feat: MLFlow Integration for experiment tracking #534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: MLFlow Integration for experiment tracking #534
Conversation
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contributing! @terrykong to comment on the feature broadly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @therealnaveenkamal for the contribution! appreciate your help making nemo-rl better!
left some comments below
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple other comments
Sure @terrykong , working on your comments. I'll ping you here once I'm ready. Thanks for your support. |
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
@terrykong I think I've addressed all your comments. please review and let me know if further edits are required. |
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
@terrykong / @SahilJain314 not sure why the workflows have failed. do I have to do anything from my side? the lint & the copyright checks passed when I run the workflows locally. |
@therealnaveenkamal I believe the copyright one is orthogonal to your change so you can ignore that (it doesn't gate our CI, but it should CC: @chtruong814) The linter one fails. can you install the precommit hooks and run it on your branch? |
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
e064aec
to
9064a45
Compare
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
@terrykong thank you. resolved the lint errors. |
thanks @therealnaveenkamal. The CI still says there's some unresolved errors: |
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
@terrykong sorry my bad, I didn't do a lint check after resolving merge conflicts. i think everything should be good now. thanks. |
Signed-off-by: Terry Kong <terryk@nvidia.com>
the linter is failing due to an error that usually indicates the |
update uv.lock to account for the new dependency
Thanks @terrykong, merged. |
Hi @therealnaveenkamal . I think the CI is having some issues with your PR (CC @chtruong814 ). I'll open a new PR, resolve the issues, and make sure to credit your contribution |
Thanks for the support @terrykong !! Looking forward to contributing more. |
Appreciate doing most of the heavy lifting @therealnaveenkamal ! I just pushed it over the finish. I'll close this since #697 has landed |
What does this PR do ?
Add MLflow integration for experiment tracking and model management in NeMo RL
Issues
Closes #514 - Support for MLflow
This PR addresses the community request for MLflow support alongside existing TensorBoard and Weights & Biases logging options.
Usage
Before your PR is "Ready for review"
Pre checks:
Additional Information
This PR adds comprehensive MLflow integration including:
sft_mlflow.yaml
) for SFT training with MLflowThe integration follows the existing logger architecture and provides a seamless experience for users who want to use MLflow for experiment tracking alongside or instead of TensorBoard and Weights & Biases.