Skip to content

Conversation

HonakerM
Copy link
Contributor

@HonakerM HonakerM commented Sep 4, 2024

What this PR does / why we need it:
There is a bug in Caikit when a model failed to load but its LoadedModel instance is still marked as "loaded" because the loaded() check only considers if the future finished not if the model is actually loaded. This PR fixes this by adding an additional parameter require_instance which ensures that a local instance is actually available.

Special notes for your reviewer:

If applicable:

  • this PR contains documentation
  • this PR contains unit tests
  • this PR has been tested for backwards compatibility

Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com>
@HonakerM HonakerM force-pushed the add_restrictive_load_check branch from 0533fa5 to 37a3037 Compare September 4, 2024 18:55
Copy link
Collaborator

@gabe-l-hart gabe-l-hart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One NIT on the docstring and a general question about why this is staying optional rather than just fixing the logic

@@ -118,7 +118,11 @@ def model(self) -> ModuleBase:
self.wait()
return self._model

def loaded(self) -> bool:
def loaded(self, require_instance: bool = False) -> bool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the default False? I guess really, why make this an argument at all? It seems like the old logic was just wrong? Did you trace history to figure out why or self._caikit_model_future.done() was added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't trace history but logically it seems loaded used to see if the ModelFuture was generally completed not just successfully completed. When I checked caikit general it doesn't seem like we use loaded outside the servicer so assumed it was used in a subclass/other implementation.


After looking through the history it seems like it is only used in the info servicer. I'm game to change this to the default

HonakerM and others added 2 commits September 5, 2024 11:20
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
Signed-off-by: Michael Honaker <37811263+HonakerM@users.noreply.github.com>
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
Signed-off-by: Michael Honaker <37811263+HonakerM@users.noreply.github.com>
Copy link
Collaborator

@gabe-l-hart gabe-l-hart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, yeah, I looked through history and it looks like I added this all in one line, so that second clause was not added for a specific reason. The rest of the code seems to clearly indicate that self._model is only ever set to self._caikit_model_future.result(), so I would say let's just remove that second clause and not have this argument.

Also, linting and passing tests!

@HonakerM
Copy link
Contributor Author

HonakerM commented Sep 5, 2024

so I would say let's just remove that second clause and not have this argument.

Sounds good!

Also, linting and passing tests!

I just committed your changes 👀

Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com>
@gabe-l-hart
Copy link
Collaborator

I just committed your changes 👀

🤦 This is why I still don't really trust suggestions in GH. You can't see how long the lines are!

Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com>
@HonakerM
Copy link
Contributor Author

HonakerM commented Sep 5, 2024

@gabe-l-hart it looks like a test is failing because it was using loaded() to check if a job completed without calling model(). Any suggestions on a solution? Should I change the test or add a new function completed() to check if the job completed

@gabe-l-hart
Copy link
Collaborator

Any suggestions on a solution? Should I change the test or add a new function completed() to check if the job completed

Hmmm. Yeah, that sounds like the right option. I'd suggest calling it load_finished or something that makes it clear that it's just checking the status of the load operation and not the outcome.

Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com>
@HonakerM HonakerM requested a review from gabe-l-hart September 5, 2024 19:30
Copy link
Collaborator

@gabe-l-hart gabe-l-hart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ship it. Thanks Casey!

@gabe-l-hart gabe-l-hart merged commit d72d5bd into caikit:main Sep 5, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants