-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[tabular] Add ag_model_register #4913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@shchur This logic might be applicable to TimeSeries as well. Curious on your thoughts. |
Nit: How far is the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Model configs can have specified priorities via While it is true that this is only the default, it would be overly verbose to put "default" in the name IMO, since the same argument could be said for |
Agree. Maybe then a comment somewhere clarifying this to users that have to set this value |
Job PR-4913-4a17edf is done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, just several tangential thoughts / ideas and a few nits
ag_key: str | None = None # set to string value for subclasses for use in AutoGluon | ||
ag_name: str | None = None # set to string value for subclasses for use in AutoGluon | ||
ag_priority: int = 0 # set to int value for subclasses for use in AutoGluon | ||
ag_priority_by_problem_type: dict[str, int] = {} # if not set, we fall back to ag_priority |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Setting a mutable object as the default value might lead to some problems in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any thoughts on how to avoid this? We could move it to a property instead, but it would look less elegant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One alternative idea that comes to mind is turning this into an instance attribute and calling copy
during __init__
. This also feels less elegant though.
Another option is to use the built-in types.MappingProxyType
to make the dict read-only.
from types import MappingProxyType
mapping = MappingProxyType({"a": 1, "b": 2})
mapping["c"] = 3 # will raise an exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MappingProxyType looks reasonable, I've updated the PR to use it
] | ||
|
||
# TODO: Replace logic in `autogluon.tabular.trainer.model_presets.presets` with `ag_model_register` | ||
ag_model_register = ModelRegister(model_cls_list=REGISTERED_MODEL_CLS_LST) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A potentially less verbose way to implement this pattern is using a decorator.
# in _model_register.py
ALIAS_TO_CLS: dict[str, type] = {}
def register(alias: str):
def decorator(cls):
ALIAS_TO_CLS[alias] = cls
return cls
return decorator
# in lgb_model.py
@register(alias="LGB")
class LGBModel(AbstractTimeSeriesModel):
# ...
This could also be easier to use by the users that implement custom models.
ALIAS_TO_CLS
can be replaced with something more advanced like the actual ModelRegistry
class, or a dict where each alias corresponds to a tuple [model_cls, priority, priority_dict]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possible downside is this links all models to a global object inherently, whereas the ModelRegister is something that isn't inherently global and theoretically a user could pass their own ModelRegister object to the predictor. No idea if that is something that is desirable though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merging the current logic as-is, we can re-visit this alternative if we think it is cleaner in future.
ag_key: str | None = None # set to string value for subclasses for use in AutoGluon | ||
ag_name: str | None = None # set to string value for subclasses for use in AutoGluon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that both ag_key
and ag_name
serve a similar purpose, and one of them could potentially be removed. For example, we could generate the ag_name
automatically from the class name with something like self.name = re.sub(r"Model$", "", self.__class__.__name__)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that they serve a similar purpose. For now I will keep both for legacy reasons, but we can consider standardizing. Maybe we can have self.ag_name
default to your regix logic if it is None
? Such as self._ag_name = None
with def ag_name()` as a property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that AG Tabular doesn't have the same orderly name structure of the classes as TimeSeries (but I probably should do it soon...):
For example:
RFModel -> RandomForest
XTModel -> ExtraTrees
LGBModel -> LightGBM
I was about to start doing this. |
Another design consideration is if "priority" is an intrinsic property of the model itself, or is it something the Trainer assigns to the model? |
@canerturkmen This is a good question. For now I think it can be a value that is overwritten by the trainer if needed, but the reason I have it as part of the model is so that a model contribution would only require the contributor to make edits to the model file and nowhere else. Also, this means someone can bring their custom model to AutoGluon and easily specify the priority relative to other models when fitting. Luckily this logic is rather easy to change later and deprecate if needed. |
@canerturkmen Do you have unit tests / example code showing what you can do with this? One thing I might be wary of with automatic register on import is that I want to ensure no duplicate keys, and also it becomes a bit weird in some ways because whether the user's code works or not will now depend on what has been imported elsewhere, something the IDE wouldn't track and could make script sharing confusing. Just a hypothetical though, I don't know how real this concern is. |
Job PR-4913-dc11e1a is done. |
Job PR-4913-062c3e4 is done. |
This is the caller. You could de-duplicate keys in the I'm also pretty surprised this is automerging. I touched abstract_model quite a bit and it might be worth a rebase. Other random consideration: I would pull it up in the namespace if you would like it to be used often and it's a part of the public API: |
Co-authored-by: Lennart Purucker <contact@lennart-purucker.com>
3a2abff
to
22076b6
Compare
@canerturkmen re namespace: The example I had was outdated, here is the current path:
I could maybe move it to
but will wait until I figure out how exactly I want users to interact with the register in the follow-up PR. |
This is a solid option, the only concern really is the "magic" aspect and oddities related to seemingly unused imports actually impacting whether a script works or not. I'll keep as is for now but we can always change it to be auto-add in a follow-up PR if we decide it is ultimately more user friendly. The following situation is what I'm not a huge fan of:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Job PR-4913-22076b6 is done. |
Issue #, if available:
Description of changes:
ag_key
,ag_name
andag_priority
class variables to all AbstractModel subclasses.WIP:
autogluon.tabular.trainer.model_presets.presets
to useag_model_register
. I decided not to do that in this PR as it would expand the scope to the point of being hard to review. So I will do this as a follow-up PR.Example:
Output:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.