-
Notifications
You must be signed in to change notification settings - Fork 45
Add stage name #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add stage name #39
Conversation
44edcd9
to
af61f2d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I added some comments and ideas, but I like a lot this addition
pdpipe/core.py
Outdated
@@ -652,6 +655,17 @@ def __init__(self, stages, transformer_getter=None, **kwargs): | |||
def __getitem__(self, index): | |||
if isinstance(index, slice): | |||
return PdPipeline(self._stages[index]) | |||
|
|||
if isinstance(index, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want to check for a list, or a list of strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not sure if it's right to force the label
param to be an str
.
pdpipe/core.py
Outdated
if isinstance(index, str): | ||
stages = [stage for stage in self._stages if stage._label == index] | ||
if len(stages) == 0: | ||
raise Exception("'{}' is not exist.".format(index)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if python3.5 support was dropped, this could be in f-strings, which is quite more readable @shaypal5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if two stages have the same label you will return just the first one in the pipeline. Maybe that's the desired behaviour, but highlighting just in case. Not sure whether we would like something when creating the pipeline to check for integrity (i.e. ensure that two stages cannot have same label), or we just trust the user
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if python3.5 support was dropped, this could be in f-strings, which is quite more readable @shaypal5
I did not see the use of 'f-string' in the project, so I also did not use it.
if two stages have the same label you will return just the first one in the pipeline. Maybe that's the desired behaviour, but highlighting just in case. Not sure whether we would like something when creating the pipeline to check for integrity (i.e. ensure that two stages cannot have same label), or we just trust the user
When passing in one item a Stage
is returned, so I cant return two stages.
pipeline[['someLabel']]
will return a pipeline with all stages that label
== 'someLabel'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're both right: I didn't f-strings so far, be we definitely should. I recently moved to using f-strings almost exclusively in my code. I'm also going into type hints, but I don't think it's a very urgent change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the second issue: See discussion below. A reasonable handling of this case should be decided upon and documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re f-strings. We need to drop support for python3.5 or be very determined and use this https://stackoverflow.com/questions/55182209/f-string-invalid-syntax-in-python-3-5 , which I wouldn't recommend @shaypal5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Python 3.5 end-of-life was at September 2020, so Iet's do it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to make a separate PR of dropping python 3.5
and adding f-strings
.
pdpipe/core.py
Outdated
@@ -173,14 +173,16 @@ class PdPipelineStage(abc.ABC): | |||
dataframes, which will be used to determine whether this stage should | |||
be skipped for input dataframes. See pdp.cond for more information on | |||
specialised Condition objects. | |||
label : str, default None | |||
A label of this stage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I fully understand the label vs desc. Would we like the label to be the string representation then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Label should be very short, like 'fix_spikes'
, while a description could be something like "Fixes spikes in all numeric columns by applying moving-window based smoothing function with parameters being dynamically learned, while smoothing, separately for each column."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You really couldn't use the later conveniently for access as you could the first, as in pipe['fix_spikes']
or for slicing, like pipe[['fix_spikes', 'drop_corr']]
.
pdpipe/core.py
Outdated
@@ -191,6 +193,7 @@ def __init__(self, exraise=True, exmsg=None, desc=None, prec=None, | |||
self._prec_arg = prec | |||
self._skip = skip | |||
self._appmsg = '{}..'.format(desc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the label be used here if defined and no desc rather than DEF_DESCRIPTION?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the label should be added on the beginning of the application message. Something like
self._appmsg = '{}{}..'.format(f'{label}: ' if label else '', self.,desc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...
self._appmsg = f'{label}: {desc}')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, stages with no supplied name
shouldn't have an application message such as ": Dropping mostly NaN columns"
(which is what the your current code does), but rather simply "Dropping mostly NaN columns".
Please fix to self._appmsg = '{}{}..'.format(f'{label}: ' if label else '', self.,desc)
pdpipe/core.py
Outdated
@@ -191,6 +193,7 @@ def __init__(self, exraise=True, exmsg=None, desc=None, prec=None, | |||
self._prec_arg = prec | |||
self._skip = skip | |||
self._appmsg = '{}..'.format(desc) | |||
self._label = label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would check label
and raise error if (at least) not a str
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Use with pytest.raises(SomeException)
to check for correct error throwing behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def __init__(self, label='', ...)
if not isinstance(label, str):
raise Exception(f"label must be a str, not {type(label)}.")
self._label = label
...
drop_char = SilentDropStage('char', label='dropChar') | ||
pipeline = PdPipeline([drop_num1, drop_num2, drop_char]) | ||
assert len(pipeline) == 3 | ||
pipeline = pipeline[['dropNum1', 'dropNum2']] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the way it's implemented now, (I think) one can pass as labels integers, and slice here using integers which would make it a bit confusing I think. I would force the labels to be strings, and ensure people cannot pass a list of integers to slice. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice that we are not going to disable int base slicing or access, as pipelines are first-and-formost lists of pipeline stages, and they should easily slice-able / index-accessible with integers. It's in the documentation.
|
I would say:
Either:
Also, notice that we are not going to disable int base slicing or access, as pipelines are first-and-formost lists of pipeline stages, and they should easily slice-able / index-accessible with integers. It's in the documentation. |
@shaypal5 WDYT about changing |
Sure! A good change, I think, not to mix with dataframe column labels. |
Codecov Report
@@ Coverage Diff @@
## master #39 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 12 12
Lines 1719 1730 +11
Branches 232 238 +6
=========================================
+ Hits 1719 1730 +11
Continue to review full report at Codecov.
|
I didn't use |
pdpipe/core.py
Outdated
self._exraise = exraise | ||
self._exmsg = exmsg | ||
self._desc = desc | ||
self._prec_arg = prec | ||
self._skip = skip | ||
self._appmsg = '{}..'.format(desc) | ||
self._appmsg = '{}: {}'.format(name, desc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix to
self._appmsg = '{}{}..'.format(f'{label}: ' if label else '', self.,desc)
Stages with no supplied name
shouldn't have an application message such as ": Dropping mostly NaN columns"
, but rather simply "Dropping mostly NaN columns"
.
Almost done! See my latest comment on your |
Released in version |
#19