-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Fix setup_outputdir #4734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix setup_outputdir #4734
Conversation
@LennartPurucker This line was added in #4606. Was it intended for the functionality? |
path = os.path.expanduser(path) # replace ~ with absolute path if it exists | ||
path = os.path.abspath(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can wrap this block into
if not path.startswith("s3://"):
path = os.path.expanduser(path) # replace ~ with absolute path if it exists
path = os.path.abspath(path)
I believe there are some edge cases against which the current path = os.path.abspath(path)
command protects (though it does introduce another bug with loading from S3).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this looks like the best solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if one calls .expanduser()
on an S3 path? I think likely nothing, as S3 paths do not contain/start with ~
.
Otherwise, the current suggested if-case would change this behavior, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I think what @shchur proposed should accommodate S3 paths. I will update the PR.
Ah, my bad! I did not think of s3 when adding this line. It fixes a bug in parallel and distributed fitting where the workers/processes use a relative path but should use the absolute path in the (shared) file system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -61,8 +61,9 @@ def setup_outputdir(path, warn_if_exist=True, create_dir=True, path_suffix=None) | |||
logger.warning( | |||
f'Warning: path already exists! This predictor may overwrite an existing predictor! path="{path}"' | |||
) | |||
path = os.path.expanduser(path) # replace ~ with absolute path if it exists | |||
path = os.path.abspath(path) | |||
if not path.lower().startswith("s3://"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we plan to support s3, then we should also edit the code above this when warn_if_exist=True
to not call os.makedirs
when the path is an s3 path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if we plan to add support for s3, lets add a short docstring that mentions this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several things need to be done:
- If s3, then
os.path.sep
needs to instead be/
when updating path_suffix. - bool flag near start of function to indicate if s3 path. If flag set, skip
elif warn_if_exist
. - Why
path.lower()
? This doesn't seem valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored the function to address comments.
path = os.path.expanduser(path) # replace ~ with absolute path if it exists | ||
path = os.path.abspath(path) | ||
if not path.lower().startswith("s3://"): | ||
path = os.path.expanduser(path) # replace ~ with absolute path if it exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add unit test for s3 path input, since it shouldn't make any folder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added unit test.
a9b8b75
to
aa94612
Compare
f"Only str and pathlib.Path types are supported for path, got {path} of type {type(path)}." | ||
) | ||
|
||
is_s3_path = str(path).lower().startswith("s3://") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why .lower()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just to incorporate cases when people use capitalized S3
in the path.
timestamp = utcnow.strftime("%Y%m%d_%H%M%S") | ||
path = os.path.join("AutogluonModels", f"ag-{timestamp}{path_suffix}") | ||
for i in range(1, 1000): | ||
try: | ||
if create_dir: | ||
if create_dir and not is_s3_path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to check is_s3_path
when path is None.
# checks that setup_outputdir handles S3 paths correctly | ||
with patch("os.makedirs") as mock_makedirs: # Mock os.makedirs to ensure no local directory is created | ||
path = "s3://test-bucket/test-folder" | ||
returned_path = setup_outputdir(path, warn_if_exist=True, create_dir=False, path_suffix=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_dir
must be True
to properly test this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
if path_suffix is None: | ||
path_suffix = "" | ||
if path_suffix and path_suffix[-1] == os.path.sep: | ||
if path_suffix and path_suffix[-1] == os.path.sep if not is_s3_path else "/": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to add a unit test for this, it is relevant for Windows OS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more unit tests for this.
11f60c6
to
7f1eaaf
Compare
7f1eaaf
to
a46aed8
Compare
Job PR-4734-a46aed8 is done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the fix!
Issue #, if available:
Fixes #4733
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.