Skip to content

Conversation

suzhoum
Copy link
Contributor

@suzhoum suzhoum commented Dec 11, 2024

Issue #, if available:
Fixes #4733

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@suzhoum
Copy link
Contributor Author

suzhoum commented Dec 11, 2024

@LennartPurucker This line was added in #4606. Was it intended for the functionality?

Comment on lines 64 to 65
path = os.path.expanduser(path) # replace ~ with absolute path if it exists
path = os.path.abspath(path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can wrap this block into

if not path.startswith("s3://"):
    path = os.path.expanduser(path)  # replace ~ with absolute path if it exists
    path = os.path.abspath(path)

I believe there are some edge cases against which the current path = os.path.abspath(path) command protects (though it does introduce another bug with loading from S3).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks like the best solution.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if one calls .expanduser() on an S3 path? I think likely nothing, as S3 paths do not contain/start with ~.

Otherwise, the current suggested if-case would change this behavior, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I think what @shchur proposed should accommodate S3 paths. I will update the PR.

@LennartPurucker
Copy link
Collaborator

Ah, my bad! I did not think of s3 when adding this line.

It fixes a bug in parallel and distributed fitting where the workers/processes use a relative path but should use the absolute path in the (shared) file system.

Copy link
Collaborator

@LennartPurucker LennartPurucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -61,8 +61,9 @@ def setup_outputdir(path, warn_if_exist=True, create_dir=True, path_suffix=None)
logger.warning(
f'Warning: path already exists! This predictor may overwrite an existing predictor! path="{path}"'
)
path = os.path.expanduser(path) # replace ~ with absolute path if it exists
path = os.path.abspath(path)
if not path.lower().startswith("s3://"):
Copy link
Contributor

@Innixma Innixma Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we plan to support s3, then we should also edit the code above this when warn_if_exist=True to not call os.makedirs when the path is an s3 path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we plan to add support for s3, lets add a short docstring that mentions this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several things need to be done:

  1. If s3, then os.path.sep needs to instead be / when updating path_suffix.
  2. bool flag near start of function to indicate if s3 path. If flag set, skip elif warn_if_exist.
  3. Why path.lower()? This doesn't seem valid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored the function to address comments.

path = os.path.expanduser(path) # replace ~ with absolute path if it exists
path = os.path.abspath(path)
if not path.lower().startswith("s3://"):
path = os.path.expanduser(path) # replace ~ with absolute path if it exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add unit test for s3 path input, since it shouldn't make any folder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added unit test.

f"Only str and pathlib.Path types are supported for path, got {path} of type {type(path)}."
)

is_s3_path = str(path).lower().startswith("s3://")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why .lower()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to incorporate cases when people use capitalized S3 in the path.

timestamp = utcnow.strftime("%Y%m%d_%H%M%S")
path = os.path.join("AutogluonModels", f"ag-{timestamp}{path_suffix}")
for i in range(1, 1000):
try:
if create_dir:
if create_dir and not is_s3_path:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to check is_s3_path when path is None.

# checks that setup_outputdir handles S3 paths correctly
with patch("os.makedirs") as mock_makedirs: # Mock os.makedirs to ensure no local directory is created
path = "s3://test-bucket/test-folder"
returned_path = setup_outputdir(path, warn_if_exist=True, create_dir=False, path_suffix=None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_dir must be True to properly test this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

if path_suffix is None:
path_suffix = ""
if path_suffix and path_suffix[-1] == os.path.sep:
if path_suffix and path_suffix[-1] == os.path.sep if not is_s3_path else "/":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to add a unit test for this, it is relevant for Windows OS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more unit tests for this.

@suzhoum suzhoum force-pushed the fix_setup_dir branch 2 times, most recently from 11f60c6 to 7f1eaaf Compare January 16, 2025 00:17
Copy link

Job PR-4734-a46aed8 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4734/a46aed8/index.html

Copy link
Contributor

@Innixma Innixma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the fix!

@Innixma Innixma merged commit 6c7c3e4 into autogluon:master Jan 17, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]Load predictor from S3 path failed
4 participants