@Innixma Innixma commented Jun 5, 2024

Issue #, if available:

Description of changes:

  • Improve NN_TORCH runtime estimate.

When NN_TORCH runs in a Ray process, the first batch includes significant time overhead from initializing torch. This makes the mainline time estimate increasingly pessimistic as the dataset grows, because the overhead is multiplied by the number of batches in an epoch. The result was extreme estimates on large datasets, which in many cases caused neural network training to be skipped entirely even where it would have been useful.

The fix excludes the first batch from the runtime estimate of future batches, which makes the v2 time estimate far more accurate.

Example on the adult income dataset:

(_ray_fit pid=1972457) v1 estimate: 46.5966s | v2 estimate: 2.3756s
(_ray_fit pid=1972457) True Time: 2.3813s
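
To illustrate why the two estimates diverge, here is a minimal sketch of the idea. It is not the code from this PR: the function names `estimate_epoch_time_v1` and `estimate_epoch_time_v2` are hypothetical, and counting the first batch's overhead exactly once in the v2 formula is an assumption made for illustration.

```python
from typing import Sequence


def estimate_epoch_time_v1(batch_times: Sequence[float], num_batches: int) -> float:
    """Naive estimate (hypothetical): scale the first batch's time by the batch count.

    Any one-time initialization cost inside the first batch is multiplied
    by num_batches, so the estimate grows with dataset size.
    """
    return batch_times[0] * num_batches


def estimate_epoch_time_v2(batch_times: Sequence[float], num_batches: int) -> float:
    """Warm-up-aware estimate (hypothetical): derive the per-batch rate only
    from batches after the first, and count the first batch once (assumption).
    """
    if len(batch_times) < 2:
        # Not enough measurements to separate overhead from steady-state cost.
        return estimate_epoch_time_v1(batch_times, num_batches)
    warm_batches = batch_times[1:]
    per_batch = sum(warm_batches) / len(warm_batches)
    return batch_times[0] + per_batch * (num_batches - 1)


# Made-up timings: a slow first batch followed by fast steady-state batches.
measured = [0.5, 0.02, 0.02]
v1 = estimate_epoch_time_v1(measured, num_batches=100)
v2 = estimate_epoch_time_v2(measured, num_batches=100)
print(f"v1 estimate: {v1:.4f}s | v2 estimate: {v2:.4f}s")
```

With these made-up timings, the naive estimate is roughly twenty times larger than the warm-up-aware one, which is the same kind of gap shown in the log above.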

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added the bug and module: tabular labels Jun 5, 2024
@Innixma Innixma added this to the 1.1.1 Release milestone Jun 5, 2024
@Innixma Innixma requested review from rey-allan and suzhoum June 5, 2024 21:03
yinweisu commented Jun 5, 2024

Previous CI Run            Current CI Run
botocore==1.34.118         botocore==1.34.120
fastcore==1.5.43           fastcore==1.5.44
- shellingham==1.5.4
pytest==8.2.1              pytest==8.2.2
cryptography==42.0.7       cryptography==42.0.8
huggingface-hub==0.23.2    huggingface-hub==0.23.3
boto3==1.34.118            boto3==1.34.120
typer==0.9.4               typer==0.12.3
smart-open==6.4.0          smart-open==7.0.4
thinc==8.2.3               thinc==8.2.4
ruff==0.4.7                ruff==0.4.8
prompt_toolkit==3.0.45     prompt_toolkit==3.0.46
cloudpathlib==0.16.0       cloudpathlib==0.18.1
weasel==0.3.4              weasel==0.4.1
spacy==3.7.4               spacy==3.7.5

@rey-allan rey-allan left a comment

LGTM!

@Innixma Innixma merged commit 9b44d85 into autogluon:master Jun 5, 2024
github-actions bot commented Jun 6, 2024

Job PR-4247-4341b4b is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4247/4341b4b/index.html

@Innixma Innixma deleted the tabular_improve_nn_time_estimate branch April 16, 2025 21:20