Closed
Labels: 🐛 bug (Something isn't working), 🙋 help wanted (Extra attention is needed)
Description
🤔 Question description [Please make it understandable to everyone]
Error in sys.excepthook:
Traceback (most recent call last):
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/site-packages/swanlab/data/callbacker/cloud.py", line 99, in _except_handler
    get_run().finish(SwanLabRunState.CRASHED, error=traceback_error(tb, tp(val)))
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/site-packages/swanlab/data/run/main.py", line 305, in finish
    getattr(run, "_SwanLabRun__cleanup")(error)
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/site-packages/swanlab/data/run/main.py", line 222, in __cleanup
    if self.monitor_cron is not None:
AttributeError: 'SwanLabRun' object has no attribute 'monitor_cron'
Original exception was:
Traceback (most recent call last):
  File "/home/share/huadjyin/home/lishaoshuai/pangjiangshuan/stereo-seq-LLM4/ATACRNA_pretrain.py", line 318, in <module>
    main()
  File "/home/share/huadjyin/home/lishaoshuai/pangjiangshuan/stereo-seq-LLM4/ATACRNA_pretrain.py", line 314, in main
    trainer.initialize()
  File "/home/share/huadjyin/home/lishaoshuai/pangjiangshuan/stereo-seq-LLM4/ATACRNA_pretrain.py", line 93, in initialize
    self._init_model()
  File "/home/share/huadjyin/home/lishaoshuai/pangjiangshuan/stereo-seq-LLM4/ATACRNA_pretrain.py", line 108, in _init_model
    self._init_wandb()
  File "/home/share/huadjyin/home/lishaoshuai/pangjiangshuan/stereo-seq-LLM4/ATACRNA_pretrain.py", line 113, in _init_wandb
    swanlab.init(
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/site-packages/swanlab/data/sdk.py", line 206, in init
    run = register(
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/site-packages/swanlab/data/run/__init__.py", line 15, in register
    run = SwanLabRun(*args, **kwargs)
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/site-packages/swanlab/data/run/main.py", line 133, in __init__
    requirements=get_requirements() if settings.requirements_collect else None,
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/site-packages/swanlab/data/run/metadata/requirements.py", line 15, in get_requirements
    result = subprocess.run(["pixi", "list"], capture_output=True, text=True, timeout=0.5)
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'pixi'
swanlab: Experiment ATAC-RNA has completed
/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/site-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]
W0606 11:07:10.575000 70379997448064 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 3972731 closing signal SIGTERM
E0606 11:07:10.607000 70379997448064 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 3972728) of binary: /home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/bin/python3.10
Traceback (most recent call last):
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
    return f(*args, **kwargs)
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/site-packages/torch/distributed/run.py", line 901, in main
    run(args)
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/share/huadjyin/home/lishaoshuai/miniconda3/envs/SpatialPy/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/share/huadjyin/home/lishaoshuai/pangjiangshuan/stereo-seq-LLM4/ATACRNA_pretrain.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-06-06_11:07:10
  host      : cyclone001-agent-150
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 3972728)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
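The log above shows two separate failures: `subprocess.run(["pixi", "list"], ...)` raises `PermissionError` inside `get_requirements`, and the crash handler then fails in `__cleanup` because `monitor_cron` was never assigned before `__init__` aborted. The sketch below shows one defensive pattern for each case. It is not swanlab's actual code or its fix: `run_quietly` and the `Run` class are hypothetical names used only for illustration.

```python
import subprocess
from typing import List, Optional


def run_quietly(cmd: List[str], timeout: float = 0.5) -> Optional[str]:
    """Return the command's stdout, or None if it cannot be run.

    Hypothetical helper, not part of swanlab.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers PermissionError (errno 13, as in the log above)
        # and FileNotFoundError (executable not on PATH).
        return None
    if result.returncode != 0:
        return None
    return result.stdout


class Run:
    """Hypothetical stand-in for a run object whose __init__ may abort early."""

    def cleanup(self) -> bool:
        # getattr with a default tolerates an attribute that was never set,
        # so cleanup still works after a partially completed __init__.
        monitor = getattr(self, "monitor_cron", None)
        if monitor is not None:
            monitor.cancel()
            return True
        return False
```

With a guard like this, an optional metadata probe such as `run_quietly(["pixi", "list"])` would yield `None` instead of aborting initialization, and the cleanup path would not mask the original exception with an `AttributeError`.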
🧑‍💻 Expected result
🚑 Any additional information [like screenshots]
- SwanLab Version:
- Platform: