Closed
Labels
2.6.0, comp:ops, comp:tpus, stale, stat:awaiting response, stat:awaiting tensorflower, type:bug
Description
System information
- TensorFlow version (you are using): 2.6
- Are you willing to contribute it (Yes/No): Yes
Describe the feature and the current behavior/state.
XLA compilation fails when the size argument of tf.image.resize is not a compile-time constant.
Who will benefit from this feature?
This will allow users to resize images to dynamic sizes on TPUs.
Any other info.
Code to reproduce the issue:
import tensorflow as tf

print('TensorFlow:', tf.__version__)

resolver = tf.distribute.cluster_resolver.TPUClusterResolver.connect('local')
strategy = tf.distribute.TPUStrategy(resolver)


def _run(inputs):
    mask, height, width = inputs
    return tf.image.resize(mask, size=[height, width], method='nearest')


@tf.function
def _distributed_run(inputs):
    outputs = strategy.run(_run, args=(inputs,))
    return strategy.gather(outputs, axis=0)


mask = tf.random.normal((1, 100, 100, 3))
height = 200
width = 200
inputs = (mask, height, width)
outputs = _distributed_run(inputs)
print(outputs.shape)
output:
TensorFlow: 2.6.0
D0826 06:54:39.426496290 15038 ev_posix.cc:173] Using polling engine: epollex
D0826 06:54:39.426573151 15038 lb_policy_registry.cc:42] registering LB policy factory for "grpclb"
D0826 06:54:39.426583436 15038 lb_policy_registry.cc:42] registering LB policy factory for "priority_experimental"
D0826 06:54:39.426587482 15038 lb_policy_registry.cc:42] registering LB policy factory for "weighted_target_experimental"
D0826 06:54:39.426591102 15038 lb_policy_registry.cc:42] registering LB policy factory for "pick_first"
D0826 06:54:39.426594552 15038 lb_policy_registry.cc:42] registering LB policy factory for "round_robin"
D0826 06:54:39.426606683 15038 dns_resolver_ares.cc:499] Using ares dns resolver
D0826 06:54:39.426633671 15038 certificate_provider_registry.cc:33] registering certificate provider factory for "file_watcher"
D0826 06:54:39.426644953 15038 lb_policy_registry.cc:42] registering LB policy factory for "cds_experimental"
D0826 06:54:39.426649442 15038 lb_policy_registry.cc:42] registering LB policy factory for "xds_cluster_impl_experimental"
D0826 06:54:39.426653456 15038 lb_policy_registry.cc:42] registering LB policy factory for "xds_cluster_resolver_experimental"
D0826 06:54:39.426658294 15038 lb_policy_registry.cc:42] registering LB policy factory for "xds_cluster_manager_experimental"
I0826 06:54:39.426774965 15038 server_builder.cc:332] Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
I0826 06:54:39.426889181 15038 socket_utils_common_posix.cc:353] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
I0826 06:54:39.472098409 15342 subchannel.cc:1065] New connected subchannel at 0x62a3ce0 for subchannel 0x393c600
D0826 06:54:43.380396347 15830 init.cc:226] grpc_shutdown starts clean-up now
D0826 06:54:46.843676713 15830 ev_posix.cc:173] Using polling engine: epollex
D0826 06:54:46.843732206 15830 lb_policy_registry.cc:42] registering LB policy factory for "grpclb"
D0826 06:54:46.843754461 15830 lb_policy_registry.cc:42] registering LB policy factory for "priority_experimental"
D0826 06:54:46.843758619 15830 lb_policy_registry.cc:42] registering LB policy factory for "weighted_target_experimental"
D0826 06:54:46.843761981 15830 lb_policy_registry.cc:42] registering LB policy factory for "pick_first"
D0826 06:54:46.843768556 15830 lb_policy_registry.cc:42] registering LB policy factory for "round_robin"
D0826 06:54:46.843777491 15830 dns_resolver_ares.cc:499] Using ares dns resolver
D0826 06:54:46.843795296 15830 certificate_provider_registry.cc:33] registering certificate provider factory for "file_watcher"
D0826 06:54:46.843810614 15830 lb_policy_registry.cc:42] registering LB policy factory for "cds_experimental"
D0826 06:54:46.843814307 15830 lb_policy_registry.cc:42] registering LB policy factory for "xds_cluster_impl_experimental"
D0826 06:54:46.843818589 15830 lb_policy_registry.cc:42] registering LB policy factory for "xds_cluster_resolver_experimental"
D0826 06:54:46.843825122 15830 lb_policy_registry.cc:42] registering LB policy factory for "xds_cluster_manager_experimental"
I0826 06:54:46.843933271 15830 server_builder.cc:332] Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
I0826 06:54:46.844822787 15631 subchannel.cc:1065] New connected subchannel at 0x63af920 for subchannel 0x3867c80
F0826 06:54:47.241484 15823 image_resize_ops.cc:42] Check failed: out_size.size() == 2 (0 vs. 2) Invalid argument: Input 1 to node `resize/ResizeNearestNeighbor` with op ResizeNearestNeighbor must be a compile-time constant.
XLA compilation requires that operator arguments that represent shapes or dimensions be evaluated to concrete values at compile time. This error means that a shape or dimension argument could not be evaluated at compile time, usually because the value of the argument depends on a parameter to the computation, on a variable, or on a stateful operation such as a random number generator.
*** Check failure stack trace: ***
@ 0x7f0a1cc2ad87 (unknown)
@ 0x7f0a1cc29914 (unknown)
@ 0x7f0a1cc292c3 (unknown)
@ 0x7f0a1cc2b709 (unknown)
@ 0x7f0a18de0146 (unknown)
@ 0x7f0a18ddfbcf (unknown)
@ 0x7f0a193f29da (unknown)
@ 0x7f0a19c5e92c (unknown)
@ 0x7f0a193d4295 (unknown)
@ 0x7f0a193e0ba5 (unknown)
@ 0x7f0a193dc526 (unknown)
@ 0x7f0a18e4f8d7 (unknown)
@ 0x7f0a129d5c5e (unknown)
@ 0x7f0a129d7373 (unknown)
@ 0x7f0a129cecd3 TpuCompile_CompileAndBuild
@ 0x7f0a25aa81a5 tensorflow::tpu::TpuProgramGroup::CompileAndBuild()
@ 0x7f0a25a3bed9 tensorflow::tpu::TpuCompileOpKernelImpl::Compile()
@ 0x7f0a25ab2122 tensorflow::tpu::TpuCompileOpKernelCommon::CompileLocallyAndFillHostCache()
@ 0x7f0a25ab27a8 tensorflow::tpu::TpuCompileOpKernelCommon::ComputeInternal()::{lambda()#3}::operator()()
@ 0x7f0a25ab287c std::_Function_handler<>::_M_invoke()
@ 0x7f0a25a6d25a tensorflow::tpu::TpuCompilationCacheExternal::InitializeEntry()
@ 0x7f0a25abc45a tensorflow::tpu::TpuCompilationCacheInterface::CompileIfKeyAbsentHelper()
@ 0x7f0a25abcf4a tensorflow::tpu::TpuCompilationCacheInterface::CompileIfKeyAbsent()
@ 0x7f0a25ab4ca4 tensorflow::tpu::TpuCompileOpKernelCommon::ComputeInternal()
@ 0x7f0a25ab608d tensorflow::tpu::TpuCompileOpKernelCommon::Compute()
@ 0x7f0a2c8fb330 tensorflow::(anonymous namespace)::ExecutorState<>::Process()
@ 0x7f0a2c8eed82 std::_Function_handler<>::_M_invoke()
@ 0x7f0a2cc12e65 Eigen::ThreadPoolTempl<>::WorkerLoop()
@ 0x7f0a2cc10b37 std::_Function_handler<>::_M_invoke()
@ 0x7f0a2cbf41ef tensorflow::(anonymous namespace)::PThread::ThreadFn()
@ 0x7f0c4f7a7609 start_thread
https://symbolize.stripped_domain/r/?trace=7f0a1cc2ad87,7f0a1cc29913,7f0a1cc292c2,7f0a1cc2b708,7f0a18de0145,7f0a18ddfbce,7f0a193f29d9,7f0a19c5e92b,7f0a193d4294,7f0a193e0ba4,7f0a193dc525,7f0a18e4f8d6,7f0a129d5c5d,7f0a129d7372,7f0a129cecd2,7f0a25aa81a4,7f0a25a3bed8,7f0a25ab2121,7f0a25ab27a7,7f0a25ab287b,7f0a25a6d259,7f0a25abc459,7f0a25abcf49,7f0a25ab4ca3,7f0a25ab608c,7f0a2c8fb32f,7f0a2c8eed81,7f0a2cc12e64,7f0a2cc10b36,7f0a2cbf41ee,7f0c4f7a7608&map=b7c22d7954df6b6961e4435041132cf899ee4a5e:7f0a1d725000-7f0a31424270,ca1b7ab241ee28147b3d590cadb5dc1b:7f0a0ff1c000-7f0a1cf4eb20
https://symbolize.stripped_domain/r/?trace=7f0c4f80718b,7f0c4f80720f,7f0a1cc2aec7,7f0a1cc29913,7f0a1cc292c2,7f0a1cc2b708,7f0a18de0145,7f0a18ddfbce,7f0a193f29d9,7f0a19c5e92b,7f0a193d4294,7f0a193e0ba4,7f0a193dc525,7f0a18e4f8d6,7f0a129d5c5d,7f0a129d7372,7f0a129cecd2,7f0a25aa81a4,7f0a25a3bed8,7f0a25ab2121,7f0a25ab27a7,7f0a25ab287b,7f0a25a6d259,7f0a25abc459,7f0a25abcf49,7f0a25ab4ca3,7f0a25ab608c,7f0a2c8fb32f,7f0a2c8eed81,7f0a2cc12e64,7f0a2cc10b36,7f0a2cbf41ee,7f0c4f7a7608&map=b7c22d7954df6b6961e4435041132cf899ee4a5e:7f0a1d725000-7f0a31424270,ca1b7ab241ee28147b3d590cadb5dc1b:7f0a0ff1c000-7f0a1cf4eb20
*** SIGABRT received by PID 15038 (TID 15823) on cpu 2 from PID 15038; ***
E0826 06:54:47.512108 15823 coredump_hook.cc:292] RAW: Remote crash data gathering hook invoked.
E0826 06:54:47.512125 15823 coredump_hook.cc:384] RAW: Skipping coredump since rlimit was 0 at process start.
E0826 06:54:47.512134 15823 client.cc:222] RAW: Coroner client retries enabled (b/136286901), will retry for up to 30 sec.
E0826 06:54:47.512143 15823 coredump_hook.cc:447] RAW: Sending fingerprint to remote end.
E0826 06:54:47.512149 15823 coredump_socket.cc:124] RAW: Stat failed errno=2 on socket /var/google/services/logmanagerd/remote_coredump.socket
E0826 06:54:47.512154 15823 coredump_hook.cc:451] RAW: Cannot send fingerprint to Coroner: [NOT_FOUND] Missing crash reporting socket. Is the listener running?
E0826 06:54:47.512158 15823 coredump_hook.cc:525] RAW: Discarding core.
F0826 06:54:47.241484 15823 image_resize_ops.cc:42] Check failed: out_size.size() == 2 (0 vs. 2) Invalid argument: Input 1 to node `resize/ResizeNearestNeighbor` with op ResizeNearestNeighbor must be a compile-time constant.
XLA compilation requires that operator arguments that represent shapes or dimensions be evaluated to concrete values at compile time. This error means that a shape or dimension argument could not be evaluated at compile time, usually because the value of the argument depends on a p
E0826 06:54:47.998539 15823 process_state.cc:771] RAW: Raising signal 6 with default behavior
Aborted (core dumped)
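The error message above says the size argument failed to evaluate at compile time because it depends on a parameter to the computation. When the target size is known in Python before the call, one possible workaround is to keep height and width out of the arguments passed to strategy.run and capture them in a closure instead, so that they are embedded in the graph as constants. The sketch below illustrates only that structure. The helper name make_resize_fn is hypothetical, the snippet runs under a plain tf.function without a TPU, and it has not been verified against TPUStrategy. It does not address the actual request here, which is support for sizes that are only known at run time.

```python
import tensorflow as tf


def make_resize_fn(height, width):
    """Hypothetical helper: returns a resize function with a fixed output size."""
    # Force plain Python ints so `size` is traced as a graph constant
    # rather than as a tensor input to the computation.
    height, width = int(height), int(width)

    def _run(mask):
        return tf.image.resize(mask, size=[height, width], method='nearest')

    return _run


@tf.function
def resize_to_200(mask):
    # Assumption: on a TPU this call would instead be written as
    #   strategy.run(make_resize_fn(200, 200), args=(mask,))
    # so that only `mask` is a parameter of the replicated computation.
    return make_resize_fn(200, 200)(mask)


mask = tf.random.normal((1, 100, 100, 3))
out = resize_to_200(mask)
print(out.shape)
```

Each distinct (height, width) pair bakes a different constant into the graph and therefore needs its own trace and compilation, so this approach is only practical for a small, fixed set of target sizes.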