
Support resizing images on TPU using tf.image.resize when size is not a compile-time constant #51693

@srihari-humbarwadi

Description


System information

  • TensorFlow version (you are using): 2.6
  • Are you willing to contribute it (Yes/No): Yes

Describe the feature and the current behavior/state.
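XLA compilation fails when the size argument of tf.image.resize is not a compile-time constant. As a result, images cannot be resized inside a TPU computation to a size that is only known at run time.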
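Until dynamic sizes are supported, one common way to live with XLA's static-shape requirement is to round each runtime size up to one of a few fixed bucket sizes. Every resize target is then a plain Python integer, and only a bounded number of programs get compiled. This is a general workaround, not something tf.image.resize does on its own. The sketch below assumes a hypothetical bucket list DEFAULT_BUCKETS and hypothetical helpers bucket_size and bucket_hw. The bucket values are placeholders to tune for your data.

```python
import bisect

# Hypothetical static sizes (must be sorted ascending); each distinct
# (height, width) bucket pair would correspond to one compiled program.
DEFAULT_BUCKETS = (64, 128, 256, 512, 1024)


def bucket_size(size, buckets=DEFAULT_BUCKETS):
    """Return the smallest bucket that is >= size."""
    if size <= 0:
        raise ValueError("size must be positive")
    # bisect_left gives the first index whose bucket is >= size.
    i = bisect.bisect_left(buckets, size)
    if i == len(buckets):
        raise ValueError(f"size {size} exceeds the largest bucket {buckets[-1]}")
    return buckets[i]


def bucket_hw(height, width, buckets=DEFAULT_BUCKETS):
    """Map a runtime (height, width) pair onto static bucket sizes."""
    return bucket_size(height, buckets), bucket_size(width, buckets)


# Example: a 200x65 target is padded up to a 256x128 static size.
print(bucket_hw(200, 65))  # (256, 128)
```

You could then pass the chosen bucket to tf.image.resize as a Python int, for example by capturing it in a closure rather than passing it through strategy.run. Track the true height and width separately, and use them to slice or mask the result afterwards.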

Who will benefit with this feature?
This will allow users to resize images on TPUs to sizes that are determined at run time.

Any Other info.
Code to reproduce the issue:

import tensorflow as tf
print('TensorFlow:', tf.__version__)

resolver = tf.distribute.cluster_resolver.TPUClusterResolver.connect('local')
strategy = tf.distribute.TPUStrategy(resolver)


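def _run(inputs):
    # height and width reach this replica function through strategy.run's
    # args, so inside the TPU computation they appear to be runtime inputs
    # rather than compile-time constants, which triggers the error below.
    mask, height, width = inputs
    return tf.image.resize(mask, size=[height, width], method='nearest')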

@tf.function
def _distributed_run(inputs):
    outputs = strategy.run(_run, args=(inputs,))
    return strategy.gather(outputs, axis=0)

mask = tf.random.normal((1, 100, 100, 3))
height = 200
width = 200
inputs = (mask, height, width)

outputs = _distributed_run(inputs)
print(outputs.shape)

Output:

TensorFlow: 2.6.0

D0826 06:54:39.426496290   15038 ev_posix.cc:173]            Using polling engine: epollex
D0826 06:54:39.426573151   15038 lb_policy_registry.cc:42]   registering LB policy factory for "grpclb"
D0826 06:54:39.426583436   15038 lb_policy_registry.cc:42]   registering LB policy factory for "priority_experimental"
D0826 06:54:39.426587482   15038 lb_policy_registry.cc:42]   registering LB policy factory for "weighted_target_experimental"
D0826 06:54:39.426591102   15038 lb_policy_registry.cc:42]   registering LB policy factory for "pick_first"
D0826 06:54:39.426594552   15038 lb_policy_registry.cc:42]   registering LB policy factory for "round_robin"
D0826 06:54:39.426606683   15038 dns_resolver_ares.cc:499]   Using ares dns resolver
D0826 06:54:39.426633671   15038 certificate_provider_registry.cc:33] registering certificate provider factory for "file_watcher"
D0826 06:54:39.426644953   15038 lb_policy_registry.cc:42]   registering LB policy factory for "cds_experimental"
D0826 06:54:39.426649442   15038 lb_policy_registry.cc:42]   registering LB policy factory for "xds_cluster_impl_experimental"
D0826 06:54:39.426653456   15038 lb_policy_registry.cc:42]   registering LB policy factory for "xds_cluster_resolver_experimental"
D0826 06:54:39.426658294   15038 lb_policy_registry.cc:42]   registering LB policy factory for "xds_cluster_manager_experimental"
I0826 06:54:39.426774965   15038 server_builder.cc:332]      Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
I0826 06:54:39.426889181   15038 socket_utils_common_posix.cc:353] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
I0826 06:54:39.472098409   15342 subchannel.cc:1065]         New connected subchannel at 0x62a3ce0 for subchannel 0x393c600
D0826 06:54:43.380396347   15830 init.cc:226]                grpc_shutdown starts clean-up now
D0826 06:54:46.843676713   15830 ev_posix.cc:173]            Using polling engine: epollex
D0826 06:54:46.843732206   15830 lb_policy_registry.cc:42]   registering LB policy factory for "grpclb"
D0826 06:54:46.843754461   15830 lb_policy_registry.cc:42]   registering LB policy factory for "priority_experimental"
D0826 06:54:46.843758619   15830 lb_policy_registry.cc:42]   registering LB policy factory for "weighted_target_experimental"
D0826 06:54:46.843761981   15830 lb_policy_registry.cc:42]   registering LB policy factory for "pick_first"
D0826 06:54:46.843768556   15830 lb_policy_registry.cc:42]   registering LB policy factory for "round_robin"
D0826 06:54:46.843777491   15830 dns_resolver_ares.cc:499]   Using ares dns resolver
D0826 06:54:46.843795296   15830 certificate_provider_registry.cc:33] registering certificate provider factory for "file_watcher"
D0826 06:54:46.843810614   15830 lb_policy_registry.cc:42]   registering LB policy factory for "cds_experimental"
D0826 06:54:46.843814307   15830 lb_policy_registry.cc:42]   registering LB policy factory for "xds_cluster_impl_experimental"
D0826 06:54:46.843818589   15830 lb_policy_registry.cc:42]   registering LB policy factory for "xds_cluster_resolver_experimental"
D0826 06:54:46.843825122   15830 lb_policy_registry.cc:42]   registering LB policy factory for "xds_cluster_manager_experimental"
I0826 06:54:46.843933271   15830 server_builder.cc:332]      Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
I0826 06:54:46.844822787   15631 subchannel.cc:1065]         New connected subchannel at 0x63af920 for subchannel 0x3867c80
F0826 06:54:47.241484   15823 image_resize_ops.cc:42] Check failed: out_size.size() == 2 (0 vs. 2) Invalid argument: Input 1 to node `resize/ResizeNearestNeighbor` with op ResizeNearestNeighbor must be a compile-time constant.

XLA compilation requires that operator arguments that represent shapes or dimensions be evaluated to concrete values at compile time. This error means that a shape or dimension argument could not be evaluated at compile time, usually because the value of the argument depends on a parameter to the computation, on a variable, or on a stateful operation such as a random number generator.
*** Check failure stack trace: ***
    @     0x7f0a1cc2ad87  (unknown)
    @     0x7f0a1cc29914  (unknown)
    @     0x7f0a1cc292c3  (unknown)
    @     0x7f0a1cc2b709  (unknown)
    @     0x7f0a18de0146  (unknown)
    @     0x7f0a18ddfbcf  (unknown)
    @     0x7f0a193f29da  (unknown)
    @     0x7f0a19c5e92c  (unknown)
    @     0x7f0a193d4295  (unknown)
    @     0x7f0a193e0ba5  (unknown)
    @     0x7f0a193dc526  (unknown)
    @     0x7f0a18e4f8d7  (unknown)
    @     0x7f0a129d5c5e  (unknown)
    @     0x7f0a129d7373  (unknown)
    @     0x7f0a129cecd3  TpuCompile_CompileAndBuild
    @     0x7f0a25aa81a5  tensorflow::tpu::TpuProgramGroup::CompileAndBuild()
    @     0x7f0a25a3bed9  tensorflow::tpu::TpuCompileOpKernelImpl::Compile()
    @     0x7f0a25ab2122  tensorflow::tpu::TpuCompileOpKernelCommon::CompileLocallyAndFillHostCache()
    @     0x7f0a25ab27a8  tensorflow::tpu::TpuCompileOpKernelCommon::ComputeInternal()::{lambda()#3}::operator()()
    @     0x7f0a25ab287c  std::_Function_handler<>::_M_invoke()
    @     0x7f0a25a6d25a  tensorflow::tpu::TpuCompilationCacheExternal::InitializeEntry()
    @     0x7f0a25abc45a  tensorflow::tpu::TpuCompilationCacheInterface::CompileIfKeyAbsentHelper()
    @     0x7f0a25abcf4a  tensorflow::tpu::TpuCompilationCacheInterface::CompileIfKeyAbsent()
    @     0x7f0a25ab4ca4  tensorflow::tpu::TpuCompileOpKernelCommon::ComputeInternal()
    @     0x7f0a25ab608d  tensorflow::tpu::TpuCompileOpKernelCommon::Compute()
    @     0x7f0a2c8fb330  tensorflow::(anonymous namespace)::ExecutorState<>::Process()
    @     0x7f0a2c8eed82  std::_Function_handler<>::_M_invoke()
    @     0x7f0a2cc12e65  Eigen::ThreadPoolTempl<>::WorkerLoop()
    @     0x7f0a2cc10b37  std::_Function_handler<>::_M_invoke()
    @     0x7f0a2cbf41ef  tensorflow::(anonymous namespace)::PThread::ThreadFn()
    @     0x7f0c4f7a7609  start_thread
https://symbolize.stripped_domain/r/?trace=7f0a1cc2ad87,7f0a1cc29913,7f0a1cc292c2,7f0a1cc2b708,7f0a18de0145,7f0a18ddfbce,7f0a193f29d9,7f0a19c5e92b,7f0a193d4294,7f0a193e0ba4,7f0a193dc525,7f0a18e4f8d6,7f0a129d5c5d,7f0a129d7372,7f0a129cecd2,7f0a25aa81a4,7f0a25a3bed8,7f0a25ab2121,7f0a25ab27a7,7f0a25ab287b,7f0a25a6d259,7f0a25abc459,7f0a25abcf49,7f0a25ab4ca3,7f0a25ab608c,7f0a2c8fb32f,7f0a2c8eed81,7f0a2cc12e64,7f0a2cc10b36,7f0a2cbf41ee,7f0c4f7a7608&map=b7c22d7954df6b6961e4435041132cf899ee4a5e:7f0a1d725000-7f0a31424270,ca1b7ab241ee28147b3d590cadb5dc1b:7f0a0ff1c000-7f0a1cf4eb20
https://symbolize.stripped_domain/r/?trace=7f0c4f80718b,7f0c4f80720f,7f0a1cc2aec7,7f0a1cc29913,7f0a1cc292c2,7f0a1cc2b708,7f0a18de0145,7f0a18ddfbce,7f0a193f29d9,7f0a19c5e92b,7f0a193d4294,7f0a193e0ba4,7f0a193dc525,7f0a18e4f8d6,7f0a129d5c5d,7f0a129d7372,7f0a129cecd2,7f0a25aa81a4,7f0a25a3bed8,7f0a25ab2121,7f0a25ab27a7,7f0a25ab287b,7f0a25a6d259,7f0a25abc459,7f0a25abcf49,7f0a25ab4ca3,7f0a25ab608c,7f0a2c8fb32f,7f0a2c8eed81,7f0a2cc12e64,7f0a2cc10b36,7f0a2cbf41ee,7f0c4f7a7608&map=b7c22d7954df6b6961e4435041132cf899ee4a5e:7f0a1d725000-7f0a31424270,ca1b7ab241ee28147b3d590cadb5dc1b:7f0a0ff1c000-7f0a1cf4eb20
*** SIGABRT received by PID 15038 (TID 15823) on cpu 2 from PID 15038; ***
E0826 06:54:47.512108   15823 coredump_hook.cc:292] RAW: Remote crash data gathering hook invoked.
E0826 06:54:47.512125   15823 coredump_hook.cc:384] RAW: Skipping coredump since rlimit was 0 at process start.
E0826 06:54:47.512134   15823 client.cc:222] RAW: Coroner client retries enabled (b/136286901), will retry for up to 30 sec.
E0826 06:54:47.512143   15823 coredump_hook.cc:447] RAW: Sending fingerprint to remote end.
E0826 06:54:47.512149   15823 coredump_socket.cc:124] RAW: Stat failed errno=2 on socket /var/google/services/logmanagerd/remote_coredump.socket
E0826 06:54:47.512154   15823 coredump_hook.cc:451] RAW: Cannot send fingerprint to Coroner: [NOT_FOUND] Missing crash reporting socket. Is the listener running?
E0826 06:54:47.512158   15823 coredump_hook.cc:525] RAW: Discarding core.
E0826 06:54:47.998539   15823 process_state.cc:771] RAW: Raising signal 6 with default behavior
Aborted (core dumped)
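The error above says that the output size of ResizeNearestNeighbor must be known when XLA compiles the program. One possible workaround is to always produce an output with a fixed maximum shape. The source indices are then derived from the runtime height and width, and a validity mask marks the padded region. The sketch below uses NumPy purely to show the index arithmetic. The function name resize_nearest_padded and its parameters are hypothetical. The half-pixel-centre mapping, floor((i + 0.5) * in_size / out_size) clamped to the last input index, is an assumption meant to approximate nearest-neighbour resizing in tf.image.resize, and edge behaviour may differ.

```python
import numpy as np


def resize_nearest_padded(image, height, width, max_height, max_width):
    """Nearest-neighbour resize of `image` (H, W, C) to (height, width).

    The result always has the static shape (max_height, max_width, C);
    only the top-left (height, width) region holds resized pixels, the
    rest is zero-filled and flagged False in the returned mask.
    """
    in_h, in_w = image.shape[:2]
    if not (0 < height <= max_height and 0 < width <= max_width):
        raise ValueError("target size must be positive and fit the buffer")

    # Output coordinates span the whole static buffer, independent of the
    # runtime target size.
    ys = np.arange(max_height)
    xs = np.arange(max_width)

    # Half-pixel-centre mapping from output to input coordinates, clamped
    # so padded rows/columns still produce in-range indices.
    src_y = np.minimum(np.floor((ys + 0.5) * (in_h / height)).astype(np.int64), in_h - 1)
    src_x = np.minimum(np.floor((xs + 0.5) * (in_w / width)).astype(np.int64), in_w - 1)

    # Gather rows, then columns: shape (max_height, max_width, C).
    gathered = image[src_y][:, src_x]

    # True inside the requested (height, width) region, False in padding.
    valid = (ys[:, None] < height) & (xs[None, :] < width)
    resized = np.where(valid[..., None], gathered, 0)
    return resized, valid


# Usage: resize a 100x100 mask to 200x150 inside a fixed 256x256 buffer.
mask = np.random.rand(100, 100, 3).astype(np.float32)
out, valid = resize_nearest_padded(mask, 200, 150, 256, 256)
crop = out[:200, :150]  # the actual resized image; out.shape == (256, 256, 3)
```

Every array shape here depends only on max_height and max_width. The same computation written with TensorFlow ops, for example tf.gather on the computed indices, would therefore not need the target size as a compile-time constant. Whether that lowers efficiently on TPU has not been checked here.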


Labels: 2.6.0, comp:ops, comp:tpus, stale, stat:awaiting response, stat:awaiting tensorflower, type:bug
