
Conversation

sorensenjs
Contributor

Description

Adds Kendall's tau, a measure of ordinal concordance, as a Keras-compatible metric.

Type of change

Checklist:

  • I've properly formatted my code according to the guidelines
    • By running Black + Flake8
    • By running pre-commit hooks
  • This PR addresses an already submitted issue for TensorFlow Addons
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • This PR contains modifications to C++ custom-ops

How Has This Been Tested?

Multiple tests have been included, and because this metric has a non-TF implementation in SciPy, it is easy to verify correctness.
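
For illustration, a check of roughly this shape is what makes the SciPy comparison easy (the brute-force helper below is a sketch written for this description, not the code in the PR):

    import numpy as np
    import scipy.stats
    import tensorflow as tf

    def kendall_tau_b(y_true, y_pred):
        # Brute-force tau-b from all pairwise sign comparisons; a stand-in for
        # the metric under test, shown only to illustrate how SciPy serves as
        # the reference implementation.
        dt = tf.sign(y_true[:, tf.newaxis] - y_true[tf.newaxis, :])
        dp = tf.sign(y_pred[:, tf.newaxis] - y_pred[tf.newaxis, :])
        concordance = tf.reduce_sum(dt * dp) / 2.0  # P - Q
        pairs_x = tf.reduce_sum(tf.abs(dt)) / 2.0   # pairs not tied in y_true
        pairs_y = tf.reduce_sum(tf.abs(dp)) / 2.0   # pairs not tied in y_pred
        return concordance / tf.sqrt(pairs_x * pairs_y)

    y_true = np.array([12.0, 2.0, 1.0, 12.0, 2.0])
    y_pred = np.array([1.0, 4.0, 7.0, 1.0, 0.0])
    expected, _ = scipy.stats.kendalltau(y_true, y_pred)
    actual = kendall_tau_b(tf.constant(y_true), tf.constant(y_pred)).numpy()
    np.testing.assert_allclose(actual, expected, atol=1e-6)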


@bot-of-gabrieldemarmiesse

@marload

You are owner of some files modified in this pull request.
Would you kindly review the changes whenever you have the time to?
Thank you very much.

@AakashKumarNain
Member

Thanks for the contribution. I am not very familiar with the SciPy implementation. @WindQAQ @bhack, are you familiar with this one?

@AakashKumarNain
Member

LGTM! Thanks again. Some tests are failing, but they are on our end and not yours. Once we fix them, I think we can merge this.

@bhack
Contributor

bhack commented Sep 20, 2020

/cc @brianwa84 @jvdillon Are you interested in this for the Tensorflow Probability repo? It could fit in the stats sub package: https://www.tensorflow.org/probability/api_docs/python/tfp/stats

@jvdillon

Hello! I'd love to see this in TFP!

I'd also love to see the top-k loss, as described here: https://arxiv.org/pdf/1206.5280.pdf

@jvdillon jvdillon left a comment

I've left some initial comments, should you be interested in moving it to TFP.

@@ -0,0 +1,195 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.


2020 (i.e., the copyright year should be 2020)

exchanges = 0
num = tf.size(y)
k = tf.constant(1, tf.int32)
while tf.less(k, num):
@jvdillon jvdillon Sep 21, 2020

Since TFP is a general-purpose library, we need to write everything so that it does not presume eager execution. Minimally this means no Python control flow (e.g., for, while, and if are not allowed).

More generally, this code has a nature that would be quite inefficient in a SIMD regime. Perhaps we could examine the full quadratic comparison? E.g.,

0.5 * tf.math.reduce_mean(score[..., tf.newaxis] < score[..., tf.newaxis, :], axis=-1)

In general this would be quite large. I therefore recommend extending this idea to "chunks" using a tf.while_loop to reduce them. In so doing we can handle many cases in memory yet also bound the memory by way of chunks.

Thoughts?
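
For concreteness, a minimal sketch of the chunked idea (the chunk size, names, and concordant/discordant bookkeeping below are illustrative assumptions, not part of this PR):

    import tensorflow as tf

    def pairwise_counts_chunked(y_true, y_pred, chunk_size=256):
        # Compare each chunk of elements against the full vectors, accumulating
        # concordant and discordant pair counts with a tf.while_loop so that no
        # Python control flow ends up in the graph.
        n = tf.shape(y_true)[0]

        def body(start, concordant, discordant):
            stop = tf.minimum(start + chunk_size, n)
            dt = y_true[start:stop, tf.newaxis] - y_true[tf.newaxis, :]
            dp = y_pred[start:stop, tf.newaxis] - y_pred[tf.newaxis, :]
            sign_prod = tf.sign(dt) * tf.sign(dp)
            concordant += tf.reduce_sum(tf.cast(sign_prod > 0, tf.int64))
            discordant += tf.reduce_sum(tf.cast(sign_prod < 0, tf.int64))
            return stop, concordant, discordant

        _, concordant, discordant = tf.while_loop(
            lambda start, c, d: start < n, body,
            (tf.constant(0), tf.constant(0, tf.int64), tf.constant(0, tf.int64)))
        # Each unordered pair is seen twice (self-pairs contribute zero).
        return concordant // 2, discordant // 2

From those counts, a tau-a style statistic would be (concordant - discordant) / (n * (n - 1) / 2), and memory is bounded at roughly chunk_size * n comparisons per loop step.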

@sorensenjs (Contributor Author)

Only after I finished writing this did I fully examine the implementation of the AUC metrics in Keras, which use a bucketized approximation not dissimilar to the two-dimensional bucket approach presented in https://arxiv.org/abs/1712.01521, which would be much more amenable to streaming. (On the other hand, I'm unclear why metrics need to be streaming; usually the space of classifier labels is much smaller than that of the input features.)

I also note that I should have looked at the completely rewritten SciPy implementation, which works differently but still requires sorting both full inputs, which seems largely incompatible with the TF Metrics design.
https://github.com/scipy/scipy/blob/01d8bfb6f239df4ce70c799b9b485b53733c9911/scipy/stats/stats.py#L4452

I suspect the reasonable thing to do here is to fork this into two separate efforts: one for TFP, focusing on the explicit exact solution with a potential backoff to an approximation, and a TensorFlow Addons submission that focuses on implementing a streaming approximation. I think I can manage that, but it's going to take some time, and at present I don't have a good sense of which of these would be easier to do first.
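
For the streaming-approximation side, a rough sketch of how the two-dimensional bucket idea could be reduced to a tau estimate (the histogram layout and the lack of tie correction are my own simplifying assumptions, not code from this PR or the paper):

    import tensorflow as tf

    def tau_from_joint_histogram(counts):
        # counts[i, j]: number of observations whose label fell in bucket i and
        # whose prediction fell in bucket j; the histogram itself can be
        # accumulated incrementally, which is what makes this streamable.
        counts = tf.cast(counts, tf.float64)
        # For each cell, count observations strictly greater in both coordinates
        # (concordant) or greater in one and smaller in the other (discordant).
        greater_i = tf.cumsum(counts, axis=0, exclusive=True, reverse=True)
        concordant = tf.cumsum(greater_i, axis=1, exclusive=True, reverse=True)
        discordant = tf.cumsum(greater_i, axis=1, exclusive=True)
        p = tf.reduce_sum(counts * concordant)
        q = tf.reduce_sum(counts * discordant)
        n = tf.reduce_sum(counts)
        # Tau-a style normalization; ties introduced by bucketization are ignored.
        return (p - q) / (n * (n - 1.0) / 2.0)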

@sorensenjs (Contributor Author)

Hi, sorry I'm finally back to this. From the discussion, I think this PR should be moved to TFP, most likely into the https://github.com/tensorflow/probability/tree/master/tensorflow_probability/python/stats directory.

However, I think I need to make a significant redesign to comply with the suggestion of removing the control flow.

I'm not quite able to distill the consensus here on next steps; maybe I should do all of these:

  • Make a TFP PR.
  • Reimplement this as a streaming algorithm per the previously mentioned arXiv paper.
  • Change this current PR to use tf.while_loop(), possibly with chunks.

@bhack
Contributor

bhack commented Sep 21, 2020

@jvdillon Thanks, I have a question: it seems that the TFP repository currently doesn't host metrics, right? We could have cases like this one, or our Matthews correlation implementation, where the user needs to consume it as a subclass of tf.keras.metrics.Metric.
So if the op is in TFP but the metric is here, we would need to depend on TFP. Do you plan to support metrics directly?

@jvdillon

jvdillon commented Sep 22, 2020 via email

@bhack
Contributor

bhack commented Sep 22, 2020

@jvdillon OK, but I see that you have Keras layers and optimizers in TFP.
I think it is fine to have an op like this in TFP, but then why not also have a metric that wraps the operator (when it is useful to consume it as a metric)?
I think it would be hard for TF Addons to depend on TFP just to implement metrics.

@jvdillon

jvdillon commented Sep 22, 2020 via email

@bhack
Contributor

bhack commented Sep 22, 2020

@brianwa84
Contributor

I guess Keras metrics have some streaming functionality? At least the old tf.metrics did.
Note that we have similar code for streaming reductions baking under tfp.experimental.mcmc (covariance, moments, etc.). This uses tensors instead of Variables, though.
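
As a toy illustration of that tensors-instead-of-Variables style of streaming reduction (not the actual tfp.experimental.mcmc code), a running mean carried through tf.scan might look like:

    import tensorflow as tf

    def streaming_mean(chunks):
        # chunks: a [num_chunks, chunk_size] float32 tensor; the reduction state
        # (count, mean) is threaded through tf.scan as plain tensors rather than
        # mutated in tf.Variable objects.
        def update(state, chunk):
            count, mean = state
            chunk_count = tf.cast(tf.size(chunk), tf.float32)
            new_count = count + chunk_count
            new_mean = mean + (tf.reduce_sum(chunk) - chunk_count * mean) / new_count
            return new_count, new_mean

        counts, means = tf.scan(
            update, chunks, initializer=(tf.constant(0.0), tf.constant(0.0)))
        return means[-1]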

@AakashKumarNain
Member

I guess Keras metrics have some streaming functionality? At least the old tf.metrics did.
Note that we have similar code for streaming reductions baking under tfp.experimental.mcmc (covariance, moments, etc.). This uses tensors instead of Variables, though.

@brianwa84 Yes, the Metric API has streaming functionality.
PS: I am okay with adding the core functionality for the time being.

cc: @seanpmorgan
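
For reference, the streaming shape of that API is roughly the following (a bare-bones sketch, not this PR's implementation; note it only ever sees pairs within a single batch, which is exactly the limitation under discussion):

    import tensorflow as tf

    class WithinBatchConcordance(tf.keras.metrics.Metric):
        # Bare-bones illustration of the streaming Metric API: state lives in
        # variables, update_state folds in each batch, result reads it back out.
        # Assumes 1-D y_true / y_pred per batch; the class name is hypothetical.
        def __init__(self, name="within_batch_concordance", **kwargs):
            super().__init__(name=name, **kwargs)
            self.concordant = self.add_weight(name="concordant", initializer="zeros")
            self.discordant = self.add_weight(name="discordant", initializer="zeros")

        def update_state(self, y_true, y_pred, sample_weight=None):
            dt = tf.sign(y_true[:, tf.newaxis] - y_true[tf.newaxis, :])
            dp = tf.sign(y_pred[:, tf.newaxis] - y_pred[tf.newaxis, :])
            prod = tf.cast(dt * dp, tf.float32)
            self.concordant.assign_add(tf.reduce_sum(tf.cast(prod > 0, tf.float32)) / 2.0)
            self.discordant.assign_add(tf.reduce_sum(tf.cast(prod < 0, tf.float32)) / 2.0)

        def result(self):
            return tf.math.divide_no_nan(self.concordant - self.discordant,
                                         self.concordant + self.discordant)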

@bhack
Contributor

bhack commented Oct 18, 2020

@seanpmorgan Before we lose this PR: I am still inclined to include this in TFP.

@bhack
Contributor

bhack commented Oct 26, 2020

@jvdillon @brianwa84 Do you have enough GitHub permissions to migrate this to TFP?

@brianwa84
Contributor

I don't know what you mean by "migrate". I think you can just open a new PR against TFP.

For now, we could perhaps put something like this under tfp.experimental.stats, where we have other streaming metrics like covariance etc. Or you could modify to drop the streaming component and simply compute on a single tensor of observations, which would be more consistent w/ e.g. tfp.stats.covariance. We would likely want to eliminate the for loops for TFP. If you really need a loop, we usually use tf.while_loop or tf.scan to avoid bloating the tf.function graph.

@bhack
Contributor

bhack commented Oct 26, 2020

I don't know what you mean by "migrate".

Yes, I assumed we had a related issue about this feature, since that is mandatory in our contribution policy. But in this case we have a PR directly.

For now, we could perhaps put something like this under tfp.experimental.stats, where we have other streaming metrics like covariance etc. Or you could modify to drop the streaming component and simply compute on a single tensor of observations, which would be more consistent w/ e.g. tfp.stats.covariance. We would likely want to eliminate the for loops for TFP. If you really need a loop, we usually use tf.while_loop or tf.scan to avoid bloating the tf.function graph.

The plan is OK with me.

@sorensenjs
Can you open the PR at https://github.com/tensorflow/probability/ ?

@sorensenjs
Contributor Author

Created tensorflow/probability#1147, following up there with requested changes from this thread.

@sorensenjs sorensenjs closed this Nov 1, 2020
@sorensenjs sorensenjs deleted the kendallstau branch November 1, 2020 21:03
@sorensenjs
Contributor Author

@bhack
Contributor

bhack commented Mar 12, 2021

It would be nice if it could be exposed as a metric.

@sorensenjs
Contributor Author

Kendall's tau doesn't fit the streaming metrics style well. There are approximate versions (see https://arxiv.org/pdf/1712.01521.pdf) that are more interesting and better suited to it. I might try doing that.

@bhack
Contributor

bhack commented Mar 12, 2021

Kendall's tau doesn't fit the streaming metrics style well. There are approximate versions (see https://arxiv.org/pdf/1712.01521.pdf) that are more interesting and better suited to it. I might try doing that.

Interesting. Thanks for the resource.

@sorensenjs
Contributor Author

I've created a completely new streaming-based algorithm for this abandoned PR:
#2423
