Skip to content

Conversation

ashwini-orchestral
Copy link
Contributor

It is not safe to call yaml.load with any data received from an untrusted source. It allows instantiation of arbitrary objects. The function yaml.safe_load limits this ability to simple Python objects like integers or lists. For security reasons, yaml.safe_load is used.

@pull-request-size pull-request-size bot added the size/XS PR that changes 0-9 lines. Quick fix/merge. label Feb 22, 2021
@Kami Kami added this to the 3.5.0 milestone Feb 22, 2021
Copy link
Member

@Kami Kami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just echoing my comment here (#5159 (comment)), but otherwise, LGTM.

@Kami Kami self-requested a review February 22, 2021 22:20
@@ -77,7 +77,7 @@ def load_spec(module_name, spec_file, allow_duplicate_keys=False):

# 1. Check for duplicate keys
if not allow_duplicate_keys:
yaml.load(spec_string, UniqueKeyLoader)
yaml.safe_load(spec_string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this earlier - we intentionally use yaml.load() and not yaml.safe_load() here since we use a custom loader which detects duplicate keys which yaml.safe_load() doesn't support.

So using safe_load would change and break the existing behavior and that's not what we want.

Per my comment on other PR - this function is only used to load trusted openapi.yaml file which is controlled by us so not using safe_load() should be fine.

I suggest reverting this change + adding a comment to that line of code clarifying why we don't use safe_load there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: It looks like it's actually possible to achieve "detect duplicate keys" behavior with safe_load, so we should do something like this instead - https://gist.github.com/danielpops/5a0726f2fb6288da749c4cd604276be8 (we of course can't just copy that code since it has no license which means we need to develop it ourselves or find similar snippet with an appropriate license).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That gist looks super overcomplicated for what it's trying to achieve. I'm cooking a patch for PyYAML that should allow us to inherit from SafeLoader and customize it to disallow duplicate keys.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blag @Kami please let @ashwini-orchestral what she needs to do here. @blag if you are working on a patch, will you be including the change here in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two PRs to PyYAML to make this easier for us:

If one of those is accepted then it will allow this code to be much simpler.

m4dcoder added 2 commits March 3, 2021 05:15
Add the UniqueKeyLoader to the yaml SafeConstructor so that duplicate
key is checked on safe_load. We want to use safe_load since yaml.load
should not be used to load data from untrusted source.
@pull-request-size pull-request-size bot added size/S PR that changes 10-29 lines. Very easy to review. and removed size/XS PR that changes 0-9 lines. Quick fix/merge. labels Mar 3, 2021
@Kami
Copy link
Member

Kami commented Mar 4, 2021

@m4dcoder Can you please add a small test case for it? Here is quick example of similar test case I added recently - https://github.com/StackStorm/st2/pull/4846/files#diff-4d4c6f566c2d72910eff03ac16f442387a53beb0aa7e9b0ad5944ec5ce602ab2R101.

Besides that, LGTM.

Copy link
Member

@Kami Kami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

from st2common.util import spec_loader


class SpecLoaderTest(unittest2.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also wouldn't hurt to add a test case which verifies SafeLoad is indeed used, but otherwise, LGTM.

@cognifloyd
Copy link
Member

We just reformatted the code with black. (Hooray!) And this PR got caught in the cross fire too. (Arrgh!)
Luckily, merging master into this PR should not have many conflicts. Cheers!

Also, we need to look at performance when we switch out yaml.load for yaml.safe_load. @m4dcoder just added CSafeLoader to orquesta, and ended up needing to pass CSafeLoader into yaml.load to get both the performance boost and the safety. See: https://github.com/StackStorm/orquesta/pull/230/files#diff-c2248ccb7408cc089d472f1dcb72a531c386cac3a9af7ad7de9bebc1d5910f3c (and the discussion on that PR as well).

The method yaml.safe_load uses the python implementation of SafeLoader
by default. Use the CSafeLoader if import is successful for better
performance. This requires using the yaml.load method but pass the
CSafeLoader.
@m4dcoder m4dcoder changed the title use yaml.safe_load instead of yaml.load Refactor spec_loader to use yaml.load with SafeLoader Mar 10, 2021
@m4dcoder m4dcoder merged commit 0876d25 into StackStorm:master Mar 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement security size/S PR that changes 10-29 lines. Very easy to review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants