-
-
Notifications
You must be signed in to change notification settings - Fork 766
Refactor spec_loader to use yaml.load with SafeLoader #5162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor spec_loader to use yaml.load with SafeLoader #5162
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just echoing my comment here (#5159 (comment)), but otherwise, LGTM.
@@ -77,7 +77,7 @@ def load_spec(module_name, spec_file, allow_duplicate_keys=False): | |||
|
|||
# 1. Check for duplicate keys | |||
if not allow_duplicate_keys: | |||
yaml.load(spec_string, UniqueKeyLoader) | |||
yaml.safe_load(spec_string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this earlier - we intentionally use yaml.load()
and not yaml.safe_load()
here since we use a custom loader which detects duplicate keys which yaml.safe_load()
doesn't support.
So using safe_load would change and break the existing behavior and that's not what we want.
Per my comment on other PR - this function is only used to load trusted openapi.yaml file which is controlled by us so not using safe_load()
should be fine.
I suggest reverting this change + adding a comment to that line of code clarifying why we don't use safe_load there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: It looks like it's actually possible to achieve "detect duplicate keys" behavior with safe_load
, so we should do something like this instead - https://gist.github.com/danielpops/5a0726f2fb6288da749c4cd604276be8 (we of course can't just copy that code since it has no license which means we need to develop it ourselves or find similar snippet with an appropriate license).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That gist looks super overcomplicated for what it's trying to achieve. I'm cooking a patch for PyYAML that should allow us to inherit from SafeLoader
and customize it to disallow duplicate keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@blag @Kami please let @ashwini-orchestral what she needs to do here. @blag if you are working on a patch, will you be including the change here in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two PRs to PyYAML to make this easier for us:
If one of those is accepted then it will allow this code to be much simpler.
Add the UniqueKeyLoader to the yaml SafeConstructor so that duplicate key is checked on safe_load. We want to use safe_load since yaml.load should not be used to load data from untrusted source.
@m4dcoder Can you please add a small test case for it? Here is quick example of similar test case I added recently - https://github.com/StackStorm/st2/pull/4846/files#diff-4d4c6f566c2d72910eff03ac16f442387a53beb0aa7e9b0ad5944ec5ce602ab2R101. Besides that, LGTM. |
Add unit tests to cover success case and failure where YAML has duplicate keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
from st2common.util import spec_loader | ||
|
||
|
||
class SpecLoaderTest(unittest2.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also wouldn't hurt to add a test case which verifies SafeLoad is indeed used, but otherwise, LGTM.
We just reformatted the code with black. (Hooray!) And this PR got caught in the cross fire too. (Arrgh!) Also, we need to look at performance when we switch out |
The method yaml.safe_load uses the python implementation of SafeLoader by default. Use the CSafeLoader if import is successful for better performance. This requires using the yaml.load method but pass the CSafeLoader.
It is not safe to call yaml.load with any data received from an untrusted source. It allows instantiation of arbitrary objects. The function yaml.safe_load limits this ability to simple Python objects like integers or lists. For security reasons, yaml.safe_load is used.