Skip to content

Conversation

ekaf
Copy link
Member

@ekaf ekaf commented Jul 23, 2024

Fix #3266 and CVE-39705, by using the RestrictedUnpickler class from nltk.app.wordnet_app, to prevent from loading pickled classes or functions.

For example:

from nltk.data import load
from nltk.corpus import treebank

pos = load("taggers/maxent_treebank_pos_tagger/PY3/english.pickle")
print(pos.tag(treebank.sents()[0]))

Without this PR, the pickle loads and works:

[('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]

With this PR, loading classes or functions raise an error:

_pickle.UnpicklingError: global 'nltk.tag.sequential.ClassifierBasedPOSTagger' is forbidden

@ekaf
Copy link
Member Author

ekaf commented Jul 23, 2024

This PR allows unpickling simple types, and even dictionaries:

from nltk.data import load
print(type(load(f"help/tagsets/PY3/brown_tagset.pickle")))

<class 'dict'>

But it does not allow sets:

print(type(load(f"taggers/averaged_perceptron_tagger/averaged_perceptron_tagger.pickle")))

_pickle.UnpicklingError: global 'builtin.set' is forbidden

@ekaf ekaf requested review from stevenbird and alvations July 23, 2024 08:56
@ekaf
Copy link
Member Author

ekaf commented Jul 23, 2024

NB: the CI is expected to fail until the replacement nltk_data packages and handlers (#3283 and #3286) are merged.

@ted-at-openai
Copy link

Thank you! <3

@ekaf ekaf mentioned this pull request Feb 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remote code execution vulnerability in NLTK
3 participants