Description
The current and earlier versions of NLTK are vulnerable to remote code execution through the integrated data package download functionality. A man-in-the-middle attacker, or an attacker with control over the NLTK data index, can force users who rely on data packages containing pickled Python code to download a new version of the package that executes arbitrary code upon unpickling.
Data packages that have been identified as vulnerable are "averaged_perceptron_tagger" and "punkt". For code to be vulnerable, it has to download a data package and use functionality in NLTK that causes the data package to be unpickled.
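To illustrate why unpickling untrusted data is dangerous in general, here is a minimal, harmless sketch using only the standard library. It is not the POC mentioned in this report and does not touch NLTK; the class name `Payload` is hypothetical, and the benign call to `sum` stands in for whatever callable an attacker would choose (for example `os.system`).

```python
import pickle


class Payload:
    """Hypothetical stand-in for an object inside a tampered data package."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls it on load.
        # A harmless builtin is used here; an attacker would substitute
        # something like os.system with a shell command.
        return (sum, ([40, 2],))


# The attacker controls these bytes (e.g. the contents of a .pickle file).
blob = pickle.dumps(Payload())

# The victim only "loads data", yet the chosen callable runs during loading.
result = pickle.loads(blob)

# The loaded value is the callable's return value, not a Payload instance.
print(type(result).__name__, result)
```

The point of the sketch is that `pickle.loads` alone is enough to trigger the call: no method of the loaded object has to be invoked afterwards.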
Here is an example of vulnerable code for the "averaged_perceptron_tagger" data package:
```python
import nltk
nltk.download("averaged_perceptron_tagger")
nltk.pos_tag(["hello", "world"])
```
This vulnerability was reported to the NLTK team several times, together with proof-of-concept exploit code, via the email address mentioned in SECURITY.md (and later directly to some maintainers as well). The first report was sent on 2024-05-19. So far there has been no response from the NLTK maintainers to any of these reports.
#2522 already pointed out the security implications of using pickled code a few years ago, but didn't receive a response either.