Remote code execution vulnerability in NLTK #3266

@Dunedan

Current and earlier versions of NLTK are vulnerable to remote code execution through the integrated data package download functionality. A man-in-the-middle attacker, or an attacker with control over the NLTK data index, can force users who rely on data packages containing pickled Python code to download a new version of a package that executes arbitrary code when it is unpickled.

The data packages identified as vulnerable are "averaged_perceptron_tagger" and "punkt". Code is vulnerable if it downloads such a data package and then uses NLTK functionality that causes the package to be unpickled.

Here is an example of vulnerable code for the "averaged_perceptron_tagger" data package:

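import nltk

# Fetches the package via the NLTK data index (the step an attacker can tamper with).
nltk.download("averaged_perceptron_tagger")
# pos_tag loads the tagger, which unpickles the downloaded package.
nltk.pos_tag(["hello", "world"])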
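
To illustrate why unpickling an attacker-supplied file amounts to running attacker-supplied code, here is a minimal and deliberately harmless sketch using only the standard library. It is not the POC that was reported. The class name `Payload` and the evaluated expression are made up for illustration, and a real attack would substitute a harmful callable for the constant expression used here.

```python
import pickle


class Payload:
    """Hypothetical stand-in for a tampered data package (harmless here)."""

    def __reduce__(self):
        # pickle records "call this callable with these arguments" and
        # performs that call while loading. The callable is chosen by
        # whoever produced the bytes, not by the code that loads them.
        return (eval, ("6 * 7",))


# What an attacker could serve in place of the genuine package contents.
malicious_bytes = pickle.dumps(Payload())

# What happens implicitly on the user's side when the package is unpickled.
result = pickle.loads(malicious_bytes)

# The loaded "object" is simply whatever the chosen callable returned.
print(type(result).__name__, result)
```

Note that the loading side never references `Payload` or `eval`. The call is dictated entirely by the bytes being loaded, which is why a replaced data package is enough to gain code execution.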

This vulnerability was reported to the NLTK team multiple times, together with POC code to exploit it, via the email address mentioned in SECURITY.md (and later directly to some maintainers as well). The first report was sent on 2024-05-19. So far, the NLTK maintainers have not responded to any of these reports.

Issue #2522 already pointed out the security implications of using pickled code a few years ago, but it did not receive a response either.
