Remote code execution vulnerability in NLTK #3266

The current and earlier versions of NLTK are vulnerable to remote code execution through the integrated data package download functionality. A man-in-the-middle attacker, or an attacker with control over the NLTK data index, can force users who rely on data packages containing pickled Python code to download a new version of the package that executes arbitrary code upon unpickling.

The data packages identified as vulnerable are "averaged_perceptron_tagger" and "punkt". For code to be vulnerable, it has to download a data package and then use NLTK functionality that causes the data package to be unpickled.

Here is an example of vulnerable code for the "averaged_perceptron_tagger" data package:

```python
import nltk
nltk.download("averaged_perceptron_tagger")
nltk.pos_tag(["hello", "world"])
```
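
To illustrate why unpickling attacker-controlled data amounts to code execution, here is a minimal, harmless sketch that uses only the standard library. It is not the POC mentioned in this report and does not touch NLTK. The `Payload` class name is hypothetical, and `eval("6 * 7")` stands in for whatever callable and arguments an attacker would choose.

```python
import pickle


class Payload:
    """Hypothetical stand-in for an object inside a tampered data package."""

    def __reduce__(self):
        # pickle records "call eval('6 * 7')" as the recipe for rebuilding
        # this object. An attacker would pick a more harmful callable here.
        return (eval, ("6 * 7",))


# What an attacker would ship in place of the legitimate pickle file.
blob = pickle.dumps(Payload())

# The victim only loads the data, yet the chosen callable runs during loading.
result = pickle.loads(blob)
print(result)
```

The value returned by `pickle.loads` is whatever the callable returned, not a `Payload` instance, which shows that the loader ran the callable stored in the pickle stream.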

This vulnerability was reported, together with proof-of-concept code to exploit it, multiple times to the NLTK team via the email address mentioned in [SECURITY.md](SECURITY.md) (and later directly to some maintainers as well). The first report was sent on 2024-05-19. So far there has been no response from the NLTK maintainers to these reports.

https://github.com/nltk/nltk/issues/2522 already pointed out the security implications of using pickled code a few years ago, but didn't receive a response either.
