OOM with a lot of memory untouched #9578

## The problem 
I am training a sentence classification model with a transformer, using a pipeline based on the default config and a custom dataset. When I start training, I get:

```
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 1.64 GiB already allocated; 0 bytes free; 1.73 GiB reserved in total by PyTorch)
```
The weird things are:
* It happens regardless of the batch size (it breaks even with size 1).
* Sentence length makes no difference.
* It happens regardless of the machine (I also tried a cluster with a 32 GB GPU).
* My training set has 15k sentences, but if I reduce it to 12, training works properly.

Can I explicitly ask spaCy or PyTorch to reserve more memory? Either something is wrong with the memory allocation, or something else is draining the memory.
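To make the "memory untouched" part concrete, here is a minimal sketch that only does arithmetic on the error message quoted above. The helper names (`to_bytes`, `parse_oom`) are hypothetical, and the interpretation in the final comment is an assumption rather than a confirmed diagnosis.

```python
import re

# Error text copied verbatim from the traceback above.
MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB "
    "(GPU 0; 6.00 GiB total capacity; 1.64 GiB already allocated; "
    "0 bytes free; 1.73 GiB reserved in total by PyTorch)"
)

UNITS = {"bytes": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}


def to_bytes(amount: str, unit: str) -> int:
    """Convert a size such as ('1.73', 'GiB') to a whole number of bytes."""
    return int(round(float(amount) * UNITS[unit]))


def parse_oom(message: str) -> dict:
    """Extract the five sizes reported in a PyTorch CUDA OOM message."""
    size = r"([\d.]+) (bytes|KiB|MiB|GiB)"
    patterns = {
        "requested": rf"Tried to allocate {size}",
        "total": rf"{size} total capacity",
        "allocated": rf"{size} already allocated",
        "free": rf"{size} free",
        "reserved": rf"{size} reserved in total by PyTorch",
    }
    stats = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{key}' in the message")
        stats[key] = to_bytes(*match.groups())
    return stats


stats = parse_oom(MESSAGE)

# Memory that is neither free nor reserved by PyTorch's caching allocator.
# Assumption: it is held by something else, such as another process or a
# different allocator in the same process.
unaccounted = stats["total"] - stats["reserved"] - stats["free"]
print(f"Not free and not reserved by PyTorch: {unaccounted / 2**30:.2f} GiB")
```

For the message above this comes to about 4.27 GiB of the 6 GiB card, which is what makes a failed 20 MiB allocation look suspicious.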

## How to reproduce the behavior
I am training on the [deft definition](https://github.com/Elzawawy/DeftEval) dataset, and this is my base config:

```
# This is an auto-generated partial config. To use it with 'spacy train'
# you can run spacy init fill-config to auto-fill all default settings:
# python -m spacy init fill-config ./base_config.cfg ./config.cfg
[paths]
train = "../../data/definition_data/train.spacy"
dev = "../../data/definition_data/dev.spacy"

[system]
gpu_allocator = "tensorflow"

[nlp]
lang = "en"
pipeline = ["transformer","textcat"]
batch_size = 256

[components]

[components.transformer]
factory = "transformer"

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "bert-base-uncased"
tokenizer_config = {"use_fast": true}

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96

[components.textcat]
factory = "textcat"

[components.textcat.model]
@architectures = "spacy.TextCatBOW.v2"
exclusive_classes = true
ngram_size = 1
no_output_layer = false

[corpora]

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0

[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"

[training.optimizer]
@optimizers = "Adam.v1"

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 5e-5

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 2000
buffer = 256

[initialize]
vectors = ${paths.vectors}
```
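One setting that might be worth ruling out is `gpu_allocator`: the config above sets it to `"tensorflow"`, while the transformer component runs on PyTorch. Whether this is related to the error is an assumption, not something this report confirms. The sketch below uses only the standard library to flag that combination; the function name `allocator_mismatch` is hypothetical, and only the relevant keys of the config are reproduced.

```python
import configparser
import json

# Excerpt of the base config above, reduced to the keys being checked.
CONFIG_TEXT = """
[system]
gpu_allocator = "tensorflow"

[nlp]
lang = "en"
pipeline = ["transformer","textcat"]
"""


def allocator_mismatch(config_text: str) -> bool:
    """Return True if a transformer pipeline is paired with a non-PyTorch allocator."""
    # Interpolation is disabled so values such as ${paths.train} are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # Config values are JSON-formatted, so json.loads strips the quoting.
    allocator = json.loads(parser.get("system", "gpu_allocator", fallback="null"))
    pipeline = json.loads(parser.get("nlp", "pipeline", fallback="[]"))
    return "transformer" in pipeline and allocator != "pytorch"


if allocator_mismatch(CONFIG_TEXT):
    print('Transformer pipeline, but gpu_allocator is not "pytorch".')
```

If the check fires, trying `gpu_allocator = "pytorch"` in the `[system]` block would be a cheap experiment before looking elsewhere.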

## Your Environment

- **spaCy version:** 3.1.0
- **Platform:** Linux-5.10.60.1-microsoft-standard-WSL2-x86_64-with-glibc2.29
- **Python version:** 3.8.11
- **Pipelines:** en_core_web_lg (3.1.0), en_core_web_sm (3.1.0), en_core_web_trf (3.1.0)
