Open
Labels
feat / transformer (Feature: Transformer), gpu (Using spaCy on GPU), perf / memory (Performance: memory use)
Description
The problem
I am training a sentence classification model on a custom dataset, using a transformer and a pipeline based on the default config. When I start training, I get:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 1.64 GiB already allocated; 0 bytes free; 1.73 GiB reserved in total by PyTorch)
The weird things are:
- It appears regardless of the batch size (it breaks even with size 1).
- Sentence length makes no difference.
- It appears regardless of the machine (I also tried a cluster with a 32 GB GPU).
- My training set has 15k sentences, but if I reduce it to 12 it works properly.
Can I explicitly ask spaCy/torch to reserve more memory? Either something is wrong with memory allocation, or something else is draining the memory.
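To make the numbers in the error message easier to reason about, here is a minimal sketch that parses the message quoted above and computes how much of the GPU is neither free nor reserved by PyTorch. The helper names (`to_bytes`, `parse_oom`) are hypothetical and the regular expressions only target this particular message format.

```python
import re

# Error text copied from the report above.
message = (
    "RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB "
    "(GPU 0; 6.00 GiB total capacity; 1.64 GiB already allocated; "
    "0 bytes free; 1.73 GiB reserved in total by PyTorch)"
)

UNIT_BYTES = {"bytes": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}


def to_bytes(number, unit):
    """Convert a (number, unit) pair such as ("1.64", "GiB") to bytes."""
    return float(number) * UNIT_BYTES[unit]


def parse_oom(text):
    """Extract the memory figures from a PyTorch CUDA OOM message (hypothetical helper)."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (\w+)",
        "total": r"([\d.]+) (\w+) total capacity",
        "allocated": r"([\d.]+) (\w+) already allocated",
        "free": r"([\d.]+) (\w+) free",
        "reserved": r"([\d.]+) (\w+) reserved",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        fields[name] = to_bytes(*match.groups())
    return fields


fields = parse_oom(message)
# Memory that is neither free nor held by PyTorch's caching allocator.
unaccounted = fields["total"] - fields["reserved"] - fields["free"]
print(f"Not reserved by PyTorch and not free: {unaccounted / 2**30:.2f} GiB")
```

With these figures, about 4.27 GiB of the 6 GiB card is in use outside PyTorch's allocator, so the failed 20 MiB request is not explained by PyTorch's own usage. The report does not say what holds that memory. One candidate worth checking, offered as a guess rather than a confirmed diagnosis, is the `gpu_allocator = "tensorflow"` setting in the config below, since the transformer component runs on PyTorch.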
How to reproduce the behavior
I am training on the DEFT definition dataset, and this is my base config:
# This is an auto-generated partial config. To use it with 'spacy train'
# you can run spacy init fill-config to auto-fill all default settings:
# python -m spacy init fill-config ./base_config.cfg ./config.cfg
[paths]
train = "../../data/definition_data/train.spacy"
dev = "../../data/definition_data/dev.spacy"
[system]
gpu_allocator = "tensorflow"
[nlp]
lang = "en"
pipeline = ["transformer","textcat"]
batch_size = 256
[components]
[components.transformer]
factory = "transformer"
[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "bert-base-uncased"
tokenizer_config = {"use_fast": true}
[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96
[components.textcat]
factory = "textcat"
[components.textcat.model]
@architectures = "spacy.TextCatBOW.v2"
exclusive_classes = true
ngram_size = 1
no_output_layer = false
[corpora]
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
[training.optimizer]
@optimizers = "Adam.v1"
[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 5e-5
[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 2000
buffer = 256
[initialize]
vectors = ${paths.vectors}
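For readers unfamiliar with the `strided_spans` settings in the config above, the following simplified sketch shows how `window = 128` and `stride = 96` could split a document into overlapping spans. It works on plain token counts and is an illustration only, not the spacy-transformers implementation; `strided_windows` is a hypothetical helper.

```python
def strided_windows(n_tokens, window=128, stride=96):
    """Return (start, end) token index pairs for overlapping windows.

    Each window covers up to `window` tokens and starts `stride` tokens
    after the previous one, so neighbours overlap by window - stride tokens.
    """
    spans = []
    start = 0
    while start < n_tokens:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            # The last window already reaches the end of the document.
            break
        start += stride
    return spans


# A 300-token document needs three windows; a 30-token sentence needs one.
print(strided_windows(300))
print(strided_windows(30))
```

Under this reading, a single sentence shorter than 128 tokens produces exactly one span, which is consistent with the observation that sentence length does not affect the error.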
Your Environment
- spaCy version: 3.1.0
- Platform: Linux-5.10.60.1-microsoft-standard-WSL2-x86_64-with-glibc2.29
- Python version: 3.8.11
- Pipelines: en_core_web_lg (3.1.0), en_core_web_sm (3.1.0), en_core_web_trf (3.1.0)