Potential bug in PLM training #6812

@HarshTrivedi

Description

There seems to be a bug in the mask_tokens method of DataCollatorForPermutationLanguageModeling. Based on the comment, this line is supposed to compute the mask for non-functional tokens, i.e., anything but padding and special tokens. So the combination should use an OR between padding_mask and special_tokens_mask, not an AND. For reference, the corresponding line in the original XLNet code also uses an OR.

I should acknowledge that I haven't fully understood the permutation masking code yet, but I'm raising an issue because this looks wrong to me.
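
To make the proposal concrete, here is a minimal sketch with toy tensors (identifier names are approximated from the collator code, not copied exactly):

```python
import torch

# Toy masks for one sequence of 4 tokens: True marks special / padding positions.
special_tokens_mask = torch.tensor([[True, False, False, False]])
padding_mask = torch.tensor([[False, False, False, True]])

# Current behaviour (as I read it): a position counts as "non-functional" only
# if it is BOTH padding AND special, so pad and special tokens are not excluded.
non_func_mask_and = ~(padding_mask & special_tokens_mask)
# -> tensor([[True, True, True, True]])

# Proposed: exclude a position if it is padding OR special,
# matching the original XLNet code.
non_func_mask_or = ~(padding_mask | special_tokens_mask)
# -> tensor([[False, True, True, False]])
```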


Besides the above problem, I'm also getting very bad perplexity (296.0) when evaluating (without fine-tuning) the xlnet-base-cased PLM model on the plain wikitext-2 dataset (wiki.test.raw). I used the XLNet example from here (without the --do_train flag) to get the perplexity.
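
For context, the example script reports perplexity as (to my understanding) the exponential of the evaluation loss, so a perplexity of 296.0 corresponds to an eval loss of roughly 5.69:

```python
import math

eval_loss = 5.69        # hypothetical value, back-computed from the reported perplexity
perplexity = math.exp(eval_loss)
print(perplexity)       # ~296
```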

The PLM code only works if the sequence length is even. To work around this, I append a padding token when the sequence length is odd. Concretely, I replaced the error raised here with:

# If the sequence length is odd, append one pad-token column so it becomes even.
padding = inputs.new_ones((inputs.size(0), 1)) * self.tokenizer.pad_token_id
inputs = torch.cat([inputs, padding], dim=1)

For comparison, the perplexity of BERT on this dataset is around 10.

Transformers version: installed from master.

@patrickvonplaten @TevenLeScao @LysandreJik @shngt
