There seems to be a bug in the `mask_tokens` method of `DataCollatorForPermutationLanguageModeling`. Based on the comment, this line is supposed to compute the mask for non-functional tokens, i.e. anything but padding and special tokens. So there should be an OR between `padding_mask` and `special_tokens_mask`, not an AND. For reference, the corresponding line in the original XLNet code also uses an OR.
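
To illustrate the difference, here is a minimal sketch with made-up toy masks (the variable names mirror the collator's, but the tensors are just an example, not the library's actual code):

```python
import torch

# Toy batch of one sequence: a special token at position 0, real tokens at 1-3,
# and padding at positions 4-5.
special_tokens_mask = torch.tensor([[True, False, False, False, False, False]])
padding_mask = torch.tensor([[False, False, False, False, True, True]])

# Current behaviour (AND): a position is excluded only if it is BOTH padding and
# a special token, which essentially never happens, so nothing gets filtered out.
non_func_mask_and = ~(padding_mask & special_tokens_mask)
print(non_func_mask_and)  # tensor([[True, True, True, True, True, True]])

# Expected behaviour (OR): any position that is padding OR a special token is
# excluded from the non-functional mask.
non_func_mask_or = ~(padding_mask | special_tokens_mask)
print(non_func_mask_or)  # tensor([[False, True, True, True, False, False]])
```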
I should acknowledge that I haven't fully understood the permutation masking code yet, but I'm raising an issue because this looks wrong to me.
Besides the above problem, I'm also getting a very bad perplexity (296.0) when evaluating (without fine-tuning) the `xlnet-base-cased` PLM model on the plain wikitext2 dataset (`wiki.test.raw`). I used the XLNet example from here (without the `--do-train` flag) to get the perplexity.
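
For context on that number, the example script reports perplexity as the exponential of the mean eval loss, so 296 corresponds to a cross-entropy of roughly 5.7 nats per token (the loss below is back-computed for illustration, not taken from an actual run):

```python
import math

# Back-computed for illustration: perplexity = exp(mean cross-entropy loss).
eval_loss = math.log(296.0)       # ≈ 5.69 nats per token
perplexity = math.exp(eval_loss)  # ≈ 296.0
print(eval_loss, perplexity)
```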
The PLM code only works if the sequence lengths are even. To work around this, I append a padding token when the sequence length is odd. Concretely, I replaced the error here with:
```python
padding = inputs.new_ones((inputs.size(0), 1)) * self.tokenizer.pad_token_id
inputs = torch.cat([inputs, padding], dim=1)
```
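
For clarity, this is roughly where the change sits inside `mask_tokens` (the surrounding `if` paraphrases the even-length check that currently raises the error; treat the exact structure as an assumption):

```python
if inputs.size(1) % 2 != 0:
    # Instead of raising the even-length ValueError, pad odd-length batches
    # with one extra pad token so the permutation mask can still be built.
    padding = inputs.new_ones((inputs.size(0), 1)) * self.tokenizer.pad_token_id
    inputs = torch.cat([inputs, padding], dim=1)
```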
For comparison, the perplexity of BERT on this dataset is around 10.
Transformers version: from master.