Potential bug in PLM training #6812

@HarshTrivedi

Description

There seems to be a bug in the mask_tokens method of DataCollatorForPermutationLanguageModeling. Based on the comment, this line is supposed to compute the mask for non-functional tokens, i.e., anything but padding and special tokens. So the combination should use an OR between padding_mask and special_tokens_mask, not an AND. For reference, the corresponding line in the original XLNet code also uses an OR.

I should acknowledge that I haven't fully understood the permutation masking code yet, but I'm raising an issue because this looks wrong to me.
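
To make the proposal concrete, here is a minimal sketch with toy tensors (identifier names are approximated from the collator code, not copied exactly):

```python
import torch

# Toy masks for one sequence of 4 tokens: True marks special / padding positions.
special_tokens_mask = torch.tensor([[True, False, False, False]])
padding_mask = torch.tensor([[False, False, False, True]])

# Current behaviour (as I read it): a position counts as "non-functional" only
# if it is BOTH padding AND special, so pad and special tokens are not excluded.
non_func_mask_and = ~(padding_mask & special_tokens_mask)
# -> tensor([[True, True, True, True]])

# Proposed: exclude a position if it is padding OR special,
# matching the original XLNet code.
non_func_mask_or = ~(padding_mask | special_tokens_mask)
# -> tensor([[False, True, True, False]])
```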


Besides the above problem, I'm also getting very bad perplexity (296.0) when evaluating (without fine-tuning) the xlnet-base-cased PLM model on the plain wikitext-2 dataset (wiki.test.raw). I used the XLNet example from here (without the --do_train flag) to get the perplexity.
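
For context, the example script reports perplexity as (to my understanding) the exponential of the evaluation loss, so a perplexity of 296.0 corresponds to an eval loss of roughly 5.69:

```python
import math

eval_loss = 5.69        # hypothetical value, back-computed from the reported perplexity
perplexity = math.exp(eval_loss)
print(perplexity)       # ~296
```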

The PLM code only works if the sequence length is even. To work around this, I append a padding token when the sequence length is odd. Concretely, I replaced the error raised here with:

# If the sequence length is odd, append one pad-token column so it becomes even.
padding = inputs.new_ones((inputs.size(0), 1)) * self.tokenizer.pad_token_id
inputs = torch.cat([inputs, padding], dim=1)

For comparison, the perplexity of BERT on this dataset is around 10.

Transformers version: installed from master.

@patrickvonplaten @TevenLeScao @LysandreJik @shngt
