First time ever posting an issue, so apologies if I've written something incorrectly or am missing something obvious.
On lines 82-88 of `transformer-xl/pytorch/eval.py`, perplexity is computed by accumulating a total loss and a total segment length:
```python
mems = tuple()
for idx, (data, target, seq_len) in enumerate(eval_iter):
    ret = model(data, target, *mems)
    loss, mems = ret[0], ret[1:]
    loss = loss.mean()
    total_loss += seq_len * loss.item()
    total_len += seq_len
```
Rather than adding `loss.sum()` to the total loss, the implementation multiplies the mean loss by `seq_len`. However, there should be only `seq_len - 1` losses in the output of the model: in language modeling you predict each token from the previous tokens, so no loss is computed for the very first token, as the short sketch below illustrates.
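To make the off-by-one concrete, here's a minimal PyTorch sketch (a toy illustration with random tensors, not the Transformer-XL model): a standalone sequence of `T` tokens produces only `T - 1` cross-entropy terms, because the first token has no preceding context to predict it from.

```python
import torch
import torch.nn.functional as F

T, vocab = 8, 50
tokens = torch.randint(vocab, (T,))   # a toy token sequence
logits = torch.randn(T - 1, vocab)    # predictions for positions 1..T-1

# One cross-entropy term per predicted position: the targets are the
# tokens shifted by one, so the first token is never a target.
losses = F.cross_entropy(logits, tokens[1:], reduction='none')
assert losses.numel() == T - 1        # seq_len - 1 terms, not seq_len
```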
(Compare this against the TF implementation in `transformer-xl/tf/train_gpu.py`:
```python
if len(tower_losses) > 1:
    loss = tf.add_n(tower_losses) / len(tower_losses)
else:
    loss = tower_losses[0]
```
This issue is avoided here because all the losses are appended to a list, `tower_losses`, and then summed and divided by the length of that list.)
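To see what the `seq_len` weighting does to the average, here's a quick sketch with made-up per-token losses (the names and numbers are hypothetical, not from the repo): weighting each segment's mean by one more term than it actually produced skews the result away from the true mean.

```python
from statistics import mean

# Per-token losses for two toy segments; per the argument above, a
# segment of seq_len tokens yields seq_len - 1 loss terms, so these
# stand for segments with seq_len = 4 and seq_len = 3.
segments = [[2.0, 1.0, 3.0], [0.5, 1.5]]

# Correct accumulation: count the loss terms that actually exist.
true_mean = sum(sum(s) for s in segments) / sum(len(s) for s in segments)

# eval.py-style accumulation: weight each segment mean by seq_len
# (len(s) + 1 here), pretending one extra loss term per segment.
biased = sum((len(s) + 1) * mean(s) for s in segments)
biased_mean = biased / sum(len(s) + 1 for s in segments)

print(true_mean, biased_mean)   # 1.6 vs. ~1.571 -- close but not equal
```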
This is subtle because the perplexity value will still look plausible, but the computation is effectively pretending to include one extra loss term per segment. I think this is the correct implementation:
```python
mems = tuple()
for idx, (data, target, seq_len) in enumerate(eval_iter):
    ret = model(data, target, *mems)
    loss, mems = ret[0], ret[1:]
    loss = loss.mean()
    total_loss += (seq_len - 1) * loss.item()
    total_len += seq_len - 1
```
Or:
```python
mems = tuple()
for idx, (data, target, seq_len) in enumerate(eval_iter):
    ret = model(data, target, *mems)
    loss, mems = ret[0], ret[1:]
    total_loss += loss.sum().item()
    total_len += seq_len - 1
```
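For what it's worth, the two variants should agree whenever `loss` really does hold `seq_len - 1` per-token terms, since `(seq_len - 1) * loss.mean() == loss.sum()` in that case; the final number reported is then (if I'm reading eval.py's logging right) `math.exp(total_loss / total_len)`. A quick sanity check:

```python
import math
import torch

seq_len = 6
loss = torch.rand(seq_len - 1)   # stand-in for the model's per-token losses

# The two proposed accumulations coincide when loss has seq_len - 1 terms.
assert torch.isclose((seq_len - 1) * loss.mean(), loss.sum())

# Perplexity as eval.py reports it (to my reading of the logging code).
total_loss, total_len = loss.sum().item(), seq_len - 1
ppl = math.exp(total_loss / total_len)
```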
Is this a bug? Am I missing something? Thanks!