Label Smoothing in NLLB gives ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds

Shamus · November 1, 2022, 10:27am

Hi. I am fine-tuning an NLLB model with the Trainer API. When I set the label_smoothing_factor parameter to some value in hf.json, I get the error below. If the error message is accurate, how do I specify decoder_input_ids or decoder_inputs_embeds?

  File "run_translation.py", line 849, in <module>
    main()
  File "run_translation.py", line 753, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/transformers/trainer.py", line 1498, in train
    return inner_training_loop(
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/transformers/trainer.py", line 1740, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/transformers/trainer.py", line 2470, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/transformers/trainer.py", line 2502, in compute_loss
    outputs = model(**inputs)
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 1315, in forward
    outputs = self.model(
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 1206, in forward
    decoder_outputs = self.decoder(
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 985, in forward
    raise ValueError("You have to specify either decoder_input_ids or decoder_inputs_embeds")
ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds

binwang · May 30, 2023, 5:02am

I ran into the same issue. Have you resolved it?

Bendang · July 22, 2025, 10:31am

I couldn't get it running either, so I used this workaround:

from torch.nn import CrossEntropyLoss
from transformers import Seq2SeqTrainer


class SmoothingSeq2SeqTrainer(Seq2SeqTrainer):
    def __init__(self, *args, smoothing=0.1, **kwargs):
        super().__init__(*args, **kwargs)
        self.smoothing = smoothing

    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        labels = inputs["labels"]
        outputs = model(**inputs)
        logits = outputs.logits

        # Standard shift for sequence-to-sequence models
        shift_logits = logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()

        loss_fct = CrossEntropyLoss(
            label_smoothing=self.smoothing,
            ignore_index=self.processing_class.pad_token_id
        )
        loss = loss_fct(
            shift_logits.view(-1, shift_logits.size(-1)),
            shift_labels.view(-1)
        )
        return (loss, outputs) if return_outputs else loss

trainer = SmoothingSeq2SeqTrainer(
    smoothing=0.1,
    ...
)

Bendang · July 23, 2025, 2:40pm

I believe I found the root cause of the issue: for the NLLB model, decoder_input_ids are not generated automatically during data collation. Without them, training won't work as expected with label_smoothing_factor.

To fix this, I created a custom data collator that manually generates decoder_input_ids by shifting the labels to the right, as is standard in seq2seq training. Here's how you can implement it:

from transformers.data.data_collator import DataCollatorForSeq2Seq


class ManualShiftDataCollator(DataCollatorForSeq2Seq):
    def __call__(self, features, return_tensors=None):
        batch = super().__call__(features, return_tensors)

        if "labels" in batch:
            labels = batch["labels"]
            
            decoder_start_token_id = self.model.config.decoder_start_token_id
            decoder_input_ids = labels.clone()
            
            # Shift the entire tensor to the right.
            decoder_input_ids[:, 1:] = labels[:, :-1]
            
            # Set the first token of each sequence to the decoder_start_token_id.
            decoder_input_ids[:, 0] = decoder_start_token_id
            
            # Replace any -100 that got shifted with the actual padding token ID.
            decoder_input_ids.masked_fill_(decoder_input_ids == -100, self.tokenizer.pad_token_id)

            batch["decoder_input_ids"] = decoder_input_ids
        return batch

# then replace your collator with this
data_collator = ManualShiftDataCollator(
    tokenizer=tokenizer,
    model=model,
    label_pad_token_id=tokenizer.pad_token_id,
    pad_to_multiple_of=8,
    return_tensors="pt"
)

Using this collator fixed the root problem for me: the model now trains properly and shows good convergence.
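To make the right shift concrete, here is a toy sketch in plain Python lists, so it runs without torch or a tokenizer. The helper name shift_right and the token IDs (start token 2, pad token 1) are made up for illustration and are not NLLB's real values. It only mirrors the three steps in the collator above: shift, set the start token, replace -100 with the pad ID.

```python
def shift_right(labels, decoder_start_token_id, pad_token_id, ignore_index=-100):
    """Build decoder inputs from labels (hypothetical helper, illustration only)."""
    shifted = []
    for row in labels:
        # Prepend the start token and drop the last label, i.e. shift right by one.
        new_row = [decoder_start_token_id] + list(row[:-1])
        # -100 is only a loss-masking marker, not a valid token ID,
        # so any shifted -100 is replaced with the real pad token ID.
        new_row = [pad_token_id if tok == ignore_index else tok for tok in new_row]
        shifted.append(new_row)
    return shifted


# Two label rows; the second is padded with -100 (all IDs are invented).
labels = [[11, 12, 13, 2], [21, 2, -100, -100]]
decoder_input_ids = shift_right(labels, decoder_start_token_id=2, pad_token_id=1)
for row in decoder_input_ids:
    print(row)
# [2, 11, 12, 13]
# [2, 21, 2, 1]
```

At each position the decoder sees the previous target token and is trained to predict the current label, which is why the labels themselves stay unshifted.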

Note: Please disregard the custom label-smoothing loss I mentioned above (SmoothingSeq2SeqTrainer). In my experience, it caused the model to converge very slowly and plateau at a high loss. The proper fix is to use this custom data collator instead.

Hope this helps
