Hi. I am fine-tuning an NLLB model with the Trainer API. When I set the label_smoothing_factor parameter to some value in hf.json, I get the error below. If the error is correct, how do I specify decoder_input_ids or decoder_inputs_embeds?
  File "run_translation.py", line 849, in <module>
    main()
  File "run_translation.py", line 753, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/transformers/trainer.py", line 1498, in train
    return inner_training_loop(
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/transformers/trainer.py", line 1740, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/transformers/trainer.py", line 2470, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/transformers/trainer.py", line 2502, in compute_loss
    outputs = model(**inputs)
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 1315, in forward
    outputs = self.model(
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 1206, in forward
    decoder_outputs = self.decoder(
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.clearml/venvs-builds/3.8/task_repository/hf-translation.git/.venv/lib/python3.8/site-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 985, in forward
    raise ValueError("You have to specify either decoder_input_ids or decoder_inputs_embeds")
ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds
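I believe I found the root cause of the issue. When label_smoothing_factor is set, the Trainer removes labels from the batch before calling the model (it computes the smoothed loss itself), so the model can no longer build decoder_input_ids from the labels on its own. DataCollatorForSeq2Seq only adds decoder_input_ids when the model defines prepare_decoder_input_ids_from_labels, which the M2M100 class behind NLLB evidently does not, so nothing generates them during data collation and the decoder raises the error above.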
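A simplified mock of that control flow reproduces the error without loading a model. Everything here is hypothetical stand-in code (no transformers import); only two behaviours mirror the real library: the Trainer pops "labels" when label smoothing is active, and the decoder raises this message when it receives no decoder inputs.

```python
# Hypothetical, simplified mock (no transformers import). Only two behaviours mirror
# the real library: Trainer.compute_loss pops "labels" when label smoothing is on,
# and the decoder raises this error when it gets no decoder inputs.

DECODER_START_ID = 2  # hypothetical start token id


def mock_model_forward(input_ids, labels=None, decoder_input_ids=None):
    """Stand-in for an encoder-decoder forward pass."""
    if labels is not None and decoder_input_ids is None:
        # Seq2seq models derive decoder inputs by shifting the labels right.
        decoder_input_ids = [DECODER_START_ID] + labels[:-1]
    if decoder_input_ids is None:
        raise ValueError(
            "You have to specify either decoder_input_ids or decoder_inputs_embeds"
        )
    return {"decoder_input_ids": decoder_input_ids}


def mock_compute_loss(inputs, label_smoothing_factor=0.0):
    """Stand-in for Trainer.compute_loss."""
    inputs = dict(inputs)  # copy so the caller's batch is not modified
    if label_smoothing_factor > 0 and "labels" in inputs:
        # The Trainer keeps the labels for its own smoothed loss,
        # so they never reach the model's forward().
        inputs.pop("labels")
    return mock_model_forward(**inputs)


batch = {"input_ids": [5, 6, 7], "labels": [8, 9, 2]}
print(mock_compute_loss(batch))  # {'decoder_input_ids': [2, 8, 9]}

try:
    mock_compute_loss(batch, label_smoothing_factor=0.1)
except ValueError as err:
    print("ValueError:", err)  # the same message as in the traceback

# A batch that already carries decoder_input_ids works with smoothing on.
batch_with_decoder = dict(batch, decoder_input_ids=[DECODER_START_ID, 8, 9])
print(mock_compute_loss(batch_with_decoder, label_smoothing_factor=0.1))
```

The last call is the point of the fix: once the collator puts decoder_input_ids into the batch, it no longer matters that the labels are withheld from the model.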
To fix this, I created a custom data collator that builds decoder_input_ids itself by shifting the labels one position to the right, as is standard in seq2seq training. Here’s how you can implement it:
from transformers.data.data_collator import DataCollatorForSeq2Seq


class ManualShiftDataCollator(DataCollatorForSeq2Seq):
    """DataCollatorForSeq2Seq that always adds decoder_input_ids built from the labels."""

    def __call__(self, features, return_tensors=None):
        batch = super().__call__(features, return_tensors)
        if "labels" in batch:
            labels = batch["labels"]
            decoder_start_token_id = self.model.config.decoder_start_token_id
            decoder_input_ids = labels.clone()
            # Shift the entire tensor one position to the right.
            decoder_input_ids[:, 1:] = labels[:, :-1]
            # Set the first token of each sequence to the decoder_start_token_id.
            decoder_input_ids[:, 0] = decoder_start_token_id
            # Replace any -100 that got shifted in with the actual padding token ID.
            decoder_input_ids.masked_fill_(decoder_input_ids == -100, self.tokenizer.pad_token_id)
            batch["decoder_input_ids"] = decoder_input_ids
        return batch


# Then replace your collator with this one.
data_collator = ManualShiftDataCollator(
    tokenizer=tokenizer,
    model=model,
    # Keep the default -100 so the Trainer's label smoother ignores padded positions;
    # tokenizer.pad_token_id here would make padding count toward the loss.
    label_pad_token_id=-100,
    pad_to_multiple_of=8,
    return_tensors="pt",
)
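The shift itself can be checked without a GPU or tokenizer. Below is a plain-Python sketch of the same operation on lists, assuming the labels are padded with -100 (the DataCollatorForSeq2Seq default for label_pad_token_id). PAD_ID = 1 and DECODER_START_ID = 2 are hypothetical placeholders for tokenizer.pad_token_id and model.config.decoder_start_token_id, and the token ids in the batch are made up.

```python
# Plain-list sketch of the shift done in ManualShiftDataCollator.__call__.
# PAD_ID and DECODER_START_ID are hypothetical; read the real values from
# tokenizer.pad_token_id and model.config.decoder_start_token_id.

PAD_ID = 1
DECODER_START_ID = 2
IGNORE_INDEX = -100  # label padding value used by DataCollatorForSeq2Seq by default


def shift_right(labels_batch, start_id=DECODER_START_ID, pad_id=PAD_ID):
    """Return decoder_input_ids for a batch of equally padded label sequences."""
    shifted = []
    for labels in labels_batch:
        # Prepend the start token and drop the last label so the length is unchanged.
        row = [start_id] + labels[:-1]
        # -100 only means something to the loss; the embedding layer needs a real id.
        row = [pad_id if tok == IGNORE_INDEX else tok for tok in row]
        shifted.append(row)
    return shifted


labels = [
    [10, 11, 12, 14, 2],                      # full-length target ending in EOS (2)
    [10, 13, 2, IGNORE_INDEX, IGNORE_INDEX],  # shorter target padded with -100
]
for row in shift_right(labels):
    print(row)
# [2, 10, 11, 12, 14]
# [2, 10, 13, 2, 1]
```

Each decoder input position holds the previous target token, so the model learns to predict label t from labels 0..t-1 (teacher forcing), while the labels themselves keep -100 on padded positions for the loss.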
Using this collator fixed the root problem for me: the model now trains properly and converges well.
Note: Please disregard the custom label-smoothing loss I mentioned above (SmoothingSeq2SeqTrainer). In my experience, it caused the model to converge very slowly and plateau at a high loss. The proper fix is to use this custom data collator instead.
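For context on why padded label positions should hold -100: with label_smoothing_factor set, the Trainer computes the loss with its built-in LabelSmoother (epsilon = label_smoothing_factor), which skips positions whose label equals its ignore_index of -100. The sketch below is a simplified, dependency-free re-implementation of that arithmetic for a single sequence, written for illustration; it is not the library code, and the logits and pad id are hypothetical.

```python
import math

IGNORE_INDEX = -100  # LabelSmoother's default ignore_index


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def label_smoothed_loss(logits_seq, labels, epsilon):
    """Simplified LabelSmoother-style loss for one sequence (no batching, no tensors)."""
    nll_total = 0.0
    smooth_total = 0.0
    active = 0
    vocab_size = len(logits_seq[0])
    for logits, label in zip(logits_seq, labels):
        if label == IGNORE_INDEX:
            continue  # padded position: excluded from both terms and from the count
        neg_log_probs = [-lp for lp in log_softmax(logits)]
        nll_total += neg_log_probs[label]
        smooth_total += sum(neg_log_probs)
        active += 1
    nll = nll_total / active
    smooth = smooth_total / (active * vocab_size)
    return (1 - epsilon) * nll + epsilon * smooth


# Two target positions over a toy 4-token vocabulary; the second one is padding.
logits_seq = [[0.0, 0.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0]]
loss_ignored = label_smoothed_loss(logits_seq, [1, IGNORE_INDEX], epsilon=0.1)
# Same batch, but the padding is labeled with a hypothetical pad id of 1 instead of -100.
loss_padded = label_smoothed_loss(logits_seq, [1, 1], epsilon=0.1)
print(round(loss_ignored, 4))  # 1.3863, i.e. log(4) for uniform logits
print(loss_padded > loss_ignored)  # True: the pad position now adds to the loss
```

With -100 the padded step simply disappears from the average; with a real token id it is scored like any other target, which inflates the loss and trains the model to emit padding.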