Hugging Face tutorial: fine-tuning a coding agent with SFT and TRL
Hugging Face published a live tutorial walking through the first step of an agentic post-training workflow — building a supervised fine-tuning baseline from public coding-agent traces using TRL and LoRA.
Score breakdown
The tutorial provides a concrete, reproducible starting point for the agentic post-training workflow — SFT from agent traces — before the more complex GRPO and environment RL stages that follow in the series.
- 01Session 1 of the 'Training Agents' series covers building an SFT baseline from public coding-agent traces.
- 02The workflow converts agent traces into prompt/completion training data using completion-only loss for chat/tool traces.
- 03Fine-tuning is run with TRL + LoRA on Hugging Face Jobs.
Hugging Face published the first session of its "Training Agents" live tutorial series, which focuses on using coding agents to design, run, monitor, and review post-training experiments — while simultaneously training models to become better agents. Session 1 targets the foundational rung of that workflow: establishing a supervised fine-tuning baseline. The tutorial walks through taking publicly available coding-agent traces, transforming them into prompt/completion pairs, and applying completion-only loss suited to chat and tool traces before running a small TRL + LoRA fine-tune on Hugging Face Jobs.
The series frames SFT as the necessary first step before advancing to more complex post-training methods such as GRPO or environment-based reinforcement learning.
The session also covers practical experiment hygiene — specifically how to keep runs reproducible without committing logs or checkpoints to version control — and critically examines what the first evaluation metrics do and do not demonstrate. The series frames SFT as the necessary first step before advancing to more complex post-training methods such as GRPO or environment-based reinforcement learning. Supporting code is available in the companion repository at `https://github.com/burtenshaw/training-agents`.
Key facts
- 01Session 1 of the 'Training Agents' series covers building an SFT baseline from public coding-agent traces.
- 02The workflow converts agent traces into prompt/completion training data using completion-only loss for chat/tool traces.
- 03Fine-tuning is run with TRL + LoRA on Hugging Face Jobs.
- 04The tutorial explains why SFT is the starting point before GRPO or environment-based RL.
- 05Experiment reproducibility without checking in logs or checkpoints is a covered topic.
- 06The session discusses what early eval metrics can and cannot prove.
- 07Companion code is available at the GitHub repo `burtenshaw/training-agents`.
Topics
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 16, 2026 · 23:11 UTC. How this works →