W&B launches Serverless SFT for LLM post-training on CoreWeave
Weights & Biases introduced Serverless SFT, a managed fine-tuning service powered by CoreWeave that lets AI engineers run supervised fine-tuning and reinforcement learning in a unified workflow without managing infrastructure.
Teams building agentic systems can now iterate between SFT and RL on managed CoreWeave infrastructure without manually shuttling model artifacts, cutting the operational overhead that typically delays getting fine-tuned agents into production.
Weights & Biases introduced W&B Training Serverless SFT, a managed post-training service powered by CoreWeave and designed to remove infrastructure friction from the LLM fine-tuning loop. In the video, Russ from Weights & Biases frames the problem around the growing difficulty of productionizing AI agents: building demos has become easier, but optimizing agents across accuracy, latency, cost, and safety remains hard. SFT combined with RL has become a standard approach for producing reliable, production-ready agents, yet repeatedly cycling between the two techniques is operationally painful because teams must move model checkpoints and weights between disparate systems, which slows iteration.
Serverless SFT addresses this by giving engineers instant access to CoreWeave GPU capacity with no provisioning or scaling overhead. The workflow starts with a call to the open-source Agent Reinforcement Trainer (ART) API, where engineers specify a dataset and a base model. The resulting LoRA adapters are saved directly to W&B Artifacts, served via W&B Inference, and evaluated using Weave Evaluations, all within the same platform. The video demonstrates the workflow on a coding agent built around a planner agent and a review agent, fine-tuning a Qwen model via Serverless SFT. Engineers can then run serverless RL on top of the SFT checkpoint and repeat the SFT→RL cycle as many times as needed to hit production performance targets.
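To make the two inputs concrete, here is a minimal sketch of what preparing "a dataset and a base model" could look like before handing them to a training API. It is not the ART API: `SFTJobSpec` and `load_chat_examples` are hypothetical names, the chat-style JSONL layout (a `messages` list ending in an assistant turn) is an assumed dataset format, and the base model identifier is left as a placeholder because the source only says "a Qwen model".

```python
import json
from dataclasses import dataclass


# Hypothetical job description for illustration only; NOT part of the ART API.
@dataclass(frozen=True)
class SFTJobSpec:
    base_model: str    # placeholder identifier, e.g. "<qwen-base-model-id>"
    dataset_path: str  # JSONL file with one chat-formatted example per line


def load_chat_examples(path):
    """Load JSONL examples and check the assumed chat format.

    Each non-blank line must be a JSON object with a non-empty "messages"
    list whose last entry has role "assistant" (the target completion).
    """
    examples = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            messages = record.get("messages")
            if not messages or messages[-1].get("role") != "assistant":
                raise ValueError(
                    f"line {line_no}: expected messages ending with an assistant turn"
                )
            examples.append(record)
    return examples
```

Validating the dataset locally before submitting a run is a design choice of this sketch, not something the announcement describes; it simply catches malformed examples before any GPU time is requested.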
Key facts
- W&B Training Serverless SFT is powered by CoreWeave and targets AI engineers fine-tuning LLMs for agentic tasks.
- The service eliminates the need to manually move model checkpoints between SFT and RL systems.
- Engineers initiate runs by calling the open-source Agent Reinforcement Trainer (ART) API with a dataset and base model.
- Resulting LoRA adapters are saved directly to W&B Artifacts after each SFT run.
- Fine-tuned adapters can be served using W&B Inference and evaluated with Weave Evaluations.
- The demo fine-tunes a Qwen model using Serverless SFT as part of a coding agent workflow.
- The intended loop is: SFT → serve → collect traces → RL → repeat until production-ready.
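The intended loop in the last item can be sketched as plain control flow. This is an illustrative outline only: `post_training_loop`, `Checkpoint`, and the four callables (`sft_step`, `collect_traces`, `rl_step`, `evaluate`) are hypothetical stand-ins for the managed services, and the stopping rule (an evaluation score reaching a target, capped at `max_rounds`) is an assumption about what "production-ready" might mean in practice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    name: str               # identifier of the adapter produced by a stage
    parent: Optional[str]   # checkpoint (or base model) this stage started from
    score: float            # evaluation score recorded for this checkpoint


def post_training_loop(base_model, sft_step, collect_traces, rl_step,
                       evaluate, target, max_rounds=5):
    """Alternate SFT and RL, each stage starting from the previous checkpoint.

    All callables are hypothetical placeholders:
      sft_step(model) -> new checkpoint name
      collect_traces(checkpoint) -> traces gathered while serving it
      rl_step(checkpoint, traces) -> new checkpoint name
      evaluate(checkpoint) -> float score
    Stops once the RL checkpoint's score reaches `target` or after
    `max_rounds` rounds, and returns the checkpoint lineage.
    """
    history = []
    current = base_model
    for _ in range(max_rounds):
        sft_ckpt = sft_step(current)
        history.append(Checkpoint(sft_ckpt, current, evaluate(sft_ckpt)))

        traces = collect_traces(sft_ckpt)  # serve the adapter, record agent runs
        rl_ckpt = rl_step(sft_ckpt, traces)
        history.append(Checkpoint(rl_ckpt, sft_ckpt, evaluate(rl_ckpt)))

        current = rl_ckpt
        if history[-1].score >= target:
            break
    return history
```

Recording each checkpoint's parent keeps the lineage explicit, which is the bookkeeping that becomes manual work when SFT and RL run on separate systems.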