W&B launches Serverless SFT for LLM post-training on CoreWeave
Weights & Biases introduced Serverless SFT, a managed fine-tuning service powered by CoreWeave that lets AI engineers run supervised fine-tuning and reinforcement learning in a unified workflow without managing infrastructure.
Teams iterating between SFT and RL can now run the full post-training loop (fine-tuning, evaluation, inference, and RL) on a single platform, W&B, cutting the infrastructure overhead that typically delays getting agents to production.
Weights & Biases introduced Serverless SFT, a managed post-training service powered by CoreWeave, designed to remove infrastructure friction from the iterative SFT-and-RL loop that production AI teams rely on. The core problem the service addresses is that alternating between supervised fine-tuning and reinforcement learning typically means moving model checkpoints and weights across different systems, creating delays that slow optimization and push back time to market. By unifying both post-training techniques on a single platform with instant access to CoreWeave GPU capacity, W&B Training handles provisioning, scaling, and optimization automatically.
The workflow begins by calling the open-source Agent Reinforcement Trainer (ART) API with a specified dataset and base model; the video demonstrates fine-tuning a Qwen model. The resulting LoRA adapters are saved directly to W&B Artifacts, served via W&B Inference, and evaluated using Weave Evaluations during and after the SFT run. Engineers can then run serverless RL on top of the fine-tuned checkpoint, collect traces in Weave Playground, and repeat the SFT-RL cycle as many times as needed before moving an agent to production. The demonstration uses a coding agent with a planner-and-review architecture to illustrate the end-to-end flow.
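The SFT-RL cycle described above can be sketched in plain Python. This is only an illustration of the control flow, not the ART or W&B API: `Checkpoint`, `run_sft`, `run_rl`, `evaluate`, `post_train`, the model name, and the target score are all hypothetical placeholders, and the toy score simply rises with each completed stage.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Stand-in for a LoRA adapter stored as a versioned artifact (hypothetical)."""
    base_model: str
    history: list = field(default_factory=list)  # stages applied so far

    def after(self, stage: str) -> "Checkpoint":
        # Return a new checkpoint so the previous version stays available.
        return Checkpoint(self.base_model, self.history + [stage])


def run_sft(ckpt: Checkpoint, dataset: list) -> Checkpoint:
    """Placeholder for a managed SFT job; not a real API call."""
    return ckpt.after(f"sft({len(dataset)} examples)")


def run_rl(ckpt: Checkpoint, traces: list) -> Checkpoint:
    """Placeholder for a managed RL job on collected traces; not a real API call."""
    return ckpt.after(f"rl({len(traces)} traces)")


def evaluate(ckpt: Checkpoint) -> float:
    """Toy score that rises with each stage; a real evaluation would run the agent."""
    return min(1.0, 0.5 + 0.1 * len(ckpt.history))


def post_train(base_model, dataset, traces, target=0.8, max_cycles=5):
    """Alternate SFT and RL on the same checkpoint lineage until the score meets target."""
    ckpt = Checkpoint(base_model)
    for _ in range(max_cycles):
        ckpt = run_sft(ckpt, dataset)   # supervised pass on labeled examples
        ckpt = run_rl(ckpt, traces)     # RL pass starting from the SFT checkpoint
        if evaluate(ckpt) >= target:
            break
    return ckpt


dataset = [{"prompt": "p", "completion": "c"}] * 3  # made-up examples
traces = [{"trace": 1}] * 2                         # made-up traces
final = post_train("base-model-placeholder", dataset, traces)
print(final.history)
```

The point of the sketch is that each stage consumes the checkpoint the previous stage produced, so nothing has to be exported or re-imported between the SFT and RL steps.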
Key facts
- 01 W&B Training Serverless SFT is powered by CoreWeave GPU infrastructure, with provisioning and scaling handled automatically.
- 02 The service targets the SFT-to-RL iteration loop, eliminating the need to shuttle model checkpoints between separate systems.
- 03 Engineers initiate fine-tuning by calling the open-source Agent Reinforcement Trainer (ART) API with a dataset and base model.
- 04 Resulting LoRA adapters are saved directly to W&B Artifacts after each SFT run.
- 05 Fine-tuned models can be served using W&B Inference and tested in the Weave Playground.
- 06 Weave Evaluations can be run during SFT to monitor model performance in real time.
- 07 The video demonstrates fine-tuning a Qwen model as a concrete example.
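To make the idea of evaluating during a training run concrete, here is a minimal sketch of scoring a model on a held-out set at a fixed step interval. It does not use Weave or ART; `exact_match`, `evaluate_model`, `train_with_periodic_eval`, `toy_model`, and the two evaluation examples are hypothetical and stand in for whatever scorer and model a real run would use.

```python
def exact_match(prediction: str, reference: str) -> float:
    """Score 1.0 when the prediction equals the reference, ignoring outer whitespace."""
    return 1.0 if prediction.strip() == reference.strip() else 0.0


def evaluate_model(predict, eval_set):
    """Average exact-match score of a predict(prompt) function over the eval set."""
    scores = [exact_match(predict(ex["prompt"]), ex["reference"]) for ex in eval_set]
    return sum(scores) / len(scores)


def train_with_periodic_eval(total_steps, eval_every, predict_at_step, eval_set):
    """Simulate a training loop and record (step, score) every eval_every steps."""
    curve = []
    for step in range(1, total_steps + 1):
        # A real run would perform one optimizer update here.
        if step % eval_every == 0:
            curve.append((step, evaluate_model(predict_at_step(step), eval_set)))
    return curve


# Made-up held-out examples.
eval_set = [
    {"prompt": "2+2", "reference": "4"},
    {"prompt": "3+3", "reference": "6"},
]


def toy_model(step):
    """Toy model that 'learns' the second answer once training reaches step 20."""
    known = {"2+2": "4"} if step < 20 else {"2+2": "4", "3+3": "6"}
    return lambda prompt: known.get(prompt, "")


curve = train_with_periodic_eval(30, 10, toy_model, eval_set)
print(curve)  # [(10, 0.5), (20, 1.0), (30, 1.0)]
```

Recording the score as a curve over steps, rather than once at the end, is what lets a regression or plateau show up while the run is still in progress.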