Trace Commons launches to crowdsource open CC-BY-4.0 coding agent datasets
A Reddit user launched Trace Commons, an initiative to collect donated coding agent session traces into an open CC-BY-4.0 dataset so open-weight and open-source models can train on the same kind of data that Anthropic and OpenAI collect from Claude Code and Codex usage.
If successful, Trace Commons would give open-weight and open-source model labs access to real-world agentic coding interaction data. According to the post, this kind of data currently accumulates only within Anthropic's and OpenAI's proprietary pipelines.
u/mon-simas posted to r/LocalLLaMA announcing the launch of Trace Commons, a grassroots initiative aimed at building an open, CC-BY-4.0-licensed dataset of coding agent session traces. The project is hosted at trace-commons-web.hf.space and is soliciting donations of coding agent traces from the community.
The stated motivation is a concern that Anthropic and OpenAI are accumulating proprietary training data at scale through their respective coding agent products, Claude Code and Codex. The post argues this dynamic risks creating an oligopoly in which only those companies' models benefit from real-world agentic coding interaction data, while open-weight and open-source model developers are left without access to comparable datasets. Trace Commons is positioned as a counter-initiative to make that class of data openly available to any model lab.
Key facts
- u/mon-simas launched the initiative, called Trace Commons, on r/LocalLLaMA.
- The dataset is licensed under CC-BY-4.0, which makes it openly usable for model training, provided attribution is given.
- The project is hosted at trace-commons-web.hf.space.
- The goal is to collect donated coding agent session traces from community members.
- The motivation is concern that Anthropic and OpenAI hold proprietary coding-session data from Claude Code and Codex usage.
- The post argues that this data asymmetry could create an oligopoly that disadvantages open-weight and open-source models.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 16, 2026 · 23:11 UTC.