Open-source MCP server runs Qwen3.6 35B on rented Nosana GPUs for cheap bulk-text work
`qwen-nosana-mcp` is an open-source MCP server and CLI that lets Claude Code or Codex agents offload bulk-text tasks to a Qwen3.6 35B instance running on a rented Nosana GPU at roughly $1/hour, with no API key, no rate limits, and no per-token billing.
Score breakdown
The project offers a path to running a large open-weight model for bulk agentic coding tasks without per-token API costs, rate limits, or third-party data exposure, by pairing MCP with rented decentralized GPU compute.
The `qwen-nosana-mcp` project, published by u/Impressive-Owl3830 on r/mcp, provides an open-source MCP server and CLI that lets agentic coding tools such as Claude Code and Codex offload bulk-text workloads (long-document summarization and translation, structured extraction, and mass code generation) to a Qwen3.6 35B Q8_0 instance running on a Nosana NVIDIA Pro 6000 Blackwell GPU. In the intended architecture, a frontier model such as Sonnet 4.6 or GPT-5 acts as the "smart conductor" while Qwen3.6 handles the compute-heavy "cheap muscle" tasks at approximately $1/hour.
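The conductor/muscle split can be pictured as a simple routing decision. The following is a minimal sketch only, not the project's actual code: the `Task` type, the `route` function, and the task-kind labels are hypothetical names chosen for illustration, and the set of bulk kinds simply mirrors the use cases listed in the post.

```python
from dataclasses import dataclass

# Task kinds the post describes as bulk-text work suited to the rented GPU.
# The label strings themselves are hypothetical.
BULK_KINDS = {"summarization", "extraction", "code_generation", "translation"}


@dataclass(frozen=True)
class Task:
    kind: str  # hypothetical label assigned by the frontier model
    text: str  # the document or prompt to process


def route(task: Task) -> str:
    """Return 'muscle' for bulk-text work and 'conductor' for everything else."""
    return "muscle" if task.kind in BULK_KINDS else "conductor"
```

Under these assumptions, a long translation job would go to the self-hosted model, while planning or review work would stay with the frontier model.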
The project emphasizes a privacy-first design: because the model runs on a rented decentralized GPU rather than a third-party API, user prompts and data never leave that GPU. There is no Alibaba API key requirement, no TOS-based content filtering, and no per-token billing. Setup requires obtaining a Nosana API key from `deploy.nosana.com` and exporting it as `NOSANA_API_KEY` in the shell environment. Built-in cost controls include a 60-minute default timeout capping spend at roughly $1, a 5-minute idle auto-stop, and a hard maximum of 4 hours per deployment.
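The arithmetic behind these limits is easy to check. The sketch below is an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, the $1/hour rate is the approximate figure from the post, and clamping an over-long request to the 4-hour cap (rather than rejecting it) is an assumption about behavior the post does not specify.

```python
import os

HOURLY_RATE_USD = 1.0     # approximate rate quoted in the post
DEFAULT_TIMEOUT_MIN = 60  # default deployment timeout
IDLE_STOP_MIN = 5         # idle auto-stop threshold
HARD_CAP_MIN = 4 * 60     # hard maximum per deployment


def require_api_key(env=os.environ) -> str:
    """Return NOSANA_API_KEY from the environment, or fail with a clear error."""
    key = env.get("NOSANA_API_KEY")
    if not key:
        raise RuntimeError("NOSANA_API_KEY is not set; export it in your shell")
    return key


def clamp_timeout(requested_min=None) -> int:
    """Use the default when nothing is requested; never exceed the hard cap.

    Clamping (instead of raising) is an assumption made for this sketch.
    """
    if requested_min is None:
        return DEFAULT_TIMEOUT_MIN
    if requested_min <= 0:
        raise ValueError("timeout must be a positive number of minutes")
    return min(requested_min, HARD_CAP_MIN)


def max_spend_usd(timeout_min: int, rate: float = HOURLY_RATE_USD) -> float:
    """Worst-case spend if the deployment runs for the full timeout."""
    return timeout_min / 60 * rate


def should_stop(elapsed_min: float, idle_min: float, timeout_min: int) -> bool:
    """Stop when the timeout is reached or the instance has sat idle too long."""
    return elapsed_min >= timeout_min or idle_min >= IDLE_STOP_MIN
```

With these numbers, the default timeout bounds a session at about $1 and the hard cap bounds any single deployment at about $4, assuming the hourly rate holds.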
Key facts
- Routes bulk-text work from Claude Code or Codex to a self-hosted Qwen3.6 35B Q8_0 model
- Runs on a Nosana NVIDIA Pro 6000 Blackwell GPU at approximately $1/hour
- No Alibaba API key, no TOS-based content filtering, no rate limits, no per-token billing
- Prompts and data stay on the rented decentralized GPU and are never sent to a third-party API
- Built-in cost controls: 60-min default timeout (~$1 max), 5-min idle auto-stop, 4-hour hard cap
- Intended use cases include long-document summarization, structured extraction, mass code generation, and translation
- Setup requires a Nosana API key exported as `NOSANA_API_KEY` in the shell environment
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 14, 2026 · 09:08 UTC.