Leaked "Oceanus V1-P" scores 70/70 in coding and reasoning tests
AICodeKing tested a leaked model appearing as Oceanus V1-P (possibly the rumored "Mythos" model) using OpenCode across seven practical tasks, and it scored a perfect 70 out of 70.
Score breakdown
A leaked, unverified model called Oceanus V1-P outscored every other model tested on a diverse set of practical coding and reasoning tasks, finishing 9 points ahead of Opus 4.8 (70 vs. 61) and 43 points ahead of GPT-5.5 (70 vs. 27), though its true origin and stability remain unknown.
AICodeKing tested a leaked model that appeared as Oceanus V1-P when accessed via OpenCode, though many in the community are referring to it as the "Mythos" model. The video cautions that the model's actual identity is uncertain — it could be a private test model, a renamed model, or routed through another system. Across seven practical tasks covering coding, 3D visuals, SVG generation, game logic, combinatorics, and an agentic fine-tuning workflow, Oceanus V1-P achieved a perfect score of 70 out of 70. Opus 4.8 was the closest competitor at 61 out of 70, with all other tested models dropping off sharply.
The seven tasks were: an elevator simulation (three elevators with multi-floor logic in a single HTML file); a Three.js 3D contact lens case with clickable opening caps; a folding table animation; an SVG of a panda eating a burger; a bow-and-arrow simulator game with targets, timing, scoring, and a leaderboard; a difficult combinatorics problem whose correct answer is 20,460, which most models failed; and a local fine-tuning workflow involving dataset generation, Gemma 2B fine-tuning, and a local web UI. Oceanus V1-P scored 10 out of 10 on every task where results were described. The model's real origin, stability, pricing, and final name were unknown at the time of the video.
Key facts
- Oceanus V1-P scored a perfect 70 out of 70 across seven practical tasks tested with OpenCode.
- Opus 4.8 was the next closest at 61 out of 70 (87.14%); no other model scored above 39.
- Other models tested: Opus 4.7 (39), GPT-5.5 (27), M3 (25), Gemini 3.5 Flash (24), DeepSeek V4 Pro (21), Mimo V2.5 Pro (14).
- Tasks included a three-elevator simulation, a Three.js 3D contact lens case, SVG generation, a bow-and-arrow game, and a combinatorics problem with the correct answer of 20,460.
- The model also completed an agentic workflow involving dataset generation, Gemma 2B fine-tuning, and a local web UI.
- The model's true identity, origin, stability, pricing, and final name are still unclear; it may be a private test or renamed model.
- The model is appearing on some API sites and is widely referred to as "Mythos," but it was displayed as Oceanus V1-P during testing.
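The percentages behind these scores, such as the 87.14% quoted for Opus 4.8, can be reproduced with a short calculation. The sketch below is illustrative only: the scores are taken from the list above, while the names `scores`, `percentage`, and `leaderboard` are assumptions introduced here and do not come from the video.

```python
# Scores out of 70 as reported in the summary above.
MAX_SCORE = 70

scores = {
    "Oceanus V1-P": 70,
    "Opus 4.8": 61,
    "Opus 4.7": 39,
    "GPT-5.5": 27,
    "M3": 25,
    "Gemini 3.5 Flash": 24,
    "DeepSeek V4 Pro": 21,
    "Mimo V2.5 Pro": 14,
}


def percentage(score, max_score=MAX_SCORE):
    """Return the score as a percentage of the maximum, rounded to 2 places."""
    return round(100 * score / max_score, 2)


def leaderboard(results):
    """Return (name, score, percentage) tuples sorted from highest to lowest score."""
    ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
    return [(name, score, percentage(score)) for name, score in ranked]


if __name__ == "__main__":
    for name, score, pct in leaderboard(scores):
        print(f"{name:<18} {score:>2}/{MAX_SCORE}  {pct:6.2f}%")
```

Run as a script, this prints one row per model in descending order. The gap between the top two entries is 9 points, which is about 12.86 percentage points on a 70-point scale.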
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC.