Anthropic details election safeguards for Claude ahead of US midterms
Anthropic published an update on Claude's election-related safeguards, including bias evaluations, policy enforcement, and new tests for autonomous influence operations ahead of the US midterms and other major global elections.
Practitioners building on Claude for civic or political applications should note the published evaluation methodology and open-source dataset, which provide a replicable framework for assessing political bias and election-policy compliance in AI models.
Anthropic's update describes a multi-layered approach to keeping Claude safe and impartial around elections. Political neutrality is embedded through character training — where the model is rewarded for balanced, equal-depth engagement across the political spectrum — and reinforced via system prompts on Claude.ai. Before each model launch, Anthropic runs evaluations measuring how consistently and impartially Claude handles politically charged prompts; Claude Opus 4.7 and Sonnet 4.6 scored 95% and 96%, respectively, on these benchmarks. The evaluation methodology and open-source dataset have been published for external replication, and Anthropic is collaborating with The Future of Free Speech (an independent think tank at Vanderbilt University), the Foundation for American Innovation, and the Collective Intelligence Project on a broader review of model behaviors around freedom of expression.
On the enforcement side, Anthropic's Usage Policy prohibits using Claude to run deceptive political campaigns, create fake digital content to influence political discourse, commit voter fraud, interfere with voting systems, or spread misleading information about voting processes. A 600-prompt test suite, comprising 300 harmful requests paired with 300 legitimate ones, found that Claude Opus 4.7 and Sonnet 4.6 responded appropriately 100% and 99.8% of the time, respectively. Resistance to influence operations was tested via multi-turn simulated conversations mimicking real adversarial tactics; there, Sonnet 4.6 and Opus 4.7 responded appropriately 90% and 94% of the time, respectively. Ahead of launching Mythos Preview and Opus 4.7, Anthropic also introduced a novel test category: whether models can autonomously plan and execute a multi-step influence operation without human prompting. With safeguards in place, the latest models refused nearly every such task.
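To make the 600-prompt figure concrete, here is a minimal sketch of how an "appropriate response" rate over a paired harmful/legitimate suite could be computed. It is not Anthropic's harness. The `Result` type, the `is_appropriate` rule (decline harmful requests, help with legitimate ones), and the synthetic data are all assumptions made for illustration; the update summarized here does not spell out its grading rule or say which kind of prompt accounted for any miss.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Result:
    """One graded prompt (hypothetical schema, not from the source)."""
    category: str   # "harmful" or "legitimate"
    complied: bool  # True if the model fulfilled the request


def is_appropriate(r: Result) -> bool:
    # Assumed grading rule: helping is appropriate only for legitimate
    # requests, and declining is appropriate only for harmful ones.
    return r.complied == (r.category == "legitimate")


def appropriate_rate(results: Sequence[Result]) -> float:
    """Fraction of prompts that received an appropriate response."""
    if not results:
        raise ValueError("no results to score")
    return sum(is_appropriate(r) for r in results) / len(results)


# Synthetic data only: 300 harmful prompts all declined, 299 legitimate
# prompts answered, and one legitimate prompt wrongly declined.
results = (
    [Result("harmful", False)] * 300
    + [Result("legitimate", True)] * 299
    + [Result("legitimate", False)]
)

rate = appropriate_rate(results)
print(f"{rate:.1%}")  # prints 99.8%
```

Under this assumed rule, a single miss out of 600 prompts gives 599/600, which rounds to 99.8%. Scoring both halves of the suite together means that over-refusing legitimate requests lowers the rate just as complying with harmful ones does.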
Key facts
- Claude Opus 4.7 and Sonnet 4.6 scored 95% and 96%, respectively, on political impartiality evaluations run before each model launch.
- A 600-prompt election policy test suite (300 harmful + 300 legitimate requests) found Opus 4.7 and Sonnet 4.6 responded appropriately 100% and 99.8% of the time, respectively.
- In multi-turn influence operation simulations, Sonnet 4.6 and Opus 4.7 responded appropriately 90% and 94% of the time, respectively.
- Anthropic tested for the first time whether models can autonomously plan and run end-to-end influence operations; the latest models refused nearly every task.