This will be evaluated according to the AI Safety Levels (ASL) standard v1.0 defined by Anthropic here, which gives two different ways an AI system could qualify for AI Safety Level 3 (ASL-3). This resolves based on the first clear public disclosure by Anthropic indicating that they have trained a model and found it to qualify for ASL-3.
If Anthropic announces a policy that would prevent this information from being disclosed, announces that it has permanently ceased developing new AI systems, or ceases to operate, this will resolve N/A after six months.
Update 2025-05-01 (PST): Additional Resolution Criteria:
- Anthropic must claim to have passed their CBRN threshold before passing their AI R&D or 2–8h software engineering thresholds, aligning with the criteria used for the previous autonomy risks category. (AI summary of creator comment)
It looks like the RSP has been restructured a bit, but the evals persist more or less as-is, AFAICT. I'll resolve this YES if Anthropic claims to have passed their CBRN threshold before passing their AI R&D or 2–8h software engineering thresholds, which seem to match what was used for the old autonomy risks category.