This will be evaluated according to the AI Safety Levels (ASL) standard v1.0 defined by Anthropic here, which gives two different ways an AI system could qualify for AI Safety Level 3 (ASL-3). This resolves based on the first clear public disclosure by Anthropic indicating that they have trained a model and found it to qualify for ASL-3.
If Anthropic announces a policy that would prevent this information from being disclosed, announces that it has permanently ceased developing new AI systems, or ceases to operate, this will resolve N/A after six months.
Update 2025-05-01 (PST): Additional Resolution Criteria:
- Anthropic must claim to have passed their CBRN threshold before passing their AI R&D or 2–8h software engineering thresholds, aligning with the criteria used for the previous autonomy risks category. (AI summary of creator comment)
It looks like the RSP has been restructured a bit, but the evals persist more or less as-is, AFAICT. I'll resolve this YES if Anthropic claims to have passed their CBRN threshold before passing their AI R&D or 2–8h software engineering thresholds, which seem to match what was used for the old autonomy risks category.