Rule Evals
Evaluation of whether explicit rules improve AI code generation output.
Methodology
- Model: Claude Opus 4.6 via Claude Code
- Approach: Compare a baseline run (no rule) against a run given the extracted rule (the concise "Rule for AI agents" version)
- Metric: Does providing the rule improve output compared to the model's inherent knowledge?
- Code: eval suite on GitHub
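The baseline-vs-rule comparison above can be sketched as a minimal harness. This is an illustrative assumption, not the actual eval suite: `generate` and `passes_checks` are hypothetical stand-ins for the model call and the output checker.

```python
def run_eval(cases, rule, generate, passes_checks):
    """Compare pass rates with and without the rule prepended to the prompt.

    cases: list of dicts with a "prompt" key (hypothetical shape)
    rule: the concise "Rule for AI agents" text
    generate: callable prompt -> model output (stand-in for the model call)
    passes_checks: callable (output, case) -> bool (stand-in for the checker)
    """
    results = {"baseline": 0, "with_rule": 0}
    for case in cases:
        # Baseline: prompt only, relying on the model's inherent knowledge
        if passes_checks(generate(case["prompt"]), case):
            results["baseline"] += 1
        # With rule: the extracted rule is prepended to the same prompt
        if passes_checks(generate(rule + "\n\n" + case["prompt"]), case):
            results["with_rule"] += 1
    return results
```

A rule "improves" the eval when `with_rule` exceeds `baseline`; equal scores suggest the model already applies the pattern unprompted.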
Results
💪 in the Eval Improved column = the model already applies this pattern without explicit instruction