Author eval suite for skill `azure-naming-research`

## Skill

`azure-naming-research` — source: `.github/skills/azure-naming-research/SKILL.md`

## Scope

Author the eval suite at `.github/evals/azure-naming-research/`:

- [ ] `eval.yaml` — suite config (executor, model, graders)
- [ ] At least 2 positive tasks under `tasks/positive-*.yaml`
- [ ] At least 1 negative task under `tasks/negative-*.yaml` (off-topic / out-of-scope prompts)
- [ ] Entry added to `.github/evals/manifest.yaml` at `tier: expanded`
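
The file-layout items in this checklist can be checked locally before opening the PR. The following is a minimal sketch, assuming only the paths and counts listed above. `check_suite_layout` is a hypothetical helper, not part of `waza` or the harness, and it does not inspect `manifest.yaml` because the manifest schema is not described in this issue.

```python
from pathlib import Path


def check_suite_layout(suite_dir):
    """Return a list of problems; an empty list means the layout matches the checklist."""
    suite = Path(suite_dir)
    problems = []

    # Suite config must exist at the top of the suite directory.
    if not (suite / "eval.yaml").is_file():
        problems.append("missing eval.yaml")

    # Task files are counted by the filename prefixes named in the checklist.
    tasks = suite / "tasks"
    positives = sorted(tasks.glob("positive-*.yaml"))
    negatives = sorted(tasks.glob("negative-*.yaml"))

    if len(positives) < 2:
        problems.append(f"need at least 2 positive tasks, found {len(positives)}")
    if len(negatives) < 1:
        problems.append(f"need at least 1 negative task, found {len(negatives)}")

    return problems


if __name__ == "__main__":
    for problem in check_suite_layout(".github/evals/azure-naming-research"):
        print(problem)
```

Run from the repository root, the script prints one line per unmet item and nothing when the layout is complete.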

## Procedure

1. Run `/skill-bench azure-naming-research` to draft the suite from the live `SKILL.md`.
2. Run `waza run .github/evals/azure-naming-research/eval.yaml -v` locally and confirm that every task resolves and produces a score.
3. Run `/skill-improve azure-naming-research` to iterate on the graders.
4. Open PR.
5. Mock CI runs automatically. A maintainer will dispatch a real-model run before merge.

## Acceptance

- [ ] Suite runs cleanly in `mock` executor.
- [ ] At least one positive task passes in a real-model run.
- [ ] All negative tasks produce a refusal or out-of-scope acknowledgement.
- [ ] `manifest.yaml` entry added; PR description includes the real-model run summary.

## Conventions to follow

- Persona lock: refusal graders should accept the agent's own scope language, not require a specific phrase.
- Don't add `required_skills` to a `skill_invocation` grader unless the skill genuinely invokes those sub-skills.
- Prompt graders need `continue_session: true` in their grader config.
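
The persona-lock convention can be illustrated outside the harness. The following is a rough sketch, assuming a refusal is detected by matching the response text against several scope-style phrasings rather than one fixed sentence. The pattern list and `is_scope_refusal` are hypothetical and only show the idea; the suite's real graders are configured in `eval.yaml`, whose schema is not reproduced here.

```python
import re

# Hypothetical phrasings an agent might use to decline; extend as needed.
SCOPE_PATTERNS = [
    r"\bout of scope\b",
    r"\boutside (?:of )?(?:my|the|this skill's) scope\b",
    r"\b(?:can't|cannot|unable to) help with\b",
    r"\bonly (?:help|assist) with\b",
]


def is_scope_refusal(response):
    """Return True if the response matches any accepted scope phrasing."""
    return any(re.search(p, response, re.IGNORECASE) for p in SCOPE_PATTERNS)


if __name__ == "__main__":
    print(is_scope_refusal("That's outside my scope; I only help with Azure naming."))
    print(is_scope_refusal("Use the prefix `st` for storage accounts."))
```

Accepting any of several phrasings keeps a negative task from failing just because the agent declined in its own words.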

## Related

- Umbrella: #93
- Harness: #61


Author eval suite for skill `azure-naming-research` #104
