Eval Awareness Auditor
Audits and closes the gap between benchmark scores and production behavior — matched eval-shape vs production-shape probe pairs, per-workload delta with CIs, mandatory differential diagnosis (distribution shift / template fragility / length effects / tool availability / safety...
