AI Actionability Over Interpretability
How to make generative AI trustworthy when it's a black box
Generative AI models are different from conventional models. As Anthropic notes:
We mostly treat AI models as a black box: something goes in and a response comes out, and it’s not clear why the model gave that particular response instead of another. This makes it hard to trust that the models are safe…
I propose a pragmatic approach to this problem in my working paper "AI Actionability Over Interpretability," available for download below. Instead of trying to understand and interpret AI models, as we do with conventional models, we should structure their use so that actionable fixes can be made when things go wrong. Chain-of-Thought reasoning, Chain-of-Prompts scaffolding, and agentic tracing hold promise in this regard. Notably, these techniques are also being explored and used in medicine, another high-stakes field where transparency, reliability, and the ability to fix problems quickly are critical.
The paper is intended to help bridge the technical world of AI and the practical world of f…
