Transformer Circuits, or How to Read the Mind of an LLM
Event:
M11 - AI Explainability
Language:
Italian

Tag

  • LLM

What if we could step inside an LLM and watch it think in real time?

This talk distills the latest research from Anthropic, DeepMind, and OpenAI to present the current state of the art in LLM interpretability.

We’ll start with the modern interpretation of embeddings as superpositions of sparse, monosemantic features living in high-dimensional space.
From there, we’ll explore emerging techniques such as circuit tracing and attribution graphs, and see how researchers reconstruct the computational pathways behind behaviors like multilingual reasoning, refusals, and hallucinations.
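
To give a flavour of the first idea, here is a minimal, self-contained sketch of a sparse autoencoder of the kind this line of research uses to pull monosemantic features out of dense activations. The dimensions, the toy random "activations", and all names are illustrative assumptions, not any particular lab's code:

    # Minimal sparse-autoencoder (SAE) sketch: decompose dense activations
    # into sparse, (ideally) monosemantic features. Illustrative only --
    # dimensions, names, and the toy data are assumptions.
    import torch
    import torch.nn as nn

    D_MODEL, N_FEATURES = 512, 4096   # residual-stream width, dictionary size

    class SparseAutoencoder(nn.Module):
        def __init__(self, d_model: int, n_features: int):
            super().__init__()
            self.encoder = nn.Linear(d_model, n_features)
            self.decoder = nn.Linear(n_features, d_model)

        def forward(self, x: torch.Tensor):
            # ReLU keeps feature activations non-negative; the L1 penalty
            # in the loss below pushes most of them to exactly zero.
            f = torch.relu(self.encoder(x))
            return self.decoder(f), f

    sae = SparseAutoencoder(D_MODEL, N_FEATURES)
    opt = torch.optim.Adam(sae.parameters(), lr=1e-3)

    # Stand-in for real residual-stream activations captured from an LLM.
    acts = torch.randn(1024, D_MODEL)

    for step in range(100):
        recon, feats = sae(acts)
        loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Each input is now described by a handful of active features; in real
    # models, individual features often align with a single human concept.
    recon, feats = sae(acts[:1])
    print("active features:", (feats > 0).sum().item(), "of", N_FEATURES)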

We’ll also look at new evidence suggesting that models may have limited forms of introspection—clarifying what they can, and crucially cannot, reliably report about their internal processes.

Finally, we’ll connect these “microscopic” insights to real engineering practice: how feature-level understanding can improve debugging, safety, and robustness in deployed AI systems, and where current methods still fall short.
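
As one concrete example of that engineering angle, here is a hedged sketch of activation steering: given a feature direction (here a random placeholder; in practice it might come from a trained SAE decoder), a forward hook can monitor or remove that feature at inference time. The tiny stand-in model and all names are assumptions for illustration:

    # Feature-level "steering" sketch: once a feature direction is known,
    # a forward hook can log, dampen, or remove it during inference.
    import torch
    import torch.nn as nn

    D_MODEL = 512
    layer = nn.Linear(D_MODEL, D_MODEL)   # stand-in for a transformer block
    feature_dir = torch.randn(D_MODEL)
    feature_dir /= feature_dir.norm()     # hypothetical, e.g. a "refusal" feature

    def steer(module, inputs, output, strength=-1.0):
        # Add a scaled copy of the feature direction to the hidden state:
        # strength=-1 removes the feature component, other values dampen
        # or amplify it. Returning a tensor replaces the module's output.
        coeff = output @ feature_dir      # per-example feature activation
        return output + strength * coeff.unsqueeze(-1) * feature_dir

    handle = layer.register_forward_hook(steer)
    hidden = torch.randn(4, D_MODEL)
    steered = layer(hidden)
    handle.remove()

    # Near-zero values confirm the feature was projected out.
    print("feature activation after steering:", steered @ feature_dir)

The same hook point can simply log feature activations instead of editing them, which is one way this kind of microscope becomes a practical debugging and safety tool.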