Description
We lack reliable tools for understanding why an LLM behaves the way it does by examining its internals. Current interpretability methods often rest on questionable assumptions, and LLMs may not operate on human-like concepts, which makes their internals hard to interpret. The explanations these tools produce can also be misleading or unfaithful to the model's actual computation.