Description
LLMs are vulnerable to adversarial inputs where users can bypass safety restrictions. This can involve "jailbreaking" the model creator's restrictions, or "prompt injection" where an application developer's instructions are overridden, sometimes by a third party through data the LLM processes. There are no robust ways to separate instructions from data within an LLM's input, making these attacks particularly hard to prevent.