Skip to content

Prompt Injection versus Jailbreaking Definitions

Updated: at 04:32 AM

I appreciate Simon Willison’s clear distinction between prompt injection and jailbreaking, so I will keep it here for future reference.

Prompt injection is a class of attacks against applications built on top of Large Language Models (LLMs) that work by concatenating untrusted user input with a trusted prompt constructed by the application’s developer.

Jailbreaking is the class of attacks that attempt to subvert safety filters built into the LLMs themselves.

Crucially: if there’s no concatenation of trusted and untrusted strings, it’s not prompt injection. That’s why I called it prompt injection in the first place: it was analogous to SQL injection, where untrusted user input is concatenated with trusted SQL code.

via