What is Prompt Injection?

Prompt Injection is a class of attacks where untrusted instructions are introduced into an AI system’s context to override or manipulate intended behavior. Prompt Injection can occur through user input, retrieved content, or tool outputs.

Quick definition

Prompt Injection is when someone tricks an AI system by inserting malicious instructions into the prompt context.

How Prompt Injection works

  • Prompt Injection targets the model’s tendency to follow instructions in context.
  • Prompt Injection can be direct, through a user prompt, or indirect, through retrieved text in RAG.
  • Prompt Injection can attempt to bypass system prompt constraints.
  • Prompt Injection risk is reduced by strict input handling, separation of instructions and data, and tool permission controls.

Why Prompt Injection matters

Prompt Injection matters because Prompt Injection can cause data leakage, unsafe actions, or unreliable outputs.

Prompt Injection is especially relevant for systems that browse or retrieve external content.

Example use cases

  • A retrieved webpage containing instructions intended to alter the model’s output.
  • A user prompt that attempts to override system rules.
  • A RAG system that needs to treat retrieved text as data, not instructions.

Related terms