What is Prompt Injection?

Prompt Injection is a class of attacks where untrusted instructions are introduced into an AI system’s context to override or manipulate intended behavior. Prompt Injection can occur through user input, retrieved content, or tool outputs.

Quick definition

Prompt Injection is when someone tricks an AI system by inserting malicious instructions into the prompt context.

How Prompt Injection works

Prompt Injection targets the model’s tendency to follow instructions in context.
Prompt Injection can be direct, through a user prompt, or indirect, through retrieved text in RAG.
Prompt Injection can attempt to bypass system prompt constraints.
Prompt Injection risk is reduced by strict input handling, separation of instructions and data, and tool permission controls.

Why Prompt Injection matters

Prompt Injection matters because Prompt Injection can cause data leakage, unsafe actions, or unreliable outputs.

Prompt Injection is especially relevant for systems that browse or retrieve external content.

Example use cases

A retrieved webpage containing instructions intended to alter the model’s output.
A user prompt that attempts to override system rules.
A RAG system that needs to treat retrieved text as data, not instructions.

What is Prompt Injection?

Quick definition

How Prompt Injection works

Why Prompt Injection matters

Example use cases

Related terms

Related Resources

Start Free Trial for AI Monitoring

AI Visibility Pricing Plans

Free AI Prompt Generator

AEO Strategy Guide

AI Visibility Blog Guides

Why ChatGPT Is Not Mentioning Your Brand