🟠 High  |  Source: Schneier on Security


Researchers have demonstrated that large language models (LLMs) do not truly separate system, user, and assistant roles internally: they recognise stylistic patterns rather than enforcing genuine trust boundaries. This makes prompt injection a structural problem rather than a configuration one, because attackers can craft text that shifts model behaviour without any obvious malicious markers. The finding suggests that current defences based on role tags or prompt formatting are fundamentally insufficient.

Security Architect’s Take: Do not treat system prompt separation as a security control in any LLM-integrated application. Assume prompt injection is always possible, and enforce validation, output filtering, and least-privilege tool access at the application layer rather than relying on the model itself to enforce boundaries.
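
As a rough illustration of what application-layer enforcement could look like, the Python sketch below gates tool calls against a per-context allowlist and strips links to unapproved hosts from model output. Everything in it is an assumption made for illustration: the context names, the tool names, `ALLOWED_TOOLS`, `ALLOWED_HOSTS`, `ToolCall`, `authorize`, and `filter_output` are hypothetical and do not come from the paper or the original advisory.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Hypothetical policy: which tools each application context may invoke.
ALLOWED_TOOLS = {
    "support_chat": {"search_docs"},
    "admin_console": {"search_docs", "send_email"},
}

# Hypothetical allowlist of hosts that may appear in links shown to users.
ALLOWED_HOSTS = {"docs.example.com"}

# Simple URL matcher; stops at whitespace, ')' or '>'.
URL_PATTERN = re.compile(r"https?://[^\s)>]+")


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation requested by the model (illustrative shape)."""
    name: str
    args: dict = field(default_factory=dict)


def authorize(context: str, call: ToolCall) -> bool:
    """Least privilege: permit a call only if the context's allowlist names it.

    The decision depends on application state, never on model output,
    so injected text cannot widen the set of permitted tools.
    """
    return call.name in ALLOWED_TOOLS.get(context, set())


def filter_output(text: str) -> str:
    """Output filtering: replace links to hosts outside the allowlist."""
    def replace(match: re.Match) -> str:
        host = urlparse(match.group(0)).hostname
        return match.group(0) if host in ALLOWED_HOSTS else "[link removed]"

    return URL_PATTERN.sub(replace, text)


if __name__ == "__main__":
    # A support chat session must not be able to send email, whatever the prompt says.
    print(authorize("support_chat", ToolCall("send_email", {"to": "x@example.com"})))
    print(filter_output("See https://docs.example.com/a and https://evil.test/x?d=secret"))
```

The point of the design is that both checks run outside the model: the allowlists live in application code, so a successful injection can change what the model asks for but not what the application permits.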

Original advisory: Interesting Paper Exploring Prompt Injection