What is a large language model (LLM)?

A large language model (LLM) is a type of AI trained on vast amounts of text data to predict and generate human language. Models like GPT-4, Claude, and Gemini are trained on hundreds of billions of words and use a neural network architecture called a transformer, which processes all words in a sequence simultaneously rather than one at a time. LLMs learn statistical patterns across language and can generate coherent text, answer questions, write code, summarise documents, and follow complex instructions — without being explicitly programmed for each task.

What are the biggest security risks when deploying LLMs in enterprise environments?

The five most significant security risks for enterprise LLM deployments are: prompt injection attacks where malicious content overrides system instructions (OWASP LLM01); sensitive data leakage where the model reveals training data, system prompts, or user data in its outputs (LLM06); insecure plugin and tool integration where LLM agents with access to APIs, databases, and file systems can be manipulated to perform unauthorised actions (LLM07/LLM08); supply chain vulnerabilities in third-party model weights and datasets; and model denial-of-service via resource-exhausting inputs. These risks require controls that differ fundamentally from traditional application security.

What is the OWASP LLM Top 10?

The OWASP LLM Top 10 is a reference framework for the most critical security risks specific to large language model applications. The top risks are prompt injection (LLM01), insecure output handling (LLM02), training data poisoning (LLM03), model denial of service (LLM04), supply chain vulnerabilities (LLM05), sensitive information disclosure (LLM06), insecure plugin design (LLM07), excessive agency (LLM08), overreliance (LLM09), and model theft (LLM10). It was first published by OWASP in 2023 and is the primary framework used by security teams to assess LLM-based applications.

Do I need to understand AI to secure it as a security architect?

You need to understand enough about how LLMs work to reason about their attack surface — you do not need to be able to train one. Specifically: understanding that LLMs process inputs as tokens and can be manipulated through those inputs (prompt injection); that LLMs generate outputs probabilistically and can leak training data; and that AI agents with tool access behave as autonomous IAM principals. You can map these risks to controls you already understand: input validation, output filtering, least-privilege IAM, and audit logging. The AI-specific threat model extends your existing cloud security knowledge rather than replacing it.

A Beginner's Guide to AI and Large Language Models — and Why Security Architects Should Care

Artificial intelligence has moved from research curiosity to enterprise infrastructure in under five years. If you are a cloud security architect who has not yet had to secure an LLM-powered application, you almost certainly will within the next twelve months. This guide explains what large language models are, how they are built, and — critically — what the security risks are when you deploy, integrate, or expose them in your organisation.

No prior knowledge of machine learning is assumed. By the end you should understand enough to have an informed conversation with an AI engineering team, assess the risk surface of an LLM deployment, and apply the controls the industry has coalesced around.

What Is Artificial Intelligence?

Artificial intelligence is a broad term for software systems that perform tasks we would traditionally associate with human reasoning — understanding language, recognising images, making decisions, solving problems. The field has existed since the 1950s but remained largely academic until two things converged: enormous datasets became available (much of the internet), and the hardware to process them (GPUs, TPUs) became cheap enough to use at scale.

Modern AI is almost entirely based on a technique called machine learning — rather than programming a system with explicit rules, you show it millions of examples and let it learn patterns from the data. The model does not follow a rule book. It develops internal representations of the world through statistical patterns in what it has seen.

Within machine learning, deep learning uses networks of mathematical nodes loosely inspired by neurons in the brain — hence “neural networks”. These networks can learn extremely complex patterns given enough data and compute. Modern large language models are deep learning systems, and they are currently the most commercially significant form of AI.

What Is a Large Language Model?

A large language model (LLM) is a type of AI system trained on vast quantities of text — books, websites, code, scientific papers, forums — with one goal: predict the next word (or more precisely, the next token, a chunk of text) given what came before.

That sounds almost trivially simple. The remarkable thing is what emerges when you train a large enough network on a large enough dataset. The model does not merely memorise text. It develops internal representations of grammar, facts, reasoning patterns, and something resembling world knowledge. By the time a frontier model like GPT-4, Claude, or Gemini is trained, it can write code, summarise legal documents, explain complex topics, answer questions, and carry multi-turn conversations — all from that single training objective of “predict the next token”.

The “large” in LLM refers to the scale: frontier models have hundreds of billions of internal parameters (the numerical weights that encode learned knowledge), trained on trillions of tokens of text, using thousands of specialised chips running for months. The compute cost of training a frontier model is measured in tens to hundreds of millions of dollars.

How Are LLMs Built? The Training Pipeline

Understanding the build process helps you understand both the capabilities and the failure modes. LLM development typically has three stages.

Stage 1: Pre-training

The model is initialised with random weights and then exposed to a huge dataset — often several trillion tokens scraped from the public internet, books, code repositories, and licensed text corpora. During pre-training, the model repeatedly predicts the next token, compares its prediction to what actually came next, measures the error, and adjusts its weights slightly to do better next time. This process — called backpropagation — happens billions of times across the entire dataset.

Pre-training is the most expensive phase. It is also where the model acquires most of its knowledge and capability. The resulting model is called a base model or foundation model. It is good at completing text but has no particular sense of being helpful or following instructions — it will happily continue a harmful sentence if that is what the statistics suggest comes next.

Stage 2: Fine-tuning and Instruction Tuning

The base model is further trained on a smaller, curated dataset of (prompt, ideal response) pairs. This teaches the model to follow instructions rather than just complete text. The model learns that when someone asks a question, the appropriate next tokens are an answer — not more of the question.

This phase is far cheaper than pre-training because the dataset is small (thousands to hundreds of thousands of examples rather than trillions of tokens) and the model’s weights are already close to useful.

Stage 3: Reinforcement Learning from Human Feedback (RLHF)

In the final phase, human raters compare pairs of model responses and indicate which is better — more accurate, more helpful, safer, less harmful. These preferences train a separate model called a reward model, which learns to score responses the way the human raters would. The LLM is then further tuned to maximise that reward score.

RLHF is what makes modern LLMs feel conversational and helpful rather than erratic. It is also a primary mechanism for safety alignment — teaching the model to decline harmful requests. The limitations of this alignment are a significant security consideration, which we will return to.

Retrieval-Augmented Generation (RAG)

Many enterprise LLM deployments add a fourth component: retrieval-augmented generation. Rather than relying solely on knowledge baked into the model’s weights at training time, a RAG system retrieves relevant documents from an external knowledge base (a vector database, a document store, an internal wiki) and injects them into the prompt. This gives the model access to current, organisation-specific information it was never trained on.

RAG is now the dominant pattern for enterprise LLM applications. Its security implications are significant — the retrieval pipeline, the vector database, and the injected context are all attack surfaces.

The Transformer Architecture

All major LLMs use a neural network design called the transformer, introduced by Google researchers in 2017. You do not need to understand the mathematics, but a high-level mental model is useful.

The transformer processes the entire input (the “prompt” you provide) simultaneously rather than word by word. Its key innovation is the attention mechanism: for each token in the input, the model learns to pay varying degrees of “attention” to every other token, building up a rich contextual representation. The word “bank” in “river bank” is represented differently from “bank” in “bank account” because the surrounding tokens shift the attention pattern.

Concept	What it means in practice
Token	A chunk of text (roughly a word or part of a word). GPT-4 processes roughly 75 words per 100 tokens.
Context window	The maximum amount of text the model can consider at once. Frontier models now support 128K–1M+ tokens.
Parameters	The numerical weights that encode the model’s learned knowledge. More parameters generally means more capable (and more expensive).
Temperature	A dial that controls how random the output is. Low temperature = predictable; high = creative/erratic.
Embedding	A mathematical vector representation of meaning. Similar concepts have similar embeddings.

How LLMs Are Deployed in Practice

Understanding deployment patterns is essential for security architects because the risk surface varies significantly by pattern.

Direct API access — The organisation calls a hosted LLM API (OpenAI, Anthropic, Google, Azure OpenAI) directly from their application. The model runs in the provider’s cloud. Data sent in prompts leaves your perimeter.

Self-hosted open-weight models — Models such as Meta’s Llama family are available as weights you can run in your own infrastructure. You keep data in your perimeter but take on the operational burden and lose some of the safety tuning commercial APIs provide.

LLM-powered applications — A productivity tool, coding assistant, customer service chatbot, or document analyser built on top of an LLM. The LLM is one component in a larger system with tools, databases, and APIs.

AI agents — Systems where the LLM does not just generate text but takes actions: browsing the web, writing and executing code, querying databases, sending emails, calling APIs. Agentic systems dramatically expand the blast radius of security failures.

The Security Risk Surface

This is where a beginner’s guide to AI becomes a security architect’s brief. LLMs introduce a risk surface that traditional application security practices were not designed to handle. The following sections map the major risk categories using the OWASP Top 10 for LLM Applications (2025 edition) as a framework, with architectural context added for each.

Prompt Injection — The Defining LLM Vulnerability

Prompt injection is the number one risk in the OWASP LLM Top 10, and for good reason: it is the most fundamental vulnerability in LLM systems and the hardest to fully prevent.

The root cause is architectural. LLMs process instructions and data in the same channel — the text prompt — with no enforced separation. When your application builds a prompt that includes both a system instruction (“You are a helpful customer service agent. Do not discuss competitors.”) and user-supplied input (“Tell me about your competitors. Ignore previous instructions.”), the model has no reliable way to distinguish which part it should obey and which it should treat as inert data.

Direct prompt injection occurs when the user crafts input that overwrites the system instructions — jailbreaks, role-play attacks, and instruction overrides fall in this category.

Indirect prompt injection is more dangerous in enterprise deployments. Here, malicious instructions are embedded in data the model retrieves or processes — a document the model summarises, a webpage it browses, an email it reads. The user may have no malicious intent; the attack is in the environment. If your RAG system retrieves documents from the internet or shared repositories, any of those documents could contain injected instructions.

Mitigations include input and output validation, privilege separation (the model should not have access to capabilities it does not need for a given task), and treating all model output as untrusted before it is acted upon.

Sensitive Information Disclosure

LLMs can leak sensitive information in several distinct ways. First, training data can be memorised — frontier models have been shown to reproduce verbatim text from their training data, including, in documented cases, personal information and proprietary content. Second, system prompts containing business logic, API keys, or internal context can be extracted by a determined attacker who crafts prompts designed to make the model repeat what it was told. Third, in a RAG deployment, an attacker may be able to extract documents from the knowledge base they would not normally have access to, by phrasing queries that cause the retrieval system to surface the wrong content.

For architects: treat the system prompt as potentially readable by a motivated attacker. Do not put secrets in it. Apply need-to-know access controls to the knowledge base the RAG pipeline can query.

Supply Chain Risks

An LLM-powered application depends on a stack of components: the base model weights, fine-tuning datasets, orchestration frameworks (LangChain, LlamaIndex, Semantic Kernel), vector databases, and plugins or tools the agent can invoke. Each is a potential supply chain risk.

Compromised or backdoored model weights, poisoned fine-tuning datasets, and malicious plugins are all documented threat vectors. The OWASP 2025 list highlights supply chain as a significant risk specifically because the AI tooling ecosystem is young, moves fast, and has not yet developed the security hygiene mature software stacks take for granted.

For architects: apply the same dependency scrutiny to AI components as you would to any third-party library. Prefer well-governed, auditable components. Pin versions. Evaluate the provenance of any fine-tuning data.

Data and Model Poisoning

Poisoning attacks target the training or fine-tuning pipeline. An attacker who can influence the training data can cause the model to learn a specific behaviour — a backdoor trigger that causes it to behave maliciously when a particular phrase appears, or a bias that systematically distorts its outputs in a particular domain.

For most organisations using hosted foundation models, this risk lives primarily with the model provider. However, organisations that fine-tune on their own data are directly exposed. Customer feedback, support tickets, and user-generated content used as fine-tuning data are all potential poisoning vectors.

Excessive Agency

As 2025 is widely characterised as the year of AI agents, excessive agency has become one of the most practically significant risks. An agentic LLM that can read files, execute code, query databases, send emails, and call external APIs has a blast radius that far exceeds a chatbot that only generates text.

Excessive agency occurs when the model is granted more permissions, access, or autonomy than the task requires. Combined with prompt injection, this is a critical risk: an attacker who can inject instructions into a document the agent processes can potentially direct it to exfiltrate data, escalate privileges, or take destructive actions — all under the identity of the LLM’s service account.

Mitigations follow standard least-privilege principles: the model should only be able to call the APIs it needs for a specific task, should not be able to perform irreversible actions without human confirmation, and its actions should be logged and monitored.

Vector and Embedding Weaknesses

RAG systems rely on vector databases that store content as mathematical embeddings, retrieved by semantic similarity. This introduces several security considerations that do not exist in traditional data access:

Embedding inversion attacks — it may be possible to reconstruct approximate source text from stored embeddings.
Poisoned retrieval — malicious content injected into the knowledge base may be systematically retrieved for certain query patterns.
Cross-tenant data leakage — in multi-tenant deployments, if namespace isolation is not properly enforced, one user’s queries may retrieve another user’s data.

For architects: apply access controls at the document level within the knowledge base, not just at the query endpoint. Assume embeddings may be partially reversible.

System Prompt Leakage

The system prompt is the set of instructions the application developer provides to shape the model’s behaviour. It often contains business logic, persona instructions, and sometimes tool descriptions or internal context. If an attacker can extract the system prompt, they gain insight into how to manipulate the system, what tools it has access to, and potentially sensitive business information.

Treat system prompt confidentiality as a defence-in-depth measure rather than a security boundary — motivated attackers have demonstrated the ability to extract system prompts through social engineering the model. Do not rely on the system prompt remaining secret.

Misinformation and Hallucination

LLMs generate text that is statistically plausible, not necessarily factually correct. They can confidently assert false information — a phenomenon called hallucination. The OWASP 2025 list renames the older “overreliance” category to “misinformation” to sharpen the focus on the risk: it is not just that users trust the model too much, it is that the model actively generates and propagates false information.

In a security context, this matters when LLMs are used for compliance checking, threat intelligence analysis, or security advisory generation — domains where confident but incorrect output can cause real harm.

Unbounded Consumption

LLMs are computationally expensive. An attacker who can cause an LLM application to make unbounded API calls — whether through crafted inputs, recursive loops, or denial-of-service patterns — can drive up costs dramatically. This is sometimes called “denial of wallet.” Rate limiting, request size limits, and cost monitoring are essential controls.

The OWASP LLM Top 10 (2025) — Quick Reference

#	Risk	One-line summary
LLM01	Prompt Injection	Attacker-controlled input overwrites model instructions.
LLM02	Sensitive Information Disclosure	Model leaks training data, system prompts, or retrieval content.
LLM03	Supply Chain	Compromised models, datasets, frameworks, or plugins.
LLM04	Data and Model Poisoning	Malicious training data causes backdoors or systematic bias.
LLM05	Improper Output Handling	Unsanitised model output causes XSS, code injection, or downstream harm.
LLM06	Excessive Agency	Model granted more permissions than the task requires.
LLM07	System Prompt Leakage	System prompt extracted by a determined attacker.
LLM08	Vector and Embedding Weaknesses	RAG pipeline data leakage, poisoning, or inversion.
LLM09	Misinformation	Confident but false output used for consequential decisions.
LLM10	Unbounded Consumption	Uncontrolled resource usage leading to cost explosion or DoS.

What Controls Should Architects Apply?

The security controls for LLM applications share principles with traditional application security — least privilege, input validation, output sanitisation, monitoring — but require adaptation for the LLM context.

Input validation and output sanitisation — Treat all user input to an LLM as untrusted. Validate inputs for length, format, and content where possible. Treat all LLM outputs as untrusted before they are rendered in a UI, passed to a downstream API, or used to take an action. This is particularly important for code generation — never execute model-generated code without review.

Least-privilege for agents — Apply the same least-privilege principle you would to a service account. An agent that needs to read customer records should not also be able to write them. An agent that needs to call one internal API should not have credentials for all of them. Prefer reversible actions over irreversible ones; require human confirmation for high-impact operations.

Prompt hardening — Structure prompts to clearly separate instructions from user input. Some frameworks provide structural separators, but these are mitigations rather than guarantees. Assume that any user-controlled input could become an instruction.

Knowledge base access controls — In RAG deployments, enforce document-level access controls so that retrieval respects the same permissions the user would have if querying the underlying system directly. A user who cannot read HR records should not be able to retrieve them via a RAG query.

Monitoring and rate limiting — Log all model interactions. Rate-limit API access. Set cost alerts. Monitor for anomalous query patterns that might indicate prompt injection probing or data extraction attempts.

Supply chain hygiene — Know what models, frameworks, and plugins you are using. Evaluate the security posture of providers. Pin dependency versions. Be especially cautious with open-weight models fine-tuned by third parties.

Human oversight for high-stakes decisions — Do not rely on LLM output alone for consequential decisions — security findings, compliance assessments, financial analysis. Build human review into workflows where error rates matter.

The Agentic Frontier

In late 2025, OWASP released a separate Top 10 specifically for agentic AI systems — applications where LLMs autonomously plan and execute multi-step tasks. This reflects a genuine step-change in the risk landscape. When a model can browse the web, execute code, manage files, and call APIs in a loop without human intervention, the blast radius of a single prompt injection expands from “generates bad text” to “exfiltrates your customer database.”

If your organisation is building or evaluating AI agents — for software development, customer service automation, security operations, or internal workflows — treat agentic deployments as a distinct risk category requiring their own security assessment, not a straightforward extension of existing LLM guidance.

Key Takeaways for Security Architects

LLMs are powerful, genuinely useful, and arriving in enterprise environments whether or not security teams are ready. The good news is that the fundamental security principles — least privilege, defence in depth, input validation, monitoring — apply. The bad news is that the attack surface is new, the tooling is immature, and the speed of adoption is outrunning security practice.

A few principles to carry forward:

The system prompt is not a security boundary. Treat it as a convenience, not a control.
Agents need service account thinking. Scope their permissions as tightly as you would any non-human identity.
RAG pipelines are data access patterns. Apply the access controls you would to any data store.
Model output is user input to downstream systems. Sanitise it accordingly.
Hallucination is a reliability risk and a security risk. LLM-generated security analysis can be confidently wrong.
Start with the OWASP LLM Top 10. It is the most practical framework currently available and is updated to reflect real-world incidents.

The field is moving fast. MITRE ATLAS, NIST AI RMF, and the EU AI Act are all evolving frameworks worth tracking as the governance picture matures. For now, the OWASP LLM Top 10 and a solid grounding in how these systems actually work are the most practical starting points.

Can You Phish an AI? Social Engineering Attacks Against AI Agents — The next step after understanding how LLMs work — a technical and boardroom guide to the specific attack techniques used against AI agents, with real CVEs.
Securing AI Agents in Cloud Infrastructure — Practical security controls for deploying AI agents in cloud environments, covering IAM, network security, and monitoring.
Zero Trust Architecture — Zero Trust principles apply directly to AI agent deployments — verify every request, enforce least privilege, assume breach.
Cloud Infrastructure Entitlement Management (CIEM) — AI agent service accounts are a new category of non-human identity requiring the same entitlement controls as human identities.
AWS IAM Security Best Practices — How to scope the IAM permissions granted to AI workloads and agent service accounts running on AWS.
Kubernetes Security Best Practices — Many AI workloads run as containerised services. The security controls in this guide are directly applicable to AI deployments.
AI Security in the Cloud — The hub guide for AI and LLM security: threat model, prompt injection defences, Bedrock and Azure OpenAI controls, and least privilege for AI workloads.

What Is Artificial Intelligence?#

What Is a Large Language Model?#

How Are LLMs Built? The Training Pipeline#

Stage 1: Pre-training#

Stage 2: Fine-tuning and Instruction Tuning#

Stage 3: Reinforcement Learning from Human Feedback (RLHF)#

Retrieval-Augmented Generation (RAG)#

The Transformer Architecture#

How LLMs Are Deployed in Practice#

The Security Risk Surface#

Prompt Injection — The Defining LLM Vulnerability#

Sensitive Information Disclosure#

Supply Chain Risks#

Data and Model Poisoning#

Excessive Agency#

Vector and Embedding Weaknesses#

System Prompt Leakage#

Misinformation and Hallucination#

Unbounded Consumption#

The OWASP LLM Top 10 (2025) — Quick Reference#

What Controls Should Architects Apply?#

The Agentic Frontier#

Key Takeaways for Security Architects#

Related Guides#

📬 Stay Informed