Artificial intelligence has moved from research curiosity to enterprise infrastructure in under five years. If you are a cloud security architect who has not yet had to secure an LLM-powered application, you almost certainly will within the next twelve months. This guide explains what large language models are, how they are built, and — critically — what the security risks are when you deploy, integrate, or expose them in your organisation.
No prior knowledge of machine learning is assumed. By the end you should understand enough to have an informed conversation with an AI engineering team, assess the risk surface of an LLM deployment, and apply the controls the industry has coalesced around.
What Is Artificial Intelligence?
Artificial intelligence is a broad term for software systems that perform tasks we would traditionally associate with human reasoning — understanding language, recognising images, making decisions, solving problems. The field has existed since the 1950s but remained largely academic until two things converged: enormous datasets became available (much of the internet), and the hardware to process them (GPUs, TPUs) became cheap enough to use at scale.
Modern AI is almost entirely based on a technique called machine learning — rather than programming a system with explicit rules, you show it millions of examples and let it learn patterns from the data. The model does not follow a rule book. It develops internal representations of the world through statistical patterns in what it has seen.
Within machine learning, deep learning uses networks of mathematical nodes loosely inspired by neurons in the brain — hence “neural networks”. These networks can learn extremely complex patterns given enough data and compute. Modern large language models are deep learning systems, and they are currently the most commercially significant form of AI.
What Is a Large Language Model?
A large language model (LLM) is a type of AI system trained on vast quantities of text — books, websites, code, scientific papers, forums — with one goal: predict the next word (or more precisely, the next token, a chunk of text) given what came before.
That sounds almost trivially simple. The remarkable thing is what emerges when you train a large enough network on a large enough dataset. The model does not merely memorise text. It develops internal representations of grammar, facts, reasoning patterns, and something resembling world knowledge. By the time a frontier model like GPT-4, Claude, or Gemini is trained, it can write code, summarise legal documents, explain complex topics, answer questions, and carry multi-turn conversations — all from that single training objective of “predict the next token”.
The “large” in LLM refers to the scale: frontier models have hundreds of billions of internal parameters (the numerical weights that encode learned knowledge), trained on trillions of tokens of text, using thousands of specialised chips running for months. The compute cost of training a frontier model is measured in tens to hundreds of millions of dollars.
How Are LLMs Built? The Training Pipeline
Understanding the build process helps you understand both the capabilities and the failure modes. LLM development typically has three stages.
Stage 1: Pre-training
The model is initialised with random weights and then exposed to a huge dataset — often several trillion tokens scraped from the public internet, books, code repositories, and licensed text corpora. During pre-training, the model repeatedly predicts the next token, compares its prediction to what actually came next, measures the error, and adjusts its weights slightly to do better next time. This process — called backpropagation — happens billions of times across the entire dataset.
Pre-training is the most expensive phase. It is also where the model acquires most of its knowledge and capability. The resulting model is called a base model or foundation model. It is good at completing text but has no particular sense of being helpful or following instructions — it will happily continue a harmful sentence if that is what the statistics suggest comes next.
Stage 2: Fine-tuning and Instruction Tuning
The base model is further trained on a smaller, curated dataset of (prompt, ideal response) pairs. This teaches the model to follow instructions rather than just complete text. The model learns that when someone asks a question, the appropriate next tokens are an answer — not more of the question.
This phase is far cheaper than pre-training because the dataset is small (thousands to hundreds of thousands of examples rather than trillions of tokens) and the model’s weights are already close to useful.
Stage 3: Reinforcement Learning from Human Feedback (RLHF)
In the final phase, human raters compare pairs of model responses and indicate which is better — more accurate, more helpful, safer, less harmful. These preferences train a separate model called a reward model, which learns to score responses the way the human raters would. The LLM is then further tuned to maximise that reward score.
RLHF is what makes modern LLMs feel conversational and helpful rather than erratic. It is also a primary mechanism for safety alignment — teaching the model to decline harmful requests. The limitations of this alignment are a significant security consideration, which we will return to.
Retrieval-Augmented Generation (RAG)
Many enterprise LLM deployments add a fourth component: retrieval-augmented generation. Rather than relying solely on knowledge baked into the model’s weights at training time, a RAG system retrieves relevant documents from an external knowledge base (a vector database, a document store, an internal wiki) and injects them into the prompt. This gives the model access to current, organisation-specific information it was never trained on.
RAG is now the dominant pattern for enterprise LLM applications. Its security implications are significant — the retrieval pipeline, the vector database, and the injected context are all attack surfaces.
The Transformer Architecture
All major LLMs use a neural network design called the transformer, introduced by Google researchers in 2017. You do not need to understand the mathematics, but a high-level mental model is useful.
The transformer processes the entire input (the “prompt” you provide) simultaneously rather than word by word. Its key innovation is the attention mechanism: for each token in the input, the model learns to pay varying degrees of “attention” to every other token, building up a rich contextual representation. The word “bank” in “river bank” is represented differently from “bank” in “bank account” because the surrounding tokens shift the attention pattern.
| Concept | What it means in practice |
|---|---|
| Token | A chunk of text (roughly a word or part of a word). GPT-4 processes roughly 75 words per 100 tokens. |
| Context window | The maximum amount of text the model can consider at once. Frontier models now support 128K–1M+ tokens. |
| Parameters | The numerical weights that encode the model’s learned knowledge. More parameters generally means more capable (and more expensive). |
| Temperature | A dial that controls how random the output is. Low temperature = predictable; high = creative/erratic. |
| Embedding | A mathematical vector representation of meaning. Similar concepts have similar embeddings. |
How LLMs Are Deployed in Practice
Understanding deployment patterns is essential for security architects because the risk surface varies significantly by pattern.
Direct API access — The organisation calls a hosted LLM API (OpenAI, Anthropic, Google, Azure OpenAI) directly from their application. The model runs in the provider’s cloud. Data sent in prompts leaves your perimeter.
Self-hosted open-weight models — Models such as Meta’s Llama family are available as weights you can run in your own infrastructure. You keep data in your perimeter but take on the operational burden and lose some of the safety tuning commercial APIs provide.
LLM-powered applications — A productivity tool, coding assistant, customer service chatbot, or document analyser built on top of an LLM. The LLM is one component in a larger system with tools, databases, and APIs.
AI agents — Systems where the LLM does not just generate text but takes actions: browsing the web, writing and executing code, querying databases, sending emails, calling APIs. Agentic systems dramatically expand the blast radius of security failures.
The Security Risk Surface
This is where a beginner’s guide to AI becomes a security architect’s brief. LLMs introduce a risk surface that traditional application security practices were not designed to handle. The following sections map the major risk categories using the OWASP Top 10 for LLM Applications (2025 edition) as a framework, with architectural context added for each.
Prompt Injection — The Defining LLM Vulnerability
Prompt injection is the number one risk in the OWASP LLM Top 10, and for good reason: it is the most fundamental vulnerability in LLM systems and the hardest to fully prevent.
The root cause is architectural. LLMs process instructions and data in the same channel — the text prompt — with no enforced separation. When your application builds a prompt that includes both a system instruction (“You are a helpful customer service agent. Do not discuss competitors.”) and user-supplied input (“Tell me about your competitors. Ignore previous instructions.”), the model has no reliable way to distinguish which part it should obey and which it should treat as inert data.
Direct prompt injection occurs when the user crafts input that overwrites the system instructions — jailbreaks, role-play attacks, and instruction overrides fall in this category.
Indirect prompt injection is more dangerous in enterprise deployments. Here, malicious instructions are embedded in data the model retrieves or processes — a document the model summarises, a webpage it browses, an email it reads. The user may have no malicious intent; the attack is in the environment. If your RAG system retrieves documents from the internet or shared repositories, any of those documents could contain injected instructions.
Mitigations include input and output validation, privilege separation (the model should not have access to capabilities it does not need for a given task), and treating all model output as untrusted before it is acted upon.
Sensitive Information Disclosure
LLMs can leak sensitive information in several distinct ways. First, training data can be memorised — frontier models have been shown to reproduce verbatim text from their training data, including, in documented cases, personal information and proprietary content. Second, system prompts containing business logic, API keys, or internal context can be extracted by a determined attacker who crafts prompts designed to make the model repeat what it was told. Third, in a RAG deployment, an attacker may be able to extract documents from the knowledge base they would not normally have access to, by phrasing queries that cause the retrieval system to surface the wrong content.
For architects: treat the system prompt as potentially readable by a motivated attacker. Do not put secrets in it. Apply need-to-know access controls to the knowledge base the RAG pipeline can query.
Supply Chain Risks
An LLM-powered application depends on a stack of components: the base model weights, fine-tuning datasets, orchestration frameworks (LangChain, LlamaIndex, Semantic Kernel), vector databases, and plugins or tools the agent can invoke. Each is a potential supply chain risk.
Compromised or backdoored model weights, poisoned fine-tuning datasets, and malicious plugins are all documented threat vectors. The OWASP 2025 list highlights supply chain as a significant risk specifically because the AI tooling ecosystem is young, moves fast, and has not yet developed the security hygiene mature software stacks take for granted.
For architects: apply the same dependency scrutiny to AI components as you would to any third-party library. Prefer well-governed, auditable components. Pin versions. Evaluate the provenance of any fine-tuning data.
Data and Model Poisoning
Poisoning attacks target the training or fine-tuning pipeline. An attacker who can influence the training data can cause the model to learn a specific behaviour — a backdoor trigger that causes it to behave maliciously when a particular phrase appears, or a bias that systematically distorts its outputs in a particular domain.
For most organisations using hosted foundation models, this risk lives primarily with the model provider. However, organisations that fine-tune on their own data are directly exposed. Customer feedback, support tickets, and user-generated content used as fine-tuning data are all potential poisoning vectors.
Excessive Agency
As 2025 is widely characterised as the year of AI agents, excessive agency has become one of the most practically significant risks. An agentic LLM that can read files, execute code, query databases, send emails, and call external APIs has a blast radius that far exceeds a chatbot that only generates text.
Excessive agency occurs when the model is granted more permissions, access, or autonomy than the task requires. Combined with prompt injection, this is a critical risk: an attacker who can inject instructions into a document the agent processes can potentially direct it to exfiltrate data, escalate privileges, or take destructive actions — all under the identity of the LLM’s service account.
Mitigations follow standard least-privilege principles: the model should only be able to call the APIs it needs for a specific task, should not be able to perform irreversible actions without human confirmation, and its actions should be logged and monitored.
Vector and Embedding Weaknesses
RAG systems rely on vector databases that store content as mathematical embeddings, retrieved by semantic similarity. This introduces several security considerations that do not exist in traditional data access:
- Embedding inversion attacks — it may be possible to reconstruct approximate source text from stored embeddings.
- Poisoned retrieval — malicious content injected into the knowledge base may be systematically retrieved for certain query patterns.
- Cross-tenant data leakage — in multi-tenant deployments, if namespace isolation is not properly enforced, one user’s queries may retrieve another user’s data.
For architects: apply access controls at the document level within the knowledge base, not just at the query endpoint. Assume embeddings may be partially reversible.
System Prompt Leakage
The system prompt is the set of instructions the application developer provides to shape the model’s behaviour. It often contains business logic, persona instructions, and sometimes tool descriptions or internal context. If an attacker can extract the system prompt, they gain insight into how to manipulate the system, what tools it has access to, and potentially sensitive business information.
Treat system prompt confidentiality as a defence-in-depth measure rather than a security boundary — motivated attackers have demonstrated the ability to extract system prompts through social engineering the model. Do not rely on the system prompt remaining secret.
Misinformation and Hallucination
LLMs generate text that is statistically plausible, not necessarily factually correct. They can confidently assert false information — a phenomenon called hallucination. The OWASP 2025 list renames the older “overreliance” category to “misinformation” to sharpen the focus on the risk: it is not just that users trust the model too much, it is that the model actively generates and propagates false information.
In a security context, this matters when LLMs are used for compliance checking, threat intelligence analysis, or security advisory generation — domains where confident but incorrect output can cause real harm.
Unbounded Consumption
LLMs are computationally expensive. An attacker who can cause an LLM application to make unbounded API calls — whether through crafted inputs, recursive loops, or denial-of-service patterns — can drive up costs dramatically. This is sometimes called “denial of wallet.” Rate limiting, request size limits, and cost monitoring are essential controls.
The OWASP LLM Top 10 (2025) — Quick Reference
| # | Risk | One-line summary |
|---|---|---|
| LLM01 | Prompt Injection | Attacker-controlled input overwrites model instructions. |
| LLM02 | Sensitive Information Disclosure | Model leaks training data, system prompts, or retrieval content. |
| LLM03 | Supply Chain | Compromised models, datasets, frameworks, or plugins. |
| LLM04 | Data and Model Poisoning | Malicious training data causes backdoors or systematic bias. |
| LLM05 | Improper Output Handling | Unsanitised model output causes XSS, code injection, or downstream harm. |
| LLM06 | Excessive Agency | Model granted more permissions than the task requires. |
| LLM07 | System Prompt Leakage | System prompt extracted by a determined attacker. |
| LLM08 | Vector and Embedding Weaknesses | RAG pipeline data leakage, poisoning, or inversion. |
| LLM09 | Misinformation | Confident but false output used for consequential decisions. |
| LLM10 | Unbounded Consumption | Uncontrolled resource usage leading to cost explosion or DoS. |
What Controls Should Architects Apply?
The security controls for LLM applications share principles with traditional application security — least privilege, input validation, output sanitisation, monitoring — but require adaptation for the LLM context.
Input validation and output sanitisation — Treat all user input to an LLM as untrusted. Validate inputs for length, format, and content where possible. Treat all LLM outputs as untrusted before they are rendered in a UI, passed to a downstream API, or used to take an action. This is particularly important for code generation — never execute model-generated code without review.
Least-privilege for agents — Apply the same least-privilege principle you would to a service account. An agent that needs to read customer records should not also be able to write them. An agent that needs to call one internal API should not have credentials for all of them. Prefer reversible actions over irreversible ones; require human confirmation for high-impact operations.
Prompt hardening — Structure prompts to clearly separate instructions from user input. Some frameworks provide structural separators, but these are mitigations rather than guarantees. Assume that any user-controlled input could become an instruction.
Knowledge base access controls — In RAG deployments, enforce document-level access controls so that retrieval respects the same permissions the user would have if querying the underlying system directly. A user who cannot read HR records should not be able to retrieve them via a RAG query.
Monitoring and rate limiting — Log all model interactions. Rate-limit API access. Set cost alerts. Monitor for anomalous query patterns that might indicate prompt injection probing or data extraction attempts.
Supply chain hygiene — Know what models, frameworks, and plugins you are using. Evaluate the security posture of providers. Pin dependency versions. Be especially cautious with open-weight models fine-tuned by third parties.
Human oversight for high-stakes decisions — Do not rely on LLM output alone for consequential decisions — security findings, compliance assessments, financial analysis. Build human review into workflows where error rates matter.
The Agentic Frontier
In late 2025, OWASP released a separate Top 10 specifically for agentic AI systems — applications where LLMs autonomously plan and execute multi-step tasks. This reflects a genuine step-change in the risk landscape. When a model can browse the web, execute code, manage files, and call APIs in a loop without human intervention, the blast radius of a single prompt injection expands from “generates bad text” to “exfiltrates your customer database.”
If your organisation is building or evaluating AI agents — for software development, customer service automation, security operations, or internal workflows — treat agentic deployments as a distinct risk category requiring their own security assessment, not a straightforward extension of existing LLM guidance.
Key Takeaways for Security Architects
LLMs are powerful, genuinely useful, and arriving in enterprise environments whether or not security teams are ready. The good news is that the fundamental security principles — least privilege, defence in depth, input validation, monitoring — apply. The bad news is that the attack surface is new, the tooling is immature, and the speed of adoption is outrunning security practice.
A few principles to carry forward:
- The system prompt is not a security boundary. Treat it as a convenience, not a control.
- Agents need service account thinking. Scope their permissions as tightly as you would any non-human identity.
- RAG pipelines are data access patterns. Apply the access controls you would to any data store.
- Model output is user input to downstream systems. Sanitise it accordingly.
- Hallucination is a reliability risk and a security risk. LLM-generated security analysis can be confidently wrong.
- Start with the OWASP LLM Top 10. It is the most practical framework currently available and is updated to reflect real-world incidents.
The field is moving fast. MITRE ATLAS, NIST AI RMF, and the EU AI Act are all evolving frameworks worth tracking as the governance picture matures. For now, the OWASP LLM Top 10 and a solid grounding in how these systems actually work are the most practical starting points.