LLM Safety and Security Stack

An LLM safety and security stack is a layered set of tools and practices used to protect AI applications from risks like prompt injection, data leakage, and unintended harmful behaviors. It integrates protections throughout the entire application lifecycle, from development to deployment. While LLM safety and security are related, they are distinct disciplines: 

  • Security addresses intentional, malicious attacks from adversaries, such as prompt injection and data poisoning.
  • Safety focuses on the LLM's inherent behavior, ensuring it performs as intended, avoids harmful outputs (toxicity, bias), and aligns with ethical guidelines. 


The LLM safety and security stack

  • A defense-in-depth approach is required to secure an LLM stack, with safeguards applied at multiple layers. 


Input and output guardrails

Guardrails are a critical first line of defense that screen both incoming prompts and outgoing model completions. 

  • Input moderation: Tools screen user prompts to detect malicious requests, including prompt injection, harmful content, and data exfiltration attempts. They can block the request or sanitize sensitive information like Personally Identifiable Information (PII) before it reaches the model.
  • Output moderation: Tools scan the LLM's responses for toxicity, hallucinations, sensitive information, or compliance violations. They can automatically flag or block unsafe or non-compliant content.
  • Vector embedding filters: Advanced guardrail systems can analyze the semantic meaning of prompts using embeddings to identify and block queries that are semantically similar to known malicious inputs, even if the phrasing is new.
  • PII anonymization: Services redact or mask sensitive user information (like names, phone numbers, and addresses) before and after it is processed by the LLM. 


Model and data controls

This layer focuses on securing the core model and the data it accesses.

  • Access controls: Implement strict Role-Based Access Control (RBAC) and multi-factor authentication (MFA) to restrict who can access or configure the model and fine-tuning data.
  • Fine-tuning data validation: Before fine-tuning a model, all datasets should be validated and scanned for malicious payloads, sensitive data, and harmful biases. This prevents the introduction of vulnerabilities during training.
  • Secure supply chain: The components used to build LLM applications—including pre-trained models, libraries, and datasets—must be sourced from trusted providers and monitored for integrity. A software bill of materials (SBOM) is crucial for this.
  • Retrieval-Augmented Generation (RAG) safeguards: For RAG systems, which pull data from external sources, strict validation and access controls must be applied to all retrieval sources to prevent malicious data from hijacking model behavior. 


Infrastructure and platform security

Traditional cybersecurity practices are applied to the infrastructure where LLMs operate.

  • Network security: Protect the network layer by using firewalls, intrusion detection systems, and secure protocols like HTTPS to prevent unauthorized access and protect data in transit.
  • Isolated environments: Run LLMs and sensitive applications in isolated, containerized environments. This reduces the "blast radius" of a potential breach by containing damage if the model is compromised.
  • Secure APIs: Manage and secure API access with strong authentication, encryption, and API gateways. Treat all model interactions as untrusted by default.
  • Secure secrets management: Use external secrets vaults for API keys and credentials rather than hardcoding them into the application. 


Governance and monitoring

These processes provide oversight and ensure continuous improvement of the security and safety posture.

  • Observability and monitoring: Robust monitoring is essential for identifying unusual activity, such as jailbreak attempts, data leaks, and sudden changes in model behavior. This can be achieved through logging, tracing, and anomaly detection.
  • Policy enforcement: Use specialized tools to enforce pre-defined policies related to PII, toxicity, and disallowed topics.
  • Incident response planning: Create a plan to contain and mitigate damage in the event of a security or privacy breach.
  • Security audits: Regularly audit the entire LLM stack and conduct penetration testing to identify and address vulnerabilities, paying special attention to LLM-specific risks.