---
title: "Threat Model"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Threat Model}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = TRUE)
```
```{css, echo = FALSE, eval = TRUE}
.llmshieldr-info-box {
border-left: 4px solid #2f80ed;
background: #f3f8ff;
padding: 1rem 1.15rem;
margin: 1.5rem 0;
border-radius: 0.35rem;
}
.llmshieldr-info-box h2,
.llmshieldr-info-box h3,
.llmshieldr-info-box h4 {
margin-top: 0;
}
.llmshieldr-info-box p:last-child,
.llmshieldr-info-box ul:last-child,
.llmshieldr-info-box ol:last-child {
margin-bottom: 0;
}
```
`llmshieldr` is an application-level guardrail layer for R workflows that send
text to large language models or receive text from them. It helps make common
risks visible and auditable; it is not a complete security boundary.
## Assets
- User prompts and chat history.
- Retrieved context in RAG workflows.
- Model outputs before display, storage, or downstream use.
- Sensitive data such as PII, PHI, credentials, and business records.
- Tool inputs and outputs in applications that call external systems.
- Streaming chunks before complete model output is available.
- Audit logs and policy configuration.
## Trust Boundaries
- User-provided text entering an R application.
- Retrieved documents, search results, or database rows entering model context.
- Model output leaving the LLM provider or local model.
- Tool calls that can affect files, databases, APIs, accounts, or transactions.
- Streaming output chunks crossing from model provider to application.
- Audit logs written to local or shared storage.
## In Scope
`llmshieldr` provides starter controls for:
- Direct and indirect prompt-injection language.
- Common PII, PHI, and secret patterns.
- Simple NLP intent signals for override, exposure, and harmful-action intent.
- Output markers for unsafe agency, system-prompt leakage, unsafe code, and
high-confidence medical or financial claims.
- RAG context source allowlists and simple context anomaly signals.
- Tool-call argument scanning and tool-output scanning.
- Conversation scanning with role-preserving metadata.
- Streaming output scanning with rolling context.
- Token and request budget guards with pre-call reservation and rollback.
- Optional semantic review through a reviewer function, chat object, local
Ollama reviewer, or remote reviewer endpoint.
- Auditable findings, actions, risk scores, and JSONL/CSV/RDS audit output.
## Partially Covered
These areas have package surface but need workflow-specific evidence or
additional controls before they should be treated as robust protections:
- OWASP LLM Top 10 coverage. The package maps controls to categories, but this
is not exhaustive protection for each category.
- Obfuscated prompt injection. Unicode normalization, delimiter collapse,
invisible-text findings, and encoded-payload checks help, but a larger
adversarial evaluation suite is still needed.
- RAG poisoning. Source allowlists and anomaly checks help, but there is no
provenance scoring, embedding-neighborhood analysis, or document trust graph.
- Semantic review. Reviewer JSON is parsed with schema metadata, confidence,
evidence, recommended actions, span support, and structured failure metadata,
but reviewer reliability depends on the model and deployment.
- Tool and streaming guardrails. Package helpers scan text surfaces, but they
do not replace application authorization, sandboxing, idempotency, or
rollback for external side effects.
## Out Of Scope
`llmshieldr` does not provide:
- A network firewall or sandbox.
- Model training-time alignment.
- Formal compliance certification.
- Guaranteed PII/PHI discovery.
- Malware analysis.
- Full multilingual safety coverage.
- Automated execution of tools or tool authorization.
- Full human approval workflow management beyond `escalate` action metadata.
- Cross-machine distributed rate limiting.
- Protection against compromised model providers, dependencies, or
infrastructure.
## Expected Use
Use `llmshieldr` as one transparent layer in a broader safety design:
- Scan and redact prompts before sending them to a model.
- Scan retrieved context before adding it to prompts.
- Scan model outputs before display or downstream use.
- Scan tool-call inputs before execution and tool outputs before reuse.
- Scan streaming output chunks when using streaming APIs.
- Configure policy controls for refusal and escalation behavior.
- Write audit logs to sensitive storage.
- Add organization-specific rules and negative tests.
- Run evaluations against your own application data before deployment.
::: {.llmshieldr-info-box}
## Non-Goals
Do not describe `llmshieldr` as guaranteeing safety, compliance, jailbreak
resistance, or complete OWASP coverage. It is an R-native, transparent,
testable guardrail package with starter controls and extension points.
:::