---
title: "Threat Model"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Threat Model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = TRUE)
```

```{css, echo = FALSE, eval = TRUE}
.llmshieldr-info-box {
  border-left: 4px solid #2f80ed;
  background: #f3f8ff;
  padding: 1rem 1.15rem;
  margin: 1.5rem 0;
  border-radius: 0.35rem;
}

.llmshieldr-info-box h2,
.llmshieldr-info-box h3,
.llmshieldr-info-box h4 {
  margin-top: 0;
}

.llmshieldr-info-box p:last-child,
.llmshieldr-info-box ul:last-child,
.llmshieldr-info-box ol:last-child {
  margin-bottom: 0;
}
```

`llmshieldr` is an application-level guardrail layer for R workflows that send
text to large language models or receive text from them. It helps make common
risks visible and auditable; it is not a complete security boundary.

## Assets

- User prompts and chat history.
- Retrieved context in RAG workflows.
- Model outputs before display, storage, or downstream use.
- Sensitive data such as PII, PHI, credentials, and business records.
- Tool inputs and outputs in applications that call external systems.
- Streaming chunks before complete model output is available.
- Audit logs and policy configuration.

## Trust Boundaries

- User-provided text entering an R application.
- Retrieved documents, search results, or database rows entering model context.
- Model output leaving the LLM provider or local model.
- Tool calls that can affect files, databases, APIs, accounts, or transactions.
- Streaming output chunks crossing from model provider to application.
- Audit logs written to local or shared storage.

## In Scope

`llmshieldr` provides starter controls for:

- Direct and indirect prompt-injection language.
- Common PII, PHI, and secret patterns.
- Simple NLP intent signals for override, exposure, and harmful-action intent.
- Output markers for unsafe agency, system-prompt leakage, unsafe code, and
  high-confidence medical or financial claims.
- RAG context source allowlists and simple context anomaly signals.
- Tool-call argument scanning and tool-output scanning.
- Conversation scanning with role-preserving metadata.
- Streaming output scanning with rolling context.
- Token and request budget guards with pre-call reservation and rollback.
- Optional semantic review through a reviewer function, chat object, local
  Ollama reviewer, or remote reviewer endpoint.
- Auditable findings, actions, risk scores, and JSONL/CSV/RDS audit output.

## Partially Covered

These areas have package surface but need workflow-specific evidence or
additional controls before they should be treated as robust protections:

- OWASP LLM Top 10 coverage. The package maps controls to categories, but this
  is not exhaustive protection for each category.
- Obfuscated prompt injection. Unicode normalization, delimiter collapse,
  invisible-text findings, and encoded-payload checks help, but a larger
  adversarial evaluation suite is still needed.
- RAG poisoning. Source allowlists and anomaly checks help, but there is no
  provenance scoring, embedding-neighborhood analysis, or document trust graph.
- Semantic review. Reviewer JSON is parsed with schema metadata, confidence,
  evidence, recommended actions, span support, and structured failure metadata,
  but reviewer reliability depends on the model and deployment.
- Tool and streaming guardrails. Package helpers scan text surfaces, but they
  do not replace application authorization, sandboxing, idempotency, or
  rollback for external side effects.

## Out Of Scope

`llmshieldr` does not provide:

- A network firewall or sandbox.
- Model training-time alignment.
- Formal compliance certification.
- Guaranteed PII/PHI discovery.
- Malware analysis.
- Full multilingual safety coverage.
- Automated execution of tools or tool authorization.
- Full human approval workflow management beyond `escalate` action metadata.
- Cross-machine distributed rate limiting.
- Protection against compromised model providers, dependencies, or
  infrastructure.

## Expected Use

Use `llmshieldr` as one transparent layer in a broader safety design:

- Scan and redact prompts before sending them to a model.
- Scan retrieved context before adding it to prompts.
- Scan model outputs before display or downstream use.
- Scan tool-call inputs before execution and tool outputs before reuse.
- Scan streaming output chunks when using streaming APIs.
- Configure policy controls for refusal and escalation behavior.
- Write audit logs to sensitive storage.
- Add organization-specific rules and negative tests.
- Run evaluations against your own application data before deployment.

::: {.llmshieldr-info-box}
## Non-Goals

Do not describe `llmshieldr` as guaranteeing safety, compliance, jailbreak
resistance, or complete OWASP coverage. It is an R-native, transparent,
testable guardrail package with starter controls and extension points.
:::