---
title: "Threat Model"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Threat Model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = TRUE)
```

```{css, echo = FALSE, eval = TRUE}
.llmshieldr-info-box {
  border-left: 4px solid #2f80ed;
  background: #f3f8ff;
  padding: 1rem 1.15rem;
  margin: 1.5rem 0;
  border-radius: 0.35rem;
}

.llmshieldr-info-box h2,
.llmshieldr-info-box h3,
.llmshieldr-info-box h4 {
  margin-top: 0;
}

.llmshieldr-info-box p:last-child,
.llmshieldr-info-box ul:last-child,
.llmshieldr-info-box ol:last-child {
  margin-bottom: 0;
}
```

`llmshieldr` is an application-level guardrail layer for R workflows that send text to large language models or receive text from them. It helps make common risks visible and auditable; it is not a complete security boundary.

## Assets

- User prompts and chat history.
- Retrieved context in RAG workflows.
- Model outputs before display, storage, or downstream use.
- Sensitive data such as PII, PHI, credentials, and business records.
- Tool inputs and outputs in applications that call external systems.
- Streaming chunks before complete model output is available.
- Audit logs and policy configuration.

## Trust Boundaries

- User-provided text entering an R application.
- Retrieved documents, search results, or database rows entering model context.
- Model output leaving the LLM provider or local model.
- Tool calls that can affect files, databases, APIs, accounts, or transactions.
- Streaming output chunks crossing from model provider to application.
- Audit logs written to local or shared storage.

## In Scope

`llmshieldr` provides starter controls for:

- Direct and indirect prompt-injection language.
- Common PII, PHI, and secret patterns.
- Simple NLP intent signals for override, exposure, and harmful-action intent.
- Output markers for unsafe agency, system-prompt leakage, unsafe code, and high-confidence medical or financial claims.
- RAG context source allowlists and simple context anomaly signals.
- Tool-call argument scanning and tool-output scanning.
- Conversation scanning with role-preserving metadata.
- Streaming output scanning with rolling context.
- Token and request budget guards with pre-call reservation and rollback.
- Optional semantic review through a reviewer function, chat object, local Ollama reviewer, or remote reviewer endpoint.
- Auditable findings, actions, risk scores, and JSONL/CSV/RDS audit output.

## Partially Covered

These areas have package support, but they need workflow-specific evidence or additional controls before they should be treated as robust protections:

- OWASP LLM Top 10 coverage. The package maps controls to categories, but the mapping does not mean that each category is fully covered.
- Obfuscated prompt injection. Unicode normalization, delimiter collapse, invisible-text findings, and encoded-payload checks help, but a larger adversarial evaluation suite is still needed.
- RAG poisoning. Source allowlists and anomaly checks help, but there is no provenance scoring, embedding-neighborhood analysis, or document trust graph.
- Semantic review. Reviewer JSON is parsed with schema metadata, confidence, evidence, recommended actions, span support, and structured failure metadata, but reviewer reliability depends on the model and deployment.
- Tool and streaming guardrails. Package helpers scan text surfaces, but they do not replace application authorization, sandboxing, idempotency, or rollback for external side effects.

## Out of Scope

`llmshieldr` does not provide:

- A network firewall or sandbox.
- Model training-time alignment.
- Formal compliance certification.
- Guaranteed PII/PHI discovery.
- Malware analysis.
- Full multilingual safety coverage.
- Automated execution of tools or tool authorization.
- Full human approval workflow management beyond `escalate` action metadata.
- Cross-machine distributed rate limiting.
- Protection against compromised model providers, dependencies, or infrastructure.

## Expected Use

Use `llmshieldr` as one transparent layer in a broader safety design:

- Scan and redact prompts before sending them to a model.
- Scan retrieved context before adding it to prompts.
- Scan model outputs before display or downstream use.
- Scan tool-call inputs before execution and tool outputs before reuse.
- Scan streaming output chunks when using streaming APIs.
- Configure policy controls for refusal and escalation behavior.
- Write audit logs to access-controlled storage suitable for sensitive data.
- Add organization-specific rules and negative tests.
- Run evaluations against your own application data before deployment.

::: {.llmshieldr-info-box}
## Non-Goals

Do not describe `llmshieldr` as guaranteeing safety, compliance, jailbreak resistance, or complete OWASP coverage. It is an R-native, transparent, testable guardrail package with starter controls and extension points.
:::
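The "streaming output scanning with rolling context" control listed under In Scope addresses a specific gap. When output arrives in chunks, a sensitive pattern can be split across a chunk boundary, and scanning each chunk on its own will miss it. The base-R sketch below shows the general idea. It keeps the last `overlap` characters of earlier output and prepends them to each new chunk before matching. `scan_stream()` is a hypothetical helper written only for this illustration and is not part of the `llmshieldr` API. The key-like pattern and the chunk text are assumed examples.

```r
# Hypothetical sketch, not the llmshieldr API: scan streamed chunks while
# keeping rolling context so matches split across chunk boundaries are found.
scan_stream <- function(chunks, pattern, overlap = 64L) {
  carry <- ""  # trailing text carried over from earlier chunks
  findings <- data.frame(chunk = integer(0), match = character(0),
                         stringsAsFactors = FALSE)
  for (i in seq_along(chunks)) {
    window <- paste0(carry, chunks[[i]])
    m <- gregexpr(pattern, window, perl = TRUE)[[1]]
    starts <- as.integer(m)  # -1 when there is no match
    ends <- starts + attr(m, "match.length") - 1L
    # Report only matches that end inside the new chunk; matches that end
    # within the carried context were already reported for an earlier chunk.
    is_new <- starts > 0L & ends > nchar(carry)
    if (any(is_new)) {
      findings <- rbind(findings, data.frame(
        chunk = i,
        match = substring(window, starts[is_new], ends[is_new]),
        stringsAsFactors = FALSE
      ))
    }
    # Keep at most `overlap` trailing characters as context for the next chunk.
    n <- nchar(window)
    carry <- substr(window, max(1L, n - overlap + 1L), n)
  }
  findings
}

# Assumed example: an API-key-like token split across two streamed chunks.
chunks <- c("Your key is sk-AB", "CD1234 and more", " text")
pattern <- "sk-[A-Z]{4}[0-9]{4}"

any(grepl(pattern, chunks, perl = TRUE))  # per-chunk scanning misses the split key
scan_stream(chunks, pattern, overlap = 16L)
```

In this example, the split token is reported once, on the second chunk, which is where it completes. Calling `grepl()` on each chunk separately finds nothing. To catch every split match, `overlap` must be at least one character shorter than the longest match you expect. A greedy pattern whose match keeps growing into the next chunk can be reported again with the longer text, so a real streaming scanner would also need deduplication.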