Why This Pattern Exists
[!INFO] The first mistake many teams make with AI assistance is asking an AI model to answer business questions directly from its training memory. That is risky for any financial institution, especially when you are dealing with payment journeys.
A Customer may ask
[!NOTE] My Payment is pending. Can I cancel it, and when will the money return to my account?
A useful answer depends on current business facts and what exactly the status of the payment is:
- The customer’s transaction status.
- The payment method.
- The bank/card/network settlement stage.
- Internal policy.
- Region-specific or scheme-specific rules.
- Whether the user is authenticated and authorized to see this information.
The model alone does not know those facts. A Grounded Retrieval-Augmented Generation pattern gives the model relevant, approved context before it answers.
[!INFO] The principle is simple: Retrieve trusted information first. Let the model explain using that information. Refuse or escalate when the retrieved context is not enough.
Mental Model
Grounded RAG is not “chat with PDFs.” It is an architecture pattern for controlling what knowledge the AI is allowed to use.
The model becomes a language and reasoning layer over:
- Trusted policy documents.
- Product FAQs.
- Operational runbooks.
- Transaction facts from internal systems.
- Guardrails that decide whether the assistant can answer.
- For regulated or high-trust domains, RAG should be treated as a controlled information supply chain.
High-Level Architecture
Diagrams below use a shared colour palette so component roles and trust boundaries are visible at a glance. Subgraphs mark the responsibility split between the user edge, the AI plane, grounded knowledge, source-of-truth APIs, and governance.
flowchart TB
classDef user fill:#E0F2FE,stroke:#0369A1,color:#0C4A6E,stroke-width:2px
classDef edge fill:#FEF3C7,stroke:#B45309,color:#78350F,stroke-width:2px
classDef ai fill:#EDE9FE,stroke:#6D28D9,color:#4C1D95,stroke-width:2px
classDef data fill:#DCFCE7,stroke:#15803D,color:#14532D,stroke-width:2px
classDef external fill:#FEE2E2,stroke:#B91C1C,color:#7F1D1D,stroke-width:2px,stroke-dasharray: 4 3
classDef governance fill:#F3F4F6,stroke:#374151,color:#111827,stroke-width:1px
U[Customer or Support Agent]:::user
subgraph EDGE["π Edge / Trust Boundary"]
UI[Web or Support Portal]:::edge
AUTH[Authentication and Authorization]:::edge
end
subgraph AI_PLANE["π€ AI Plane"]
API[AI Assistant API]:::ai
CLASSIFY[Intent and Risk Classifier]:::ai
RETRIEVE[Retriever]:::ai
CONTEXT[Context Builder]:::ai
LLM[LLM Response Generator]:::ai
VALIDATE[Answer Validator]:::ai
end
subgraph KNOWLEDGE["π Grounded Knowledge"]
DOCS[Approved Policy Docs and FAQs]:::data
CHUNK[Chunk and Embed Pipeline]:::data
VDB[(Vector Database)]:::data
end
subgraph EXT["π Source-of-Truth APIs (read-only)"]
PAYAPI[Payment Status API]:::external
PAYDB[(Payment System)]:::external
end
subgraph GOV["π‘οΈ Governance"]
AUDIT[(Audit Logs)]:::governance
OBS[Tracing and Evaluation]:::governance
HUMAN[Human Escalation Queue]:::governance
end
U --> UI --> AUTH --> API
API --> CLASSIFY --> RETRIEVE
DOCS --> CHUNK --> VDB --> RETRIEVE
API --> PAYAPI
PAYDB --> PAYAPI
RETRIEVE --> CONTEXT
PAYAPI --> CONTEXT
CLASSIFY --> CONTEXT
CONTEXT --> LLM --> VALIDATE
VALIDATE -->|Grounded and Safe| UI
VALIDATE -->|Low Confidence or Restricted| HUMAN
API --> AUDIT
VALIDATE --> OBS
HUMAN --> OBS
subgraph LEGEND[" Legend "]
L1[User]:::user
L2[Edge / Trust]:::edge
L3[AI Plane]:::ai
L4[Grounded Data]:::data
L5[External API]:::external
L6[Governance]:::governance
end
[!TIP] A logged-in customer asks: “I made a card payment yesterday and it is still pending. Can I cancel it?”
[!NOTE] To address the customer query β simplified journey:
- User authenticates in the banking app.
- Assistant detects this is a payment-status question.
- System retrieves the transaction status from a read-only payment API.
- System retrieves approved policy snippets about pending card payments.
- The model generates an answer using only retrieved policy and transaction facts.
- If the status is ambiguous, the assistant escalates to support instead of guessing.
[!NOTE] Example Answer Shape Your payment is currently pending, which usually means the merchant has authorized the amount but final settlement has not completed yet. Based on the payment policy, pending card payments usually cannot be cancelled by the bank while the merchant authorization is active. If the merchant releases the authorization, the amount is normally made available again after the hold expires. I can help you raise a support request if you do not recognize this payment.
Sources:
- Card payment pending policy
- Transaction status from payment service
[!CAUTION] Important: the assistant is not moving money, cancelling payments, or making a dispute decision. It is explaining status and routing the user safely.
Activity Diagram
Same colour language applied here β green is a safe terminal state, red is a refusal or escalation, yellow is a decision gate, purple is an AI step, grey is governance/logging.
flowchart TD
classDef start fill:#E0F2FE,stroke:#0369A1,color:#0C4A6E,stroke-width:2px
classDef ai fill:#EDE9FE,stroke:#6D28D9,color:#4C1D95,stroke-width:2px
classDef decision fill:#FEF3C7,stroke:#B45309,color:#78350F,stroke-width:2px
classDef safe fill:#DCFCE7,stroke:#15803D,color:#14532D,stroke-width:2px
classDef refuse fill:#FEE2E2,stroke:#B91C1C,color:#7F1D1D,stroke-width:2px
classDef log fill:#F3F4F6,stroke:#374151,color:#111827,stroke-width:1px
A[User asks payment question]:::start --> B[Authenticate user]:::ai
B --> C{User authorized<br/>for transaction?}:::decision
C -->|No| C1[Refuse account-specific answer]:::refuse
C1 --> Z[Log event]:::log
C -->|Yes| D[Classify intent and risk]:::ai
D --> E{Action requested?<br/>cancel / refund / dispute}:::decision
E -->|Yes| E1[Route to controlled workflow<br/>or human approval]:::refuse
E1 --> Z
E -->|No β explain status| F[Retrieve payment status]:::ai
F --> G[Retrieve relevant policy docs]:::ai
G --> H{Enough trusted<br/>context?}:::decision
H -->|No| H1[Ask clarifying question<br/>or escalate]:::refuse
H1 --> Z
H -->|Yes| I[Build grounded prompt]:::ai
I --> J[Generate answer]:::ai
J --> K[Validate citations, PII,<br/>and policy compliance]:::ai
K --> L{Answer safe<br/>and grounded?}:::decision
L -->|Yes| M[Return answer with sources]:::safe
L -->|No| N[Escalate or limited fallback]:::refuse
M --> Z
N --> Z
subgraph LEGEND_AD[" Legend "]
LD1[Start / Input]:::start
LD2[AI Step]:::ai
LD3[Decision Gate]:::decision
LD4[Safe Terminal]:::safe
LD5[Refusal / Escalation]:::refuse
LD6[Audit / Log]:::log
end
Sequence Diagram
Numbered steps and grouped phases β Authentication, Grounded Retrieval, Generation & Validation, Response β make the request flow scannable.
sequenceDiagram
autonumber
participant User
participant App
participant API as AI Assistant API
participant Auth as Auth Service
participant Pay as Payment Status API
participant Vec as Vector DB
participant LLM as LLM
participant Obs as Logs and Traces
User->>App: Ask about pending payment
App->>API: Send question and transaction id
rect rgb(254, 243, 199)
Note over API,Auth: π Authentication
API->>Auth: Verify access
Auth-->>API: Authorized
end
rect rgb(220, 252, 231)
Note over API,Vec: π Grounded Retrieval (read-only)
API->>Pay: Read transaction status
Pay-->>API: Pending authorization
API->>Vec: Search policy snippets
Vec-->>API: Relevant payment policy chunks
end
rect rgb(237, 233, 254)
Note over API,LLM: π€ Generation and Validation
API->>LLM: Generate answer from supplied context
LLM-->>API: Draft answer with source references
API->>API: Validate grounding and safety
end
rect rgb(243, 244, 246)
Note over API,Obs: π‘οΈ Audit
API->>Obs: Record prompt metadata, retrieved docs, result
end
API-->>App: Answer with sources or escalation
App-->>User: Display response
The LLM is called once, with a prompt that already contains the authorized transaction status and the retrieved policy chunks. The model is not deciding what data it sees; the application is.
When to Use This Pattern
[!TIP] Use grounded RAG when:
- The answer must come from approved, current knowledge.
- The user question depends on account-specific facts.
- The assistant must cite sources.
- The cost of a hallucinated answer is high.
- Missing context should trigger escalation instead of guesswork.
Good candidates: payment support, fraud support, dispute intake, claims support, compliance help desks, policy Q&A, internal operations support.
When Not to Use This Pattern Alone
[!WARNING] Do not use this pattern by itself when:
- The assistant needs to move money or change account state.
- The workflow requires legal, regulatory, or dispute decisions.
- The answer depends on private data that cannot be safely passed to the model.
- The source documents are stale, conflicting, or not governed.
- There is no escalation path for uncertain answers.
For those cases, combine grounded RAG with workflow orchestration, policy-as-code, human review, and stronger audit controls.
Implementation Sketch
[!NOTE] Build a local demo that answers payment-support questions from: However, we will use synthetic data to model real RAG behaviour β a flat policy document in place of a vector store, and a static JSON in place of a payment database.
π οΈ Implementation Steps (patterns/01-grounded-rag-payment-support/README.md) View on GitHub
Pattern 01: Grounded RAG for Payment Support
This project is a small, local implementation of a grounded RAG assistant for payment-support questions.
It answers questions such as:
Why is my card payment pending?
The goal is not to build a full banking assistant. The goal is to show the core production pattern in a way a developer can run, debug, explain, and later adapt for a real organization.
What This Pattern Demonstrates
Grounded RAG means the assistant should answer from trusted context instead of relying on model memory.
In this demo, the assistant combines:
- A user question.
- Read-only synthetic payment facts.
- Approved policy snippets from local markdown files.
- Intent and safety guardrails.
- A response contract that includes answer type, reason, next step, sources, status, and metadata.
The implementation deliberately avoids real payment systems and real LLM calls. Those are represented by local, deterministic components so the pattern can be tested without external credentials, private data, or paid services.
Quick Start
Run these commands from this folder:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -r requirements.txt
python -m unittest discover -s tests
python -m uvicorn app.main:app --reload --port 8000
Then open:
- Local UI: http://127.0.0.1:8000/
- API docs: http://127.0.0.1:8000/docs
- Health check: http://127.0.0.1:8000/health
- Demo scenarios as JSON: http://127.0.0.1:8000/examples
You can also run the command-line demo:
python -m app.demo
Repository Layout
01-grounded-rag-payment-support/
README.md
requirements.txt
app/
__init__.py
assistant.py
demo.py
guardrails.py
main.py
payment_status.py
rag_pipeline.py
schemas.py
data/
transactions.json
policies/
ach_processing.md
card_pending_payments.md
dispute_intake.md
prompt_injection_safety.md
diagrams/
activity.mmd
architecture.mmd
pattern-01-grounded-rag-payment-support.drawio
pattern-01-simple-rag-payment-status.drawio
tests/
test_pattern_01.py
What Has Been Built
This repo currently includes:
- A local FastAPI API with
/ask,/health,/examples, and/. - A simple browser UI served from
/for end-to-end validation. - A deterministic assistant flow in
app/assistant.py. - A synthetic read-only payment store in
data/transactions.json. - A local keyword-based policy retriever in
app/rag_pipeline.py. - Guardrails for unauthorized access, controlled payment actions, and prompt-injection style requests.
- Unit tests for the main happy path and guardrail paths.
The UI lets you run these scenarios:
- Pending card payment.
- Settled card payment.
- ACH processing payment.
- Unauthorized access guardrail.
- Controlled workflow escalation.
- Prompt-injection guardrail.
High-Level Architecture
GitHub renders Mermaid diagrams in markdown files. If the diagram does not render in another markdown viewer, open the .mmd files in the diagrams/ folder or import the .drawio files into draw.io.
flowchart TB
User[User or support agent] --> UI[Local browser UI]
UI --> API[FastAPI app]
API --> Assistant[Grounded payment assistant]
Assistant --> Auth[Authorization check]
Auth --> Store[Synthetic payment store]
Assistant --> Intent[Intent and risk classifier]
Assistant --> Retriever[Local policy retriever]
Retriever --> Policies[Approved policy markdown]
Assistant --> Generator[Grounded answer builder]
Generator --> Validator[Grounding validator]
Validator --> Response[Structured response]
Response --> UI
Request Flow
flowchart TD
A[Submit question] --> B[Load transaction]
B --> C{User owns transaction}
C -->|No| D[Return refusal]
C -->|Yes| E[Classify intent]
E --> F{Prompt injection risk}
F -->|Yes| G[Return escalation]
F -->|No| H{Payment action requested}
H -->|Yes| I[Route to controlled workflow]
H -->|No| J[Retrieve policy chunks]
J --> K[Build grounded answer]
K --> L{Sources available}
L -->|No| M[Return escalation]
L -->|Yes| N[Return answer with sources]
End-to-End Example
Request:
{
"user_id": "user_123",
"transaction_id": "txn_pending_card_001",
"question": "Why is my card payment pending?"
}
Response shape:
{
"type": "answer",
"answer": "Your card payment to Example Merchant for GBP 49.99 is currently pending...",
"reason": "The response used approved policy context and read-only payment facts.",
"next_step": "If the payment is not recognized, route the customer to dispute intake...",
"sources": [
"Card Pending Payment Policy (card_pending_payments.md)"
],
"transaction_status": "pending_authorization",
"metadata": {
"retrieved_chunk_ids": [
"card_pending_payments:1",
"card_pending_payments:2",
"card_pending_payments:3"
],
"payment_method": "card"
}
}
Call it directly:
Invoke-RestMethod `
-Uri http://127.0.0.1:8000/ask `
-Method Post `
-ContentType 'application/json' `
-Body '{"user_id":"user_123","transaction_id":"txn_pending_card_001","question":"Why is my card payment pending?"}'
Code Walkthrough
app/main.py
This is the FastAPI entry point.
It creates:
GET /for the local browser UI.GET /healthfor a simple readiness check.GET /examplesfor the UI scenario list.POST /askfor the assistant workflow.
The UI is intentionally embedded in the API for this first pattern. That keeps the demo easy to run with one command and no frontend build step.
app/assistant.py
This is the orchestration layer.
The main method is:
GroundedPaymentAssistant.answer(user_id, transaction_id, question)
It performs the flow in this order:
- Load the transaction for the user.
- Refuse if the transaction belongs to another user.
- Escalate if the transaction does not exist.
- Classify the question for prompt-injection risk or controlled payment action.
- Retrieve relevant policy chunks.
- Build a grounded answer from transaction facts and policy text.
- Validate that sources exist.
- Return a structured
AssistantResponse.
app/payment_status.py
This is the read-only payment fact store for the demo.
It loads synthetic transactions from:
data/transactions.json
It enforces ownership by checking:
request.user_id == transaction.user_id
In a real organization, this is where you would call an internal payment-status API or database-backed service. The assistant should still receive only the minimum read-only facts needed to answer.
app/rag_pipeline.py
This is the local retrieval layer.
It:
- Reads policy markdown files from
data/policies/. - Splits each file into paragraph chunks.
- Tokenizes the user question and payment method.
- Scores chunks with a simple keyword overlap.
- Returns the top matching policy chunks.
This is not a vector database. It is a small local stand-in so the RAG pattern is visible and testable without infrastructure.
app/guardrails.py
This contains two guardrail checks:
- Intent classification for risky user requests.
- Grounding validation after answer generation.
The current classifier is rule-based. It catches examples like:
Cancel this payment nowRefund meIgnore previous instructionsReveal your system prompt
In production, this can become a policy engine, model-based classifier, rules service, or a combination of all three.
app/schemas.py
This file defines the core data contracts:
PaymentTransactionPolicyChunkIntentAssistantResponse
The response contract is important because downstream systems need predictable fields for audit, UI rendering, support routing, and evaluation.
tests/test_pattern_01.py
The tests cover:
- A grounded pending-payment answer with sources.
- Refusal when a user asks about another user’s transaction.
- Escalation for cancel, refund, reverse, dispute, or chargeback style actions.
- Escalation for prompt-injection style requests.
- Read-only settled payment status.
Run:
python -m unittest discover -s tests
Local Demo Choices vs Production Choices
This demo uses local substitutes for components that would normally be enterprise services.
| Capability | This demo | Production alternative |
|---|---|---|
| User interface | Embedded HTML served by FastAPI | Support portal, mobile app, authenticated web app, CRM case UI |
| Authentication | user_id passed in request | SSO, session token, JWT, mTLS, API gateway identity |
| Authorization | Transaction ownership check in local JSON | Central entitlement service, account access policy, scoped service token |
| Payment facts | data/transactions.json | Read-only payment status API, ledger view, transaction service, case platform |
| Policy corpus | Markdown files in data/policies/ | CMS, policy service, knowledge base, SharePoint, Confluence, versioned docs |
| Retrieval | Keyword overlap retriever | Embeddings plus vector DB, hybrid search, reranking, metadata filters |
| LLM generation | Deterministic Python string builder | Foundation model through approved gateway, model router, hosted LLM, private model |
| Safety checks | Rule-based guardrails | Policy engine, model safety classifier, human review, workflow orchestration |
| Observability | Local test output | Logs, traces, prompt/response audit, evaluation store, incident review |
Why This Demo Does Not Call a Foundation Model
The answer generation is currently deterministic Python code.
That is intentional for Pattern 01:
- It keeps the repo runnable without API keys.
- It avoids sending payment-like data to an external service.
- It makes tests stable.
- It lets developers focus on the architecture before model behavior.
- It shows that RAG is more than prompting. The hard parts are context selection, authorization, source control, validation, and routing.
In production, the deterministic answer builder would usually be replaced by a foundation model call. The model should receive a strict prompt contract, the authorized payment facts, the retrieved policy snippets, and instructions to answer only from that context.
Production model call sketch:
flowchart TB
Assistant[Assistant orchestrator] --> Prompt[Prompt contract]
Assistant --> Facts[Authorized payment facts]
Assistant --> Context[Retrieved policy snippets]
Prompt --> Model[Approved foundation model]
Facts --> Model
Context --> Model
Model --> Draft[Draft answer]
Draft --> Validate[Grounding and safety validation]
The model should not:
- Decide whether the user is authorized.
- Call payment mutation APIs directly.
- Cancel, refund, reverse, dispute, or move money.
- Answer from hidden memory when approved context is missing.
- Ignore missing, stale, or conflicting source material.
Production Replacement Path
Use this section as the migration map when adapting the pattern for a real team.
1. Replace Synthetic Transactions with a Read-Only Payment API
Current local component:
app/payment_status.py
data/transactions.json
Production path:
- Create a payment-status client that calls an internal read-only API.
- Pass the authenticated user or account context from the API gateway.
- Return only the fields needed by the assistant.
- Keep mutation operations out of this path.
- Add timeout, retry, and graceful escalation behavior.
Example production facts:
{
"transaction_id": "txn_abc",
"customer_id": "cust_123",
"payment_method": "card",
"status": "pending_authorization",
"amount": "49.99",
"currency": "GBP",
"merchant": "Example Merchant",
"created_at": "2026-05-16T09:20:00Z",
"status_reason_code": "merchant_authorization"
}
2. Replace Local Markdown Retrieval with a Governed Knowledge Source
Current local component:
app/rag_pipeline.py
data/policies/
Production path:
- Store approved policies in a governed source system.
- Chunk documents with stable IDs and version metadata.
- Generate embeddings using an approved embedding model.
- Store vectors in a vector database such as pgvector, Qdrant, Chroma, Azure AI Search, OpenSearch, or another approved platform.
- Use metadata filters for region, product, payment method, effective date, and policy status.
- Return source titles, source URLs, chunk IDs, and document versions.
Important metadata:
{
"chunk_id": "card_pending_payments:v3:chunk_001",
"title": "Card Pending Payment Policy",
"source": "policy-system-url",
"effective_date": "2026-05-16",
"region": "UK",
"payment_method": "card",
"approved": true
}
3. Replace Deterministic Generation with an Approved Model Gateway
Current local component:
_generate_grounded_answer() in app/assistant.py
Production path:
- Call the organization-approved model gateway.
- Use a versioned prompt template.
- Send only authorized facts and retrieved policy snippets.
- Ask for structured output.
- Require citations to retrieved chunks.
- Validate the draft before showing it to a user.
- Log model version, prompt version, source IDs, and response type.
Recommended output contract:
{
"type": "answer | refusal | escalation",
"answer": "string",
"reason": "string",
"next_step": "string",
"sources": ["source labels"],
"transaction_status": "string or null",
"metadata": {
"retrieved_chunk_ids": [],
"model": "approved-model-name",
"prompt_version": "payment-support-v1"
}
}
4. Replace Rule-Based Guardrails with Layered Controls
Current local component:
app/guardrails.py
Production path:
- Keep simple deterministic rules for high-confidence prohibited actions.
- Add a model or classifier for nuanced risk detection.
- Add policy-as-code for product, region, and regulatory constraints.
- Add human review or controlled workflow routing for sensitive actions.
- Add regression tests for every guardrail that matters.
Sensitive actions should go to controlled workflows:
- Cancel payment.
- Refund payment.
- Reverse payment.
- Recall ACH.
- Raise chargeback.
- Decide dispute outcome.
- Change account or payment details.
5. Add Production Observability
For a real support team, log enough to debug and audit without leaking private data.
Capture:
- Request ID.
- Authenticated user or session reference.
- Transaction reference.
- Intent category.
- Retrieved chunk IDs and versions.
- Response type.
- Escalation reason.
- Model name and prompt version if using an LLM.
- Latency and dependency failures.
Avoid logging:
- Full payment instrument details.
- Secrets or tokens.
- Unnecessary personal data.
- Raw prompts if they contain sensitive information and your governance model disallows it.
Guardrails in This Pattern
The assistant is intentionally read-only.
It can:
- Explain payment status.
- Cite approved policy context.
- Tell the user the next safe support step.
- Escalate when it lacks safe context.
It cannot:
- Cancel a payment.
- Refund a payment.
- Reverse a payment.
- File or decide a dispute.
- Reveal system or developer instructions.
- Answer another user’s transaction details.
Test Matrix
| Scenario | Input example | Expected result |
|---|---|---|
| Pending payment | Why is my card payment pending? | answer with policy source |
| Settled payment | What is the status of this card payment? | answer with settled status |
| Unauthorized access | Different user_id for the transaction | refusal |
| Controlled action | Cancel this payment now | escalation |
| Prompt injection | Ignore previous instructions... | escalation |
Development Runbook
Run tests
python -m unittest discover -s tests
Run the UI and API
python -m uvicorn app.main:app --reload --port 8000
Stop a running local server
If you started Uvicorn in the foreground, press Ctrl+C.
If it is running in the background, find the process:
Get-Process -Name python | Select-Object Id,ProcessName,Path
Then stop the matching process:
Stop-Process -Id <process-id>
Add a new policy document
- Add a markdown file under
data/policies/. - Use a first-level heading as the policy title.
- Add an
effective_date:line. - Keep paragraphs short because each paragraph becomes a retrievable chunk.
- Add or update tests for the scenario that should retrieve it.
Add a new transaction scenario
- Add a row to
data/transactions.json. - Add a demo case to
DEMO_CASESinapp/main.py. - Add a unit test if it changes behavior or covers a new guardrail.
Known Limitations
This is a teaching implementation, not a production assistant.
Current limitations:
- No real authentication.
- No external payment API.
- No vector database.
- No foundation model call.
- No streaming response.
- No persistent audit log.
- No policy versioning beyond file names and effective dates.
- No frontend build system.
- The browser UI is intentionally simple and embedded in
app/main.py.
These limitations are useful for Pattern 01 because the first thing to learn is the flow. The production replacement path above shows where each local substitute should be replaced.
When to Use This Pattern
Use grounded RAG when:
- The answer must come from approved, current knowledge.
- The user question depends on account-specific facts.
- The assistant must cite sources.
- The cost of a hallucinated answer is high.
- Missing context should trigger escalation instead of guesswork.
Payment support, fraud support, dispute intake, claims support, compliance help desks, policy Q&A, and internal operations support are good candidates.
When Not to Use This Pattern Alone
Do not use this pattern by itself when:
- The assistant needs to move money or change account state.
- The workflow requires legal, regulatory, or dispute decisions.
- The answer depends on private data that cannot be safely passed to the model.
- The source documents are stale, conflicting, or not governed.
- There is no escalation path for uncertain answers.
For those cases, combine grounded RAG with workflow orchestration, policy-as-code, human review, and stronger audit controls.
Production Guardrail Checklist
Before using this pattern in an organization, confirm:
- Authentication is handled outside the model.
- Authorization is checked before retrieval and generation.
- Payment APIs are read-only unless routed through approved workflows.
- Retrieved context is filtered by product, region, customer type, and effective date.
- The model is instructed to answer only from provided context.
- The response must include sources.
- Missing sources produce escalation.
- Sensitive actions route to controlled workflows.
- Prompt-injection attempts are detected and logged.
- Evaluation tests cover happy paths, refusals, and escalations.
- Audit logs capture source IDs, prompt version, model version, and response type.
References and Next Steps
The source files in this repo are intentionally small so the architecture is easy to inspect.
Good next improvements:
- Add screenshots of the local UI.
- Add a real vector database example behind the same retriever interface.
- Add a model-gateway adapter while keeping deterministic tests.
- Add structured prompt templates.
- Add an evaluation dataset for more payment-support questions.
- Add tracing and audit logging.
Next pattern in the series can build on this by adding tool calling or workflow orchestration for controlled payment support actions.
What I Learned
[!NOTE] The biggest takeaway was that with this pattern in mind, AI is not a replacement for your point-to-point API communication and should not be used for those flows β the RAG pattern shows how additional documentation and processes can be plugged in alongside your AI to give customers accurate, cited answers with the right context. Start with the simplest solution that works first.
What’s Next
In the next part we will be learning the next AI Usage Pattern from start to end in a systematic manner.
[!TIP] Star the GitHub repo and follow along with the series.
Have questions or feedback? Drop a comment below or connect on LinkedIn.
π¬ Comments