AI Architecture Pattern 01: Grounded RAG For Payment Support

AI does not replace your APIs β€” it explains what they return. Pattern 01 of the AI Architecture Patterns series shows how to ground an AI assistant in trusted policy and live transaction context for a payment-support journey, and where the boundary must stay between AI and money movement.

Why This Pattern Exists

[!INFO] The first mistake many teams make with AI assistance is asking an AI model to answer business questions directly from its training memory. That is risky for any financial institution, especially when you are dealing with payment journeys.

A Customer may ask

[!NOTE] My Payment is pending. Can I cancel it, and when will the money return to my account?

A useful answer depends on current business facts and what exactly the status of the payment is:

  1. The customer’s transaction status.
  2. The payment method.
  3. The bank/card/network settlement stage.
  4. Internal policy.
  5. Region-specific or scheme-specific rules.
  6. Whether the user is authenticated and authorized to see this information.

The model alone does not know those facts. A Grounded Retrieval-Augmented Generation pattern gives the model relevant, approved context before it answers.

[!INFO] The principle is simple: Retrieve trusted information first. Let the model explain using that information. Refuse or escalate when the retrieved context is not enough.

Mental Model

Grounded RAG is not “chat with PDFs.” It is an architecture pattern for controlling what knowledge the AI is allowed to use.

The model becomes a language and reasoning layer over:

  • Trusted policy documents.
  • Product FAQs.
  • Operational runbooks.
  • Transaction facts from internal systems.
  • Guardrails that decide whether the assistant can answer.
  • For regulated or high-trust domains, RAG should be treated as a controlled information supply chain.

High-Level Architecture

Diagrams below use a shared colour palette so component roles and trust boundaries are visible at a glance. Subgraphs mark the responsibility split between the user edge, the AI plane, grounded knowledge, source-of-truth APIs, and governance.

flowchart TB
    classDef user fill:#E0F2FE,stroke:#0369A1,color:#0C4A6E,stroke-width:2px
    classDef edge fill:#FEF3C7,stroke:#B45309,color:#78350F,stroke-width:2px
    classDef ai fill:#EDE9FE,stroke:#6D28D9,color:#4C1D95,stroke-width:2px
    classDef data fill:#DCFCE7,stroke:#15803D,color:#14532D,stroke-width:2px
    classDef external fill:#FEE2E2,stroke:#B91C1C,color:#7F1D1D,stroke-width:2px,stroke-dasharray: 4 3
    classDef governance fill:#F3F4F6,stroke:#374151,color:#111827,stroke-width:1px

    U[Customer or Support Agent]:::user
    subgraph EDGE["πŸ”“ Edge / Trust Boundary"]
        UI[Web or Support Portal]:::edge
        AUTH[Authentication and Authorization]:::edge
    end
    subgraph AI_PLANE["πŸ€– AI Plane"]
        API[AI Assistant API]:::ai
        CLASSIFY[Intent and Risk Classifier]:::ai
        RETRIEVE[Retriever]:::ai
        CONTEXT[Context Builder]:::ai
        LLM[LLM Response Generator]:::ai
        VALIDATE[Answer Validator]:::ai
    end
    subgraph KNOWLEDGE["πŸ“š Grounded Knowledge"]
        DOCS[Approved Policy Docs and FAQs]:::data
        CHUNK[Chunk and Embed Pipeline]:::data
        VDB[(Vector Database)]:::data
    end
    subgraph EXT["πŸ”Œ Source-of-Truth APIs (read-only)"]
        PAYAPI[Payment Status API]:::external
        PAYDB[(Payment System)]:::external
    end
    subgraph GOV["πŸ›‘οΈ Governance"]
        AUDIT[(Audit Logs)]:::governance
        OBS[Tracing and Evaluation]:::governance
        HUMAN[Human Escalation Queue]:::governance
    end

    U --> UI --> AUTH --> API
    API --> CLASSIFY --> RETRIEVE
    DOCS --> CHUNK --> VDB --> RETRIEVE
    API --> PAYAPI
    PAYDB --> PAYAPI
    RETRIEVE --> CONTEXT
    PAYAPI --> CONTEXT
    CLASSIFY --> CONTEXT
    CONTEXT --> LLM --> VALIDATE
    VALIDATE -->|Grounded and Safe| UI
    VALIDATE -->|Low Confidence or Restricted| HUMAN
    API --> AUDIT
    VALIDATE --> OBS
    HUMAN --> OBS

    subgraph LEGEND[" Legend "]
        L1[User]:::user
        L2[Edge / Trust]:::edge
        L3[AI Plane]:::ai
        L4[Grounded Data]:::data
        L5[External API]:::external
        L6[Governance]:::governance
    end

[!TIP] A logged-in customer asks: “I made a card payment yesterday and it is still pending. Can I cancel it?”

[!NOTE] To address the customer query β€” simplified journey:

  1. User authenticates in the banking app.
  2. Assistant detects this is a payment-status question.
  3. System retrieves the transaction status from a read-only payment API.
  4. System retrieves approved policy snippets about pending card payments.
  5. The model generates an answer using only retrieved policy and transaction facts.
  6. If the status is ambiguous, the assistant escalates to support instead of guessing.

[!NOTE] Example Answer Shape Your payment is currently pending, which usually means the merchant has authorized the amount but final settlement has not completed yet. Based on the payment policy, pending card payments usually cannot be cancelled by the bank while the merchant authorization is active. If the merchant releases the authorization, the amount is normally made available again after the hold expires. I can help you raise a support request if you do not recognize this payment.

Sources:

  • Card payment pending policy
  • Transaction status from payment service

[!CAUTION] Important: the assistant is not moving money, cancelling payments, or making a dispute decision. It is explaining status and routing the user safely.

Activity Diagram

Same colour language applied here β€” green is a safe terminal state, red is a refusal or escalation, yellow is a decision gate, purple is an AI step, grey is governance/logging.

flowchart TD
    classDef start fill:#E0F2FE,stroke:#0369A1,color:#0C4A6E,stroke-width:2px
    classDef ai fill:#EDE9FE,stroke:#6D28D9,color:#4C1D95,stroke-width:2px
    classDef decision fill:#FEF3C7,stroke:#B45309,color:#78350F,stroke-width:2px
    classDef safe fill:#DCFCE7,stroke:#15803D,color:#14532D,stroke-width:2px
    classDef refuse fill:#FEE2E2,stroke:#B91C1C,color:#7F1D1D,stroke-width:2px
    classDef log fill:#F3F4F6,stroke:#374151,color:#111827,stroke-width:1px

    A[User asks payment question]:::start --> B[Authenticate user]:::ai
    B --> C{User authorized<br/>for transaction?}:::decision
    C -->|No| C1[Refuse account-specific answer]:::refuse
    C1 --> Z[Log event]:::log

    C -->|Yes| D[Classify intent and risk]:::ai
    D --> E{Action requested?<br/>cancel / refund / dispute}:::decision
    E -->|Yes| E1[Route to controlled workflow<br/>or human approval]:::refuse
    E1 --> Z

    E -->|No β€” explain status| F[Retrieve payment status]:::ai
    F --> G[Retrieve relevant policy docs]:::ai
    G --> H{Enough trusted<br/>context?}:::decision
    H -->|No| H1[Ask clarifying question<br/>or escalate]:::refuse
    H1 --> Z

    H -->|Yes| I[Build grounded prompt]:::ai
    I --> J[Generate answer]:::ai
    J --> K[Validate citations, PII,<br/>and policy compliance]:::ai
    K --> L{Answer safe<br/>and grounded?}:::decision
    L -->|Yes| M[Return answer with sources]:::safe
    L -->|No| N[Escalate or limited fallback]:::refuse
    M --> Z
    N --> Z

    subgraph LEGEND_AD[" Legend "]
        LD1[Start / Input]:::start
        LD2[AI Step]:::ai
        LD3[Decision Gate]:::decision
        LD4[Safe Terminal]:::safe
        LD5[Refusal / Escalation]:::refuse
        LD6[Audit / Log]:::log
    end

Sequence Diagram

Numbered steps and grouped phases β€” Authentication, Grounded Retrieval, Generation & Validation, Response β€” make the request flow scannable.

sequenceDiagram
    autonumber
    participant User
    participant App
    participant API as AI Assistant API
    participant Auth as Auth Service
    participant Pay as Payment Status API
    participant Vec as Vector DB
    participant LLM as LLM
    participant Obs as Logs and Traces

    User->>App: Ask about pending payment
    App->>API: Send question and transaction id

    rect rgb(254, 243, 199)
    Note over API,Auth: πŸ”“ Authentication
    API->>Auth: Verify access
    Auth-->>API: Authorized
    end

    rect rgb(220, 252, 231)
    Note over API,Vec: πŸ“š Grounded Retrieval (read-only)
    API->>Pay: Read transaction status
    Pay-->>API: Pending authorization
    API->>Vec: Search policy snippets
    Vec-->>API: Relevant payment policy chunks
    end

    rect rgb(237, 233, 254)
    Note over API,LLM: πŸ€– Generation and Validation
    API->>LLM: Generate answer from supplied context
    LLM-->>API: Draft answer with source references
    API->>API: Validate grounding and safety
    end

    rect rgb(243, 244, 246)
    Note over API,Obs: πŸ›‘οΈ Audit
    API->>Obs: Record prompt metadata, retrieved docs, result
    end

    API-->>App: Answer with sources or escalation
    App-->>User: Display response

The LLM is called once, with a prompt that already contains the authorized transaction status and the retrieved policy chunks. The model is not deciding what data it sees; the application is.


When to Use This Pattern

[!TIP] Use grounded RAG when:

  1. The answer must come from approved, current knowledge.
  2. The user question depends on account-specific facts.
  3. The assistant must cite sources.
  4. The cost of a hallucinated answer is high.
  5. Missing context should trigger escalation instead of guesswork.

Good candidates: payment support, fraud support, dispute intake, claims support, compliance help desks, policy Q&A, internal operations support.

When Not to Use This Pattern Alone

[!WARNING] Do not use this pattern by itself when:

  1. The assistant needs to move money or change account state.
  2. The workflow requires legal, regulatory, or dispute decisions.
  3. The answer depends on private data that cannot be safely passed to the model.
  4. The source documents are stale, conflicting, or not governed.
  5. There is no escalation path for uncertain answers.

For those cases, combine grounded RAG with workflow orchestration, policy-as-code, human review, and stronger audit controls.

Implementation Sketch

[!NOTE] Build a local demo that answers payment-support questions from: However, we will use synthetic data to model real RAG behaviour β€” a flat policy document in place of a vector store, and a static JSON in place of a payment database.

πŸ› οΈ Implementation Steps (patterns/01-grounded-rag-payment-support/README.md) View on GitHub

Pattern 01: Grounded RAG for Payment Support

This project is a small, local implementation of a grounded RAG assistant for payment-support questions.

It answers questions such as:

Why is my card payment pending?

The goal is not to build a full banking assistant. The goal is to show the core production pattern in a way a developer can run, debug, explain, and later adapt for a real organization.

What This Pattern Demonstrates

Grounded RAG means the assistant should answer from trusted context instead of relying on model memory.

In this demo, the assistant combines:

  • A user question.
  • Read-only synthetic payment facts.
  • Approved policy snippets from local markdown files.
  • Intent and safety guardrails.
  • A response contract that includes answer type, reason, next step, sources, status, and metadata.

The implementation deliberately avoids real payment systems and real LLM calls. Those are represented by local, deterministic components so the pattern can be tested without external credentials, private data, or paid services.

Quick Start

Run these commands from this folder:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -r requirements.txt
python -m unittest discover -s tests
python -m uvicorn app.main:app --reload --port 8000

Then open:

You can also run the command-line demo:

python -m app.demo

Repository Layout

01-grounded-rag-payment-support/
  README.md
  requirements.txt
  app/
    __init__.py
    assistant.py
    demo.py
    guardrails.py
    main.py
    payment_status.py
    rag_pipeline.py
    schemas.py
  data/
    transactions.json
    policies/
      ach_processing.md
      card_pending_payments.md
      dispute_intake.md
      prompt_injection_safety.md
  diagrams/
    activity.mmd
    architecture.mmd
    pattern-01-grounded-rag-payment-support.drawio
    pattern-01-simple-rag-payment-status.drawio
  tests/
    test_pattern_01.py

What Has Been Built

This repo currently includes:

  • A local FastAPI API with /ask, /health, /examples, and /.
  • A simple browser UI served from / for end-to-end validation.
  • A deterministic assistant flow in app/assistant.py.
  • A synthetic read-only payment store in data/transactions.json.
  • A local keyword-based policy retriever in app/rag_pipeline.py.
  • Guardrails for unauthorized access, controlled payment actions, and prompt-injection style requests.
  • Unit tests for the main happy path and guardrail paths.

The UI lets you run these scenarios:

  • Pending card payment.
  • Settled card payment.
  • ACH processing payment.
  • Unauthorized access guardrail.
  • Controlled workflow escalation.
  • Prompt-injection guardrail.

High-Level Architecture

GitHub renders Mermaid diagrams in markdown files. If the diagram does not render in another markdown viewer, open the .mmd files in the diagrams/ folder or import the .drawio files into draw.io.

flowchart TB
    User[User or support agent] --> UI[Local browser UI]
    UI --> API[FastAPI app]
    API --> Assistant[Grounded payment assistant]
    Assistant --> Auth[Authorization check]
    Auth --> Store[Synthetic payment store]
    Assistant --> Intent[Intent and risk classifier]
    Assistant --> Retriever[Local policy retriever]
    Retriever --> Policies[Approved policy markdown]
    Assistant --> Generator[Grounded answer builder]
    Generator --> Validator[Grounding validator]
    Validator --> Response[Structured response]
    Response --> UI

Request Flow

flowchart TD
    A[Submit question] --> B[Load transaction]
    B --> C{User owns transaction}
    C -->|No| D[Return refusal]
    C -->|Yes| E[Classify intent]
    E --> F{Prompt injection risk}
    F -->|Yes| G[Return escalation]
    F -->|No| H{Payment action requested}
    H -->|Yes| I[Route to controlled workflow]
    H -->|No| J[Retrieve policy chunks]
    J --> K[Build grounded answer]
    K --> L{Sources available}
    L -->|No| M[Return escalation]
    L -->|Yes| N[Return answer with sources]

End-to-End Example

Request:

{
  "user_id": "user_123",
  "transaction_id": "txn_pending_card_001",
  "question": "Why is my card payment pending?"
}

Response shape:

{
  "type": "answer",
  "answer": "Your card payment to Example Merchant for GBP 49.99 is currently pending...",
  "reason": "The response used approved policy context and read-only payment facts.",
  "next_step": "If the payment is not recognized, route the customer to dispute intake...",
  "sources": [
    "Card Pending Payment Policy (card_pending_payments.md)"
  ],
  "transaction_status": "pending_authorization",
  "metadata": {
    "retrieved_chunk_ids": [
      "card_pending_payments:1",
      "card_pending_payments:2",
      "card_pending_payments:3"
    ],
    "payment_method": "card"
  }
}

Call it directly:

Invoke-RestMethod `
  -Uri http://127.0.0.1:8000/ask `
  -Method Post `
  -ContentType 'application/json' `
  -Body '{"user_id":"user_123","transaction_id":"txn_pending_card_001","question":"Why is my card payment pending?"}'

Code Walkthrough

app/main.py

This is the FastAPI entry point.

It creates:

  • GET / for the local browser UI.
  • GET /health for a simple readiness check.
  • GET /examples for the UI scenario list.
  • POST /ask for the assistant workflow.

The UI is intentionally embedded in the API for this first pattern. That keeps the demo easy to run with one command and no frontend build step.

app/assistant.py

This is the orchestration layer.

The main method is:

GroundedPaymentAssistant.answer(user_id, transaction_id, question)

It performs the flow in this order:

  1. Load the transaction for the user.
  2. Refuse if the transaction belongs to another user.
  3. Escalate if the transaction does not exist.
  4. Classify the question for prompt-injection risk or controlled payment action.
  5. Retrieve relevant policy chunks.
  6. Build a grounded answer from transaction facts and policy text.
  7. Validate that sources exist.
  8. Return a structured AssistantResponse.

app/payment_status.py

This is the read-only payment fact store for the demo.

It loads synthetic transactions from:

data/transactions.json

It enforces ownership by checking:

request.user_id == transaction.user_id

In a real organization, this is where you would call an internal payment-status API or database-backed service. The assistant should still receive only the minimum read-only facts needed to answer.

app/rag_pipeline.py

This is the local retrieval layer.

It:

  • Reads policy markdown files from data/policies/.
  • Splits each file into paragraph chunks.
  • Tokenizes the user question and payment method.
  • Scores chunks with a simple keyword overlap.
  • Returns the top matching policy chunks.

This is not a vector database. It is a small local stand-in so the RAG pattern is visible and testable without infrastructure.

app/guardrails.py

This contains two guardrail checks:

  • Intent classification for risky user requests.
  • Grounding validation after answer generation.

The current classifier is rule-based. It catches examples like:

  • Cancel this payment now
  • Refund me
  • Ignore previous instructions
  • Reveal your system prompt

In production, this can become a policy engine, model-based classifier, rules service, or a combination of all three.

app/schemas.py

This file defines the core data contracts:

  • PaymentTransaction
  • PolicyChunk
  • Intent
  • AssistantResponse

The response contract is important because downstream systems need predictable fields for audit, UI rendering, support routing, and evaluation.

tests/test_pattern_01.py

The tests cover:

  • A grounded pending-payment answer with sources.
  • Refusal when a user asks about another user’s transaction.
  • Escalation for cancel, refund, reverse, dispute, or chargeback style actions.
  • Escalation for prompt-injection style requests.
  • Read-only settled payment status.

Run:

python -m unittest discover -s tests

Local Demo Choices vs Production Choices

This demo uses local substitutes for components that would normally be enterprise services.

CapabilityThis demoProduction alternative
User interfaceEmbedded HTML served by FastAPISupport portal, mobile app, authenticated web app, CRM case UI
Authenticationuser_id passed in requestSSO, session token, JWT, mTLS, API gateway identity
AuthorizationTransaction ownership check in local JSONCentral entitlement service, account access policy, scoped service token
Payment factsdata/transactions.jsonRead-only payment status API, ledger view, transaction service, case platform
Policy corpusMarkdown files in data/policies/CMS, policy service, knowledge base, SharePoint, Confluence, versioned docs
RetrievalKeyword overlap retrieverEmbeddings plus vector DB, hybrid search, reranking, metadata filters
LLM generationDeterministic Python string builderFoundation model through approved gateway, model router, hosted LLM, private model
Safety checksRule-based guardrailsPolicy engine, model safety classifier, human review, workflow orchestration
ObservabilityLocal test outputLogs, traces, prompt/response audit, evaluation store, incident review

Why This Demo Does Not Call a Foundation Model

The answer generation is currently deterministic Python code.

That is intentional for Pattern 01:

  • It keeps the repo runnable without API keys.
  • It avoids sending payment-like data to an external service.
  • It makes tests stable.
  • It lets developers focus on the architecture before model behavior.
  • It shows that RAG is more than prompting. The hard parts are context selection, authorization, source control, validation, and routing.

In production, the deterministic answer builder would usually be replaced by a foundation model call. The model should receive a strict prompt contract, the authorized payment facts, the retrieved policy snippets, and instructions to answer only from that context.

Production model call sketch:

flowchart TB
    Assistant[Assistant orchestrator] --> Prompt[Prompt contract]
    Assistant --> Facts[Authorized payment facts]
    Assistant --> Context[Retrieved policy snippets]
    Prompt --> Model[Approved foundation model]
    Facts --> Model
    Context --> Model
    Model --> Draft[Draft answer]
    Draft --> Validate[Grounding and safety validation]

The model should not:

  • Decide whether the user is authorized.
  • Call payment mutation APIs directly.
  • Cancel, refund, reverse, dispute, or move money.
  • Answer from hidden memory when approved context is missing.
  • Ignore missing, stale, or conflicting source material.

Production Replacement Path

Use this section as the migration map when adapting the pattern for a real team.

1. Replace Synthetic Transactions with a Read-Only Payment API

Current local component:

app/payment_status.py
data/transactions.json

Production path:

  • Create a payment-status client that calls an internal read-only API.
  • Pass the authenticated user or account context from the API gateway.
  • Return only the fields needed by the assistant.
  • Keep mutation operations out of this path.
  • Add timeout, retry, and graceful escalation behavior.

Example production facts:

{
  "transaction_id": "txn_abc",
  "customer_id": "cust_123",
  "payment_method": "card",
  "status": "pending_authorization",
  "amount": "49.99",
  "currency": "GBP",
  "merchant": "Example Merchant",
  "created_at": "2026-05-16T09:20:00Z",
  "status_reason_code": "merchant_authorization"
}

2. Replace Local Markdown Retrieval with a Governed Knowledge Source

Current local component:

app/rag_pipeline.py
data/policies/

Production path:

  • Store approved policies in a governed source system.
  • Chunk documents with stable IDs and version metadata.
  • Generate embeddings using an approved embedding model.
  • Store vectors in a vector database such as pgvector, Qdrant, Chroma, Azure AI Search, OpenSearch, or another approved platform.
  • Use metadata filters for region, product, payment method, effective date, and policy status.
  • Return source titles, source URLs, chunk IDs, and document versions.

Important metadata:

{
  "chunk_id": "card_pending_payments:v3:chunk_001",
  "title": "Card Pending Payment Policy",
  "source": "policy-system-url",
  "effective_date": "2026-05-16",
  "region": "UK",
  "payment_method": "card",
  "approved": true
}

3. Replace Deterministic Generation with an Approved Model Gateway

Current local component:

_generate_grounded_answer() in app/assistant.py

Production path:

  • Call the organization-approved model gateway.
  • Use a versioned prompt template.
  • Send only authorized facts and retrieved policy snippets.
  • Ask for structured output.
  • Require citations to retrieved chunks.
  • Validate the draft before showing it to a user.
  • Log model version, prompt version, source IDs, and response type.

Recommended output contract:

{
  "type": "answer | refusal | escalation",
  "answer": "string",
  "reason": "string",
  "next_step": "string",
  "sources": ["source labels"],
  "transaction_status": "string or null",
  "metadata": {
    "retrieved_chunk_ids": [],
    "model": "approved-model-name",
    "prompt_version": "payment-support-v1"
  }
}

4. Replace Rule-Based Guardrails with Layered Controls

Current local component:

app/guardrails.py

Production path:

  • Keep simple deterministic rules for high-confidence prohibited actions.
  • Add a model or classifier for nuanced risk detection.
  • Add policy-as-code for product, region, and regulatory constraints.
  • Add human review or controlled workflow routing for sensitive actions.
  • Add regression tests for every guardrail that matters.

Sensitive actions should go to controlled workflows:

  • Cancel payment.
  • Refund payment.
  • Reverse payment.
  • Recall ACH.
  • Raise chargeback.
  • Decide dispute outcome.
  • Change account or payment details.

5. Add Production Observability

For a real support team, log enough to debug and audit without leaking private data.

Capture:

  • Request ID.
  • Authenticated user or session reference.
  • Transaction reference.
  • Intent category.
  • Retrieved chunk IDs and versions.
  • Response type.
  • Escalation reason.
  • Model name and prompt version if using an LLM.
  • Latency and dependency failures.

Avoid logging:

  • Full payment instrument details.
  • Secrets or tokens.
  • Unnecessary personal data.
  • Raw prompts if they contain sensitive information and your governance model disallows it.

Guardrails in This Pattern

The assistant is intentionally read-only.

It can:

  • Explain payment status.
  • Cite approved policy context.
  • Tell the user the next safe support step.
  • Escalate when it lacks safe context.

It cannot:

  • Cancel a payment.
  • Refund a payment.
  • Reverse a payment.
  • File or decide a dispute.
  • Reveal system or developer instructions.
  • Answer another user’s transaction details.

Test Matrix

ScenarioInput exampleExpected result
Pending paymentWhy is my card payment pending?answer with policy source
Settled paymentWhat is the status of this card payment?answer with settled status
Unauthorized accessDifferent user_id for the transactionrefusal
Controlled actionCancel this payment nowescalation
Prompt injectionIgnore previous instructions...escalation

Development Runbook

Run tests

python -m unittest discover -s tests

Run the UI and API

python -m uvicorn app.main:app --reload --port 8000

Stop a running local server

If you started Uvicorn in the foreground, press Ctrl+C.

If it is running in the background, find the process:

Get-Process -Name python | Select-Object Id,ProcessName,Path

Then stop the matching process:

Stop-Process -Id <process-id>

Add a new policy document

  1. Add a markdown file under data/policies/.
  2. Use a first-level heading as the policy title.
  3. Add an effective_date: line.
  4. Keep paragraphs short because each paragraph becomes a retrievable chunk.
  5. Add or update tests for the scenario that should retrieve it.

Add a new transaction scenario

  1. Add a row to data/transactions.json.
  2. Add a demo case to DEMO_CASES in app/main.py.
  3. Add a unit test if it changes behavior or covers a new guardrail.

Known Limitations

This is a teaching implementation, not a production assistant.

Current limitations:

  • No real authentication.
  • No external payment API.
  • No vector database.
  • No foundation model call.
  • No streaming response.
  • No persistent audit log.
  • No policy versioning beyond file names and effective dates.
  • No frontend build system.
  • The browser UI is intentionally simple and embedded in app/main.py.

These limitations are useful for Pattern 01 because the first thing to learn is the flow. The production replacement path above shows where each local substitute should be replaced.

When to Use This Pattern

Use grounded RAG when:

  • The answer must come from approved, current knowledge.
  • The user question depends on account-specific facts.
  • The assistant must cite sources.
  • The cost of a hallucinated answer is high.
  • Missing context should trigger escalation instead of guesswork.

Payment support, fraud support, dispute intake, claims support, compliance help desks, policy Q&A, and internal operations support are good candidates.

When Not to Use This Pattern Alone

Do not use this pattern by itself when:

  • The assistant needs to move money or change account state.
  • The workflow requires legal, regulatory, or dispute decisions.
  • The answer depends on private data that cannot be safely passed to the model.
  • The source documents are stale, conflicting, or not governed.
  • There is no escalation path for uncertain answers.

For those cases, combine grounded RAG with workflow orchestration, policy-as-code, human review, and stronger audit controls.

Production Guardrail Checklist

Before using this pattern in an organization, confirm:

  • Authentication is handled outside the model.
  • Authorization is checked before retrieval and generation.
  • Payment APIs are read-only unless routed through approved workflows.
  • Retrieved context is filtered by product, region, customer type, and effective date.
  • The model is instructed to answer only from provided context.
  • The response must include sources.
  • Missing sources produce escalation.
  • Sensitive actions route to controlled workflows.
  • Prompt-injection attempts are detected and logged.
  • Evaluation tests cover happy paths, refusals, and escalations.
  • Audit logs capture source IDs, prompt version, model version, and response type.

References and Next Steps

The source files in this repo are intentionally small so the architecture is easy to inspect.

Good next improvements:

  • Add screenshots of the local UI.
  • Add a real vector database example behind the same retriever interface.
  • Add a model-gateway adapter while keeping deterministic tests.
  • Add structured prompt templates.
  • Add an evaluation dataset for more payment-support questions.
  • Add tracing and audit logging.

Next pattern in the series can build on this by adding tool calling or workflow orchestration for controlled payment support actions.

What I Learned

[!NOTE] The biggest takeaway was that with this pattern in mind, AI is not a replacement for your point-to-point API communication and should not be used for those flows β€” the RAG pattern shows how additional documentation and processes can be plugged in alongside your AI to give customers accurate, cited answers with the right context. Start with the simplest solution that works first.


What’s Next

In the next part we will be learning the next AI Usage Pattern from start to end in a systematic manner.

[!TIP] Star the GitHub repo and follow along with the series.


Have questions or feedback? Drop a comment below or connect on LinkedIn.

πŸ’¬ Comments

← Back to all posts