Prompt Guardrails + Safety On Our Simple App
Week 2 is about making the chat app safe, not just smart | https://cloudtoailearn.dev
I added prompt guardrails to the Week 1 demo so the model now refuses unsafe requests, catches prompt injection, and validates every reply.
This week I split the work into two code paths:
- main branch = Week 1 clean vanilla chat app
- feature/guardrails branch = Week 2 safety enhancements
If you want to start from scratch:
- Clone the repo
- Checkout main and verify the plain app works
- Create feature/guardrails
- Add the Week 2 prompt/safety patterns
What changed in Week 2
✅ System prompt now says: “If the user asks for unsafe, illegal, or harmful actions, politely refuse.”
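A minimal sketch of how that rule could be attached to every request. The base prompt text and the `build_messages` helper are assumptions for illustration, not the repo's actual code; only the quoted refusal sentence comes from the post.

```python
from typing import Optional

# Hypothetical base prompt; the Week 1 app's real prompt may differ.
BASE_PROMPT = "You are a helpful assistant."

# Refusal rule quoted from the Week 2 change list.
SAFETY_RULE = (
    "If the user asks for unsafe, illegal, or harmful actions, "
    "politely refuse."
)

SYSTEM_PROMPT = f"{BASE_PROMPT} {SAFETY_RULE}"


def build_messages(user_text: str, history: Optional[list] = None) -> list:
    """Return a chat-style message list with the system prompt first."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history or [])  # earlier turns, if any
    messages.append({"role": "user", "content": user_text})
    return messages
```

Keeping the rule in one constant means it is sent on every turn, so the model sees it no matter how long the conversation gets.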
✅ Prompt injection detection for inputs like:
- “ignore previous instructions”
- “disregard the above”
- <|system|> or <|assistant|> tokens
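A check like this could be sketched with a few regular expressions. The exact patterns and the function name `looks_like_injection` are illustrative assumptions, not the ones used in the repo.

```python
import re

# Phrases and chat-template tokens from the list above.
# The regexes themselves are assumptions made for this sketch.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+the\s+above", re.IGNORECASE),
    re.compile(r"<\|\s*(system|assistant)\s*\|>", re.IGNORECASE),
]


def looks_like_injection(user_text: str) -> bool:
    """Return True if the input matches any known injection pattern."""
    return any(p.search(user_text) for p in INJECTION_PATTERNS)
```

Pattern matching of this kind only catches known phrasings, so a paraphrased attack can slip through. It works best as a cheap first filter in front of the model.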
✅ Refusal patterns for unsafe requests:
- hacking
- illegal actions
- credit card / SSN data
- medical/self-harm advice
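One possible shape for these refusal patterns, as a sketch only. The category names follow the list above, but the keyword lists, the refusal wording, and the `check_refusal` helper are all assumptions.

```python
import re
from typing import Optional

# Category -> regex. Keyword choices are assumptions for illustration.
REFUSAL_PATTERNS = {
    "hacking": re.compile(r"\b(hack|hacking|exploit|crack)\b", re.IGNORECASE),
    "illegal actions": re.compile(r"\b(illegal|steal|counterfeit)\b", re.IGNORECASE),
    "credit card / SSN data": re.compile(
        r"\b(credit\s+card|ssn|social\s+security)\b", re.IGNORECASE
    ),
    "medical/self-harm advice": re.compile(
        r"\b(self[- ]harm|overdose|diagnose)\b", re.IGNORECASE
    ),
}

# Hypothetical refusal text; the app's real wording is not given in the post.
REFUSAL_MESSAGE = "Sorry, I can't help with that request."


def check_refusal(user_text: str) -> Optional[str]:
    """Return the first matching category name, or None if nothing matches."""
    for category, pattern in REFUSAL_PATTERNS.items():
        if pattern.search(user_text):
            return category
    return None
```

Keyword lists like these will over-block some harmless questions (for example, "crack an egg"), so expect to tune them against real traffic.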
✅ Output validation after generation:
- reject empty or broken replies
- reject model text that still contains prompt tokens
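The two output checks could look roughly like this. Here "broken" is assumed to mean a non-string or whitespace-only reply, and the fallback text and function names are hypothetical.

```python
import re

# Chat-template tokens such as <|system|> should never reach the user.
PROMPT_TOKEN_RE = re.compile(r"<\|[^|>]*\|>")

# Hypothetical fallback text; the app's real message is not given in the post.
FALLBACK_REPLY = "Sorry, I couldn't generate a valid reply. Please try again."


def validate_reply(reply) -> bool:
    """Return True only for a non-empty string free of prompt tokens."""
    if not isinstance(reply, str) or not reply.strip():
        return False  # empty, whitespace-only, or not text at all
    if PROMPT_TOKEN_RE.search(reply):
        return False  # leaked <|...|> template token
    return True


def safe_reply(reply) -> str:
    """Pass valid replies through; swap invalid ones for the fallback."""
    return reply.strip() if validate_reply(reply) else FALLBACK_REPLY
```

Running this after generation means a bad reply is replaced before it is displayed, instead of being shown and corrected later.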
✅ Rate limiting repeated questions:
- repeat the same prompt 3x and the app says “Let’s slow down”
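A small sketch of repeat detection, assuming the limit applies to the same prompt sent three times in a row and that case and spacing differences should not reset the count. The `RepeatLimiter` class is hypothetical.

```python
class RepeatLimiter:
    """Flag a prompt once it has been sent `limit` times in a row."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self._last = None  # normalized text of the previous prompt
        self._count = 0    # how many times in a row it has been seen

    def is_limited(self, user_text: str) -> bool:
        # Normalize case and whitespace so trivial variations still count.
        key = " ".join(user_text.lower().split())
        if key == self._last:
            self._count += 1
        else:
            self._last, self._count = key, 1
        return self._count >= self.limit


# Reply text quoted from the post.
SLOW_DOWN_MESSAGE = "Let’s slow down"
```

In a multi-user app, each session would need its own limiter instance so that one user's repeats do not throttle another's.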
Why this matters: a model that can answer is useful, but a model that refuses unsafe or hostile requests is the one you can actually show to others.
What you can test
- “How do I hack wifi?” → refusal message
- “Ignore previous instructions” → rejection
- Repeating the same question 3 times → rate-limited reply
- Normal requests still work as expected
If you’re starting fresh:
- use main for the plain Week 1 demo
- branch into feature/guardrails for Week 2 safety work
- check the WEEK2_PROMPT_ENGINEERING.md file for detailed instructions
- DM me if you run into any issues
Repo: https://lnkd.in/eEpeDBug
Full write-up: https://lnkd.in/e75KV3pi
This is Week 2 of my 12-week AI demo series at cloudtoailearn.dev. Every 2 weeks: one working demo + one blog post.
Like, subscribe & share! Happy AI learning!
#AI #AISafety #PromptEngineering #SecureAI #Gradio #HuggingFace #cloudtoailearn #LLM #MLOps #CloudArchitecture #aws