AI Prompt Injection Lab
Four OWASP-LLM attacks run against the textbook defenses, with honest verdicts
Live LLM app plus a mock app. Four prompt-injection attacks run end-to-end through regex input filtering and output keyword blocking. Two bypass. Two are blocked. The gap tells you where production defenses actually have to live.
Year
2025
Role
AI Security Researcher
Stack
Python, OpenAI API
Alignment
OWASP LLM Top 10
Prompt InjectionLLM SecurityOpenAI APIRegex FilteringMITRE ATLASScreenshots
github.com/gocko1004/ai-prompt-injection-lab ->High-level design
The LLM app, the defense pipeline,
and where each attack ends.
and where each attack ends.
Numbered arrows trace a single attack from craft to logged verdict, 1 through 8. Full step names in the legend.
Attacker
Adversary
local / curl
attacks/*.txt
4 OWASP LLM01 payloads
HTML commentJSON escaperole overridepirate
Application- Flask app / python 3.11 / localhost:5000
POST /chat
Flask route
Defense pipeline
Input regex
ignore prev / role override
blocks 2/4
LLM client
openai.ChatCompletion
Output keyword
secret / admin / key
blocks 1/4
External + Evidence
OpenAI API
gpt-4o / api.openai.com
TLS 1.3Rate-limited
verdicts.json
pass / fail per attack
screenshots/
one PNG per run
1craft payload
2POST /chat
3raw prompt into app
4passes input regex
5HTTPS to OpenAI
6check output
7log verdict
8screenshot
Where defenses fail
Input regex matches literal strings only. HTML comments and JSON escapes slip through to the model. Output keyword block is a last line - too narrow to cover paraphrase attacks.
What production needs
Semantic input classifier in front of the model, grounded output filtering against the expected reply schema, audit trail per request. The regex + keyword pair is the floor, not the ceiling.
Textbook defenses catch textbook attacks. Two pass, two are blocked. The gap is where real defenses live.
Honest note
Regex and keyword blocking are the starter defenses. They catch the textbook attacks. They miss base64 payloads, instruction smuggling via Markdown links, and anything creative.
Running the lab is how you learn where the floor is. The ceiling needs semantic input classification and retrieval-grounded output filtering.
What this shows about me
I read the OWASP LLM Top 10 as code, not a checklist.
Each attack ships as a file in /attacks/. Each defense is a function.
I can explain why a defense fails.
Not just that it does. The pattern, the payload, the fix.
I write for both operators and executives.
Screenshots + matrix for a CISO. Code for a reviewer.
I know the limits of what I built.
This is a floor. The write-up says so explicitly.
Outcome
Four attacks run end-to-end against a live OpenAI API-backed app and a mock twin. Screenshots as evidence. Honest verdict matrix. A clear map from textbook defense to what production needs next.
Attacks implemented4
Defenses measured2 (regex + output block)
AppsMock + live OpenAI
Verdicts captured2 bypass / 2 blocked
AlignmentOWASP LLM01 (Prompt Injection)
Next stepSemantic classifier