Novamind AI/ML Security Pipeline
Baseline SAST + quality scanning on an ML pipeline, with the gaps named honestly
Three-stage ML pipeline - data, training, serving API. Bandit and Pylint run on every push, with findings committed back to the repo as artifacts. Not full AI threat coverage. A floor that most ML teams do not have yet.
Year
2026
Role
DevSecOps for ML
Stack
Python, GitHub Actions
Coverage
SAST + quality
ML Pipeline, Bandit, Pylint, SAST, GitHub Actions, Evidence Committed
github.com/gocko1004/novamind-ai-security-pipeline
High-level design
The ML pipeline, the data flow, and where each scanner lives.
Three-stage ML pipeline on a single GitHub repo. Numbered arrows trace the commit-to-prediction path, 1 through 8. E is the evidence loop. Full step names in the legend.
Actors and inputs
ML Engineer (actor): commits to main
Raw dataset: local / object store
External (github.com)
GitHub repo: source of truth
Actions runner: security-scan.yml, on every push
security-reports/: committed back, diffable history
ML Pipeline (python 3.11 / scikit-learn)
data_pipeline.py: ingest + clean (Bandit)
train.py: fit model (Bandit, Pylint)
api.py: serving endpoint (Bandit)
model.pkl: pickle artefact / not scanned (pickle deserialisation, supply risk)
Consumer
Client: HTTP /predict
Legend
1 git push
2 trigger
3 SAST scan (Bandit / Pylint)
4 raw data in
5 processed dataset
6 save pickle
7 load_model()
8 predict
E commit scan artefacts
What is covered
Bandit SAST on all three Python stages → Pylint quality on train.py → both reports committed to security-reports/ on every push. Diffable, not ephemeral.
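The covered path above can be sketched as a small Python wrapper around the two scanners. This is an illustration under assumptions, not the contents of security-scan.yml: the stage file names and the security-reports/ directory come from the case study, while the report file names (bandit-report.json, pylint-report.txt) and the function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Stage file names are taken from the case study.
STAGES = ["data_pipeline.py", "train.py", "api.py"]
REPORT_DIR = Path("security-reports")


def build_commands(report_dir: Path = REPORT_DIR) -> dict[str, list[str]]:
    """Return the Bandit and Pylint command lines for one scan run."""
    return {
        # Bandit: JSON report across all three stages, written straight to a file.
        # The report file name is an assumption.
        "bandit": ["bandit", *STAGES, "-f", "json",
                   "-o", str(report_dir / "bandit-report.json")],
        # Pylint: text report for train.py only; the caller captures stdout.
        "pylint": ["pylint", "train.py", "--output-format=text"],
    }


def run_scans(report_dir: Path = REPORT_DIR) -> dict[str, int]:
    """Run both scanners and return their exit codes without raising on findings."""
    report_dir.mkdir(parents=True, exist_ok=True)
    commands = build_commands(report_dir)

    bandit = subprocess.run(commands["bandit"], check=False)
    pylint = subprocess.run(commands["pylint"], capture_output=True,
                            text=True, check=False)
    # The Pylint report file name is an assumption.
    (report_dir / "pylint-report.txt").write_text(pylint.stdout)

    return {"bandit": bandit.returncode, "pylint": pylint.returncode}


if __name__ == "__main__":
    print(run_scans())
```

Both scanners exit non-zero when they report findings, so the sketch records exit codes instead of failing. Whether a finding should block the push or only be recorded is a policy choice the case study does not state.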
What is not covered yet
The pickle artefact is the largest blind spot. Model scanning, adversarial input tests on /predict, training-data poisoning checks - next iteration. The case study says so out loud.
A floor, not a ceiling. SAST on every stage is the baseline most ML teams still skip.
What is not here yet
Model-artifact scanning for pickle-based payloads. Training-data poisoning detection. Adversarial input testing against the serving API. Garak on the endpoint.
These belong in the next iteration. I am not claiming the current version replaces them - I am claiming it is the starting floor most ML teams have not built.
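To make the first gap concrete, here is a minimal sketch of what static model-artifact scanning could look like, using only the standard library's pickletools to walk the opcode stream without unpickling anything. It is not part of the current pipeline, and the function names are hypothetical. A real scikit-learn model.pkl legitimately contains these opcodes, so a usable scanner would also need an allowlist of expected imports.

```python
import pickletools

# Opcodes that import a callable or construct/invoke one during unpickling.
RISKY_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE",
                 "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


def find_risky_opcodes(payload: bytes) -> set[str]:
    """Walk the pickle opcode stream (no execution) and report risky opcode names."""
    return {op.name for op, _arg, _pos in pickletools.genops(payload)
            if op.name in RISKY_OPCODES}


def scan_model_file(path: str) -> set[str]:
    """Read a pickle file from disk and return the risky opcodes it contains."""
    with open(path, "rb") as handle:
        return find_risky_opcodes(handle.read())
```

A pickle of plain data such as a dict of numbers yields an empty set, while any payload that relies on `__reduce__` to call a function shows up with an import opcode plus REDUCE.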
What this shows about me
I apply DevSecOps to ML, not just web apps.
The same pipeline discipline, different attack surface.
I commit the evidence.
security-reports/ gets the scan output on every push. Diffable history.
I know the difference between SAST and AI security.
I do not conflate them. The case study says so.
I ship baselines, not perfect systems.
A floor others can build on beats a ceiling nobody reaches.
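The "diffable history" point can be illustrated with a short sketch that compares two committed Bandit JSON reports and lists findings present only in the newer one. The `results`, `filename`, `test_id`, and `line_number` keys follow Bandit's JSON output; the function names are hypothetical, and keying on line number is a simplification that will flag a finding as new whenever code above it shifts.

```python
import json
from pathlib import Path


def load_report(path: Path) -> dict:
    """Load one Bandit JSON report from disk."""
    return json.loads(path.read_text())


def finding_keys(report: dict) -> set[tuple[str, str, int]]:
    """Reduce each finding to (file, Bandit test ID, line) for comparison."""
    return {(r["filename"], r["test_id"], r["line_number"])
            for r in report.get("results", [])}


def new_findings(previous: dict, current: dict) -> list[tuple[str, str, int]]:
    """Return findings that appear in the current report but not the previous one."""
    return sorted(finding_keys(current) - finding_keys(previous))
```

Because the reports live in the repo, the two inputs can be any pair of commits, for example the report before and after a pull request.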
Outcome
A working scan baseline on an ML pipeline, with evidence shipped as repo artifacts. The case study names exactly what the next layer of AI-specific controls needs to cover.
ML stages covered: 3 (data, train, serve)
Scanners: Bandit + Pylint
Evidence artifacts: JSON + text committed
Next layer: Model + adversarial testing
Honest framing: Baseline, not ceiling
Coursework origin: SecureAI hands-on lab