Scan Before You Adopt: Why Every Codebase Is Guilty Until Proven Safe
Every time you pull in external code without auditing it, you're trusting a stranger with the keys to your infrastructure. Here's the process — and the tool — we use to fix that.

There is a moment that happens in almost every engineering team. Someone finds a library, a script, a repo that does exactly what they need. It looks clean. The README is polished. The GitHub stars are respectable. So they pull it in, wire it up, and ship.
Nobody audited it.
This is how supply chain attacks work. Not with brute force — with trust. The 2020 SolarWinds breach reached some 18,000 organizations not because their perimeters failed, but because a vendor's build pipeline was silently poisoned. The 2021 ua-parser-js incident injected a crypto miner into a package with roughly 8 million weekly downloads. The 2022 PyTorch nightly builds shipped with a malicious torchtriton dependency for five days before anyone noticed.
In every case, the attack vector was the same: code someone adopted without looking.
The Problem Is Culture, Not Capability
The tools to audit code have existed for years. Static analyzers. Dependency CVE scanners. Secret detectors. Credential pattern matchers. None of them are hard to use. The failure isn't technical — it's workflow.
Nobody scans because nobody made it a habit. Because it wasn't in the checklist. Because there was no checklist.
The same engineering teams that run automated security gates on their own CI pipelines will clone a stranger's repo and integrate it into production within the hour. That asymmetry is the vulnerability.
The threat model isn't just malicious actors. It's abandoned packages with unpatched CVEs, accidental secrets committed to public repos, and dangerous patterns left by developers who didn't know better. Any of these can become your problem the moment you adopt the code.
What We Actually Check
When we scan a codebase before adoption, we want answers to six specific questions:
1. Are there committed secret files?
Does the repo contain .env, credentials.json, id_rsa, or any private key material — either currently tracked or in git history? A developer who once committed their AWS keys and then deleted the file often doesn't realize those keys still exist in every clone of that repo's history.
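A minimal sketch of this check, assuming standard git and POSIX tooling; the filenames are the ones listed above, not an exhaustive set:

```bash
# Secret-looking files in the working tree (skip .git itself).
find . -path ./.git -prune -o \
  \( -name ".env" -o -name "credentials.json" -o -name "id_rsa" -o -name "*.pem" \) \
  -print

# Files ever added to git history under those names, even if deleted since.
git log --all --diff-filter=A --name-only --pretty=format: \
  | grep -E '(^|/)(\.env|credentials\.json|id_rsa)$|\.pem$' \
  | sort -u
```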
2. Are there hardcoded credentials?
API key prefixes tell a story. sk- is OpenAI. AKIA is AWS. ghp_ is GitHub. xoxb- is Slack. AIza is Google. A quick pattern scan across the codebase finds these in seconds. Hardcoded passwords and connection strings with embedded credentials also fall into this category — they're common in backend services where a developer took a shortcut.
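A sketch of that pattern scan, using the prefixes above; the length quantifiers approximate each vendor's key format and will need tuning:

```bash
# Grep for well-known API key prefixes across the repo.
# -r recursive, -I skip binaries, -n line numbers, -E extended regex.
grep -rInE \
  '(sk-[A-Za-z0-9]{20,}|AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36}|xoxb-[0-9A-Za-z-]{10,}|AIza[0-9A-Za-z_-]{35})' \
  --exclude-dir=.git .
```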
3. Are there dangerous code patterns?
eval() and exec() on dynamic input are classic injection vectors. subprocess.run(..., shell=True) with any user-controlled string is command injection waiting to happen. pickle.loads() on untrusted data is arbitrary code execution. These aren't theoretical — they're the kind of patterns that appear in real code written under deadline pressure by developers who knew the risk and shipped anyway.
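Grep produces a candidate list for these patterns in seconds. A sketch, with the caveat that every hit still needs a human to judge the context:

```bash
# Candidate injection vectors in Python source. Hits are leads, not verdicts.
grep -rnE '\b(eval|exec)\s*\(' --include='*.py' --exclude-dir=.git .
grep -rnE 'shell\s*=\s*True' --include='*.py' --exclude-dir=.git .
grep -rnE 'pickle\.loads?\s*\(' --include='*.py' --exclude-dir=.git .
```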
4. Where is the code calling home?
Hardcoded external IP addresses and domains using sketchy TLDs (.xyz, .tk, .ml) inside HTTP calls are red flags. Legitimate libraries don't need to beacon to a VPS in an uncommon jurisdiction. This check is especially important for closed-source or lightly-maintained repos.
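A rough sketch of the network check; the TLD list mirrors the examples above and should be tuned to your own threat model:

```bash
# Heuristic: hardcoded IP literals inside URLs.
grep -rnE 'https?://([0-9]{1,3}\.){3}[0-9]{1,3}' --exclude-dir=.git .

# Heuristic: domains on TLDs that rarely appear in legitimate dependencies.
grep -rnE 'https?://[A-Za-z0-9.-]+\.(xyz|tk|ml)' --exclude-dir=.git .
```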
5. What does static analysis say?
bandit runs the Python AST through a set of security rules and flags anything that warrants a second look. Most MEDIUM findings are false positives — parameterized SQL queries that look like string interpolation, HTTPS calls that look like arbitrary URL opens. But HIGH findings are real, and they matter.
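Running it is a one-liner; a sketch assuming bandit is installed (pip install bandit):

```bash
# -r scans the tree recursively; -ll filters output to MEDIUM and HIGH severity.
bandit -r ./some-repo -ll
```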
6. Are the dependencies clean?
pip-audit cross-references installed packages against the PyPI Advisory Database and OSV. A repo can have perfectly clean source code and still ship you a version of requests that leaks Proxy-Authorization headers on redirect (CVE-2023-32681). The dependency tree is part of the attack surface.
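A sketch of the dependency check, assuming the repo declares its dependencies in a requirements file:

```bash
# Audit declared dependencies against the PyPI Advisory Database and OSV
# without installing them into your own environment.
pip-audit -r requirements.txt
```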
The Tool We Built
We got tired of running these checks manually before integrating third-party repos. So we built sec-scan — a single shell script that runs all six checks, produces a clear PASS/WARN/FAIL report, and exits with a code that can be used in automation.
```bash
sec-scan ~/Downloads/some-repo
```
The output looks like this:
```
── 1. Committed secret files ──
✓ PASS  .env in .gitignore
✓ PASS  Git history scanned for secret file commits

── 2. Hardcoded secrets & credentials ──
✗ FAIL  Possible API key (prefix: AKIA) found:
        ./config/deploy.py:14: aws_key = "AKIAxxxxxxxxxxxxxxxx"

── 3. Dangerous code patterns ──
⚠ WARN  subprocess shell=True found:
        ./scripts/deploy.py:42: subprocess.run(cmd, shell=True)

── 5. Python static analysis (bandit) ──
✓ PASS  Bandit: no HIGH or MEDIUM findings

── 6. Dependency CVE scan (pip-audit) ──
✗ FAIL  CVEs found in dependencies:
        requests 2.28.0  CVE-2023-32681

VERDICT: DO NOT ADOPT — fix FAIL items first
```
Exit code 2 means stop. Exit code 1 means review and decide. Exit code 0 means clear.
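That contract makes the script easy to wire into automation. A minimal CI-style gate, assuming the exit codes above (the repo path is illustrative):

```bash
#!/usr/bin/env bash
# Gate adoption on the scan result: 0 = clear, 1 = review, 2 = stop.
sec-scan ./vendor/candidate-repo
case $? in
  0) echo "Clean. Proceed." ;;
  1) echo "Warnings. Manual review required." ;;
  2) echo "Failures. Do not adopt." ; exit 1 ;;
esac
```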
sec-scan is open source and available at github.com/Invictus-Labs/sec-scan-tool. One script, no configuration required, runs on any Mac or Linux machine with bash.
How We Use It
Every external codebase that comes into our stack gets scanned before it touches production. That includes:
- Open-source libraries we're evaluating
- Repos we're forking and extending
- Third-party integrations someone found on GitHub
- Any code that didn't originate inside our own org
The scan takes 15-30 seconds for most repos. The cost of skipping it is measured in breach response hours, credential rotation sprints, and trust.
We also run it on our own repos periodically. Static analysis findings drift over time. A developer who introduced a shell=True pattern six months ago to hit a deadline — and forgot to clean it up — becomes a finding on the next scan. That's fine. That's the system working.
The Deeper Point
Security scanning isn't about distrust. It's about having a standard.
The best codebases fail scans. Not because the developers were negligent, but because bandit flags things conservatively, pip-audit doesn't always have context for which CVEs are actually reachable, and patterns that look dangerous in isolation are sometimes fine in context.
The scan is the start of the conversation, not the verdict. What it prevents is the situation where you adopt something, deploy it, and discover six months later — via an incident, not a scan — that the repo had an AWS key committed in 2022, since rotated but still sitting in git history, and now your security team wants to know why that key appeared in your codebase.
That conversation is worse than the 30 seconds.
The rule is simple: no external code enters the stack without a scan. Not a thorough manual review every time — just the scan. Make it a reflex, not a task.
What This Doesn't Cover
sec-scan is a first-pass filter, not a comprehensive security audit. It won't catch logic vulnerabilities, authentication bypasses, insecure cryptographic implementations, or subtle race conditions. It won't tell you whether the code does what the README claims it does. It won't replace reading the code.
What it will do is catch the things that kill you in practice: the hardcoded secret, the shell injection, the unpatched CVE in a transitive dependency, the .env file that was committed and deleted. The obvious mistakes made under pressure that didn't get caught in review.
Start there. The deeper audit can follow once you know the surface is clean.
The supply chain is the perimeter now. Act accordingly.