Scan Before You Adopt: Why Every Codebase Is Guilty Until Proven Safe
Every time you pull in external code without auditing it, you're trusting a stranger with the keys to your infrastructure. Here's the process — and the tool — we use to fix that.

There is a moment that happens in almost every engineering team. Someone finds a library, a script, a repo that does exactly what they need. It looks clean. The README is polished. The GitHub stars are respectable. So they pull it in, wire it up, and ship.
Nobody audited it.
This is how supply chain attacks work. Not with brute force — with trust. The 2020 SolarWinds breach reached some 18,000 organizations not because their perimeters failed, but because a vendor's build pipeline was silently poisoned. The 2021 ua-parser-js incident injected a crypto miner into a package with roughly 8 million weekly downloads. The 2022 PyTorch nightly builds shipped with a malicious torchtriton dependency for five days before anyone noticed.
In every case, the attack vector was the same: code someone adopted without looking.
The Problem Is Culture, Not Capability
The tools to audit code have existed for years. Static analyzers. Dependency CVE scanners. Secret detectors. Credential pattern matchers. None of them are hard to use. The failure isn't technical — it's workflow.
Nobody scans because nobody made it a habit. Because it wasn't in the checklist. Because there was no checklist.
The same engineering teams that run automated security gates on their own CI pipelines will clone a stranger's repo and integrate it into production within the hour. That asymmetry is the vulnerability.
The threat model isn't just malicious actors. It's abandoned packages with unpatched CVEs, accidental secrets committed to public repos, and dangerous patterns left by developers who didn't know better. Any of these can become your problem the moment you adopt the code.
What We Actually Check
When we scan a codebase before adoption, we want answers to six specific questions:
1. Are there committed secret files?
Does the repo contain .env, credentials.json, id_rsa, or any private key material — either currently tracked or in git history? A developer who once committed their AWS keys and then deleted the file often doesn't realize those keys still exist in every clone of that repo's history.
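A minimal sketch of this check, assuming standard git and POSIX tooling; the filenames are the ones listed above, not an exhaustive set:

```bash
# Secret-looking files in the working tree (skip .git itself).
find . -path ./.git -prune -o \
  \( -name ".env" -o -name "credentials.json" -o -name "id_rsa" -o -name "*.pem" \) \
  -print

# Files ever added to git history under those names, even if deleted since.
git log --all --diff-filter=A --name-only --pretty=format: \
  | grep -E '(^|/)(\.env|credentials\.json|id_rsa)$|\.pem$' \
  | sort -u
```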
2. Are there hardcoded credentials?
API key prefixes tell a story. sk- is OpenAI. AKIA is AWS. ghp_ is GitHub. xoxb- is Slack. AIza is Google. A quick pattern scan across the codebase finds these in seconds. Hardcoded passwords and connection strings with embedded credentials also fall into this category — they're common in backend services where a developer took a shortcut.
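A sketch of that pattern scan, using the prefixes above; the length quantifiers approximate each vendor's key format and will need tuning:

```bash
# Grep for well-known API key prefixes across the repo.
# -r recursive, -I skip binaries, -n line numbers, -E extended regex.
grep -rInE \
  '(sk-[A-Za-z0-9]{20,}|AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36}|xoxb-[0-9A-Za-z-]{10,}|AIza[0-9A-Za-z_-]{35})' \
  --exclude-dir=.git .
```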
3. Are there dangerous code patterns?
eval() and exec() on dynamic input are classic injection vectors. subprocess.run(..., shell=True) with any user-controlled string is command injection waiting to happen. pickle.loads() on untrusted data is arbitrary code execution. These aren't theoretical — they're the kind of patterns that appear in real code written under deadline pressure by developers who knew the risk and shipped anyway.
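Grep produces a candidate list for these patterns in seconds. A sketch, with the caveat that every hit still needs a human to judge the context:

```bash
# Candidate injection vectors in Python source. Hits are leads, not verdicts.
grep -rnE '\b(eval|exec)\s*\(' --include='*.py' --exclude-dir=.git .
grep -rnE 'shell\s*=\s*True' --include='*.py' --exclude-dir=.git .
grep -rnE 'pickle\.loads?\s*\(' --include='*.py' --exclude-dir=.git .
```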
4. Where is the code calling home?
Hardcoded external IP addresses and domains using sketchy TLDs (.xyz, .tk, .ml) inside HTTP calls are red flags. Legitimate libraries don't need to beacon to a VPS in an uncommon jurisdiction. This check is especially important for closed-source or lightly-maintained repos.
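A rough sketch of the network check; the TLD list mirrors the examples above and should be tuned to your own threat model:

```bash
# Heuristic: hardcoded IP literals inside URLs.
grep -rnE 'https?://([0-9]{1,3}\.){3}[0-9]{1,3}' --exclude-dir=.git .

# Heuristic: domains on TLDs that rarely appear in legitimate dependencies.
grep -rnE 'https?://[A-Za-z0-9.-]+\.(xyz|tk|ml)' --exclude-dir=.git .
```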
5. What does static analysis say?
bandit runs the Python AST through a set of security rules and flags anything that warrants a second look. Most MEDIUM findings are false positives — parameterized SQL queries that look like string interpolation, HTTPS calls that look like arbitrary URL opens. But HIGH findings are real, and they matter.
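Running it is a one-liner; a sketch assuming bandit is installed (pip install bandit):

```bash
# -r scans the tree recursively; -ll filters output to MEDIUM and HIGH severity.
bandit -r ./some-repo -ll
```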
6. Are the dependencies clean?
pip-audit cross-references installed packages against the PyPI Advisory Database and OSV. A repo can have perfectly clean source code and still ship you a version of requests that leaks Proxy-Authorization headers on redirect (CVE-2023-32681). The dependency tree is part of the attack surface.
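A sketch of the dependency check, assuming the repo declares its dependencies in a requirements file:

```bash
# Audit declared dependencies against the PyPI Advisory Database and OSV
# without installing them into your own environment.
pip-audit -r requirements.txt
```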
The Tool We Built
We got tired of running these checks manually before integrating third-party repos. So we built sec-scan — a single shell script that runs all six checks, produces a clear PASS/WARN/FAIL report, and exits with a code that can be used in automation.
```bash
sec-scan ~/Downloads/some-repo
```
The output looks like this:
```
── 1. Committed secret files ──
✓ PASS  .env in .gitignore
✓ PASS  Git history scanned for secret file commits

── 2. Hardcoded secrets & credentials ──
✗ FAIL  Possible API key (prefix: AKIA) found:
        ./config/deploy.py:14: aws_key = "AKIAxxxxxxxxxxxxxxxx"

── 3. Dangerous code patterns ──
⚠ WARN  subprocess shell=True found:
        ./scripts/deploy.py:42: subprocess.run(cmd, shell=True)

── 5. Python static analysis (bandit) ──
✓ PASS  Bandit: no HIGH or MEDIUM findings

── 6. Dependency CVE scan (pip-audit) ──
✗ FAIL  CVEs found in dependencies:
        requests 2.28.0  CVE-2023-32681

VERDICT: DO NOT ADOPT — fix FAIL items first
```
Exit code 2 means stop. Exit code 1 means review and decide. Exit code 0 means clear.
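That contract makes the script easy to wire into automation. A minimal CI-style gate, assuming the exit codes above (the repo path is illustrative):

```bash
#!/usr/bin/env bash
# Gate adoption on the scan result: 0 = clear, 1 = review, 2 = stop.
sec-scan ./vendor/candidate-repo
case $? in
  0) echo "Clean. Proceed." ;;
  1) echo "Warnings. Manual review required." ;;
  2) echo "Failures. Do not adopt." ; exit 1 ;;
esac
```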
sec-scan is open source and available at github.com/Invictus-Labs/sec-scan-tool. One script, no configuration required, runs on any Mac or Linux machine with bash.
How We Use It
Every external codebase that comes into our stack gets scanned before it touches production. That includes:
- Open-source libraries we're evaluating
- Repos we're forking and extending
- Third-party integrations someone found on GitHub
- Any code that didn't originate inside our own org
The scan takes 15-30 seconds for most repos. The cost of skipping it is measured in breach response hours, credential rotation sprints, and trust.
We also run it on our own repos periodically. Static analysis findings drift over time. A developer who introduced a shell=True pattern six months ago to hit a deadline — and forgot to clean it up — becomes a finding on the next scan. That's fine. That's the system working.
The Deeper Point
Security scanning isn't about distrust. It's about having a standard.
The best codebases fail scans. Not because the developers were negligent, but because bandit flags things conservatively, pip-audit doesn't always have context for which CVEs are actually reachable, and patterns that look dangerous in isolation are sometimes fine in context.
The scan is the start of the conversation, not the verdict. What it prevents is the situation where you adopt something, deploy it, and discover six months later — via an incident, not a scan — that the repo had an AWS key committed in 2022, since rotated but still sitting in git history, and now your security team wants to know why that key appeared in your codebase.
That conversation is worse than the 30 seconds.
The rule is simple: no external code enters the stack without a scan. Not a thorough manual review every time — just the scan. Make it a reflex, not a task.
What This Doesn't Cover
sec-scan is a first-pass filter, not a comprehensive security audit. It won't catch logic vulnerabilities, authentication bypasses, insecure cryptographic implementations, or subtle race conditions. It won't tell you whether the code does what the README claims it does. It won't replace reading the code.
What it will do is catch the things that kill you in practice: the hardcoded secret, the shell injection, the unpatched CVE in a transitive dependency, the .env file that was committed and deleted. The obvious mistakes made under pressure that didn't get caught in review.
Start there. The deeper audit can follow once you know the surface is clean.
The supply chain is the perimeter now. Act accordingly.