launchd over Cron: The Persistent Service Architecture

There is a failure mode that kills more automated systems than any bug: the scheduled job that crashes silently, runs nothing, and tells no one.

Cron does not know your job crashed. It does not care. It fires the next run at the scheduled time whether the last run succeeded, failed, or never completed. If your job depends on a previous successful run — a database that was populated, a file that was created, a service that needs to be healthy — cron gives you no mechanism to detect or recover from that dependency being broken.

launchd knows your job crashed. And it brings it back.

⚠WARNING

A long-running process started by cron that crashes is just gone until the next scheduled run. No restart, no alert, no recovery in between. launchd treats a crash as a trigger, not a termination.

launchd vs Cron — Service Lifecycle Comparison

Auto-restart on crash: Yes (launchd only; cron cannot do this)

Service registry: ~/Library/LaunchAgents/ (single source of truth for all agents)

Blocking foreground processes: never (every service needs a plist)

What Cron Actually Does

Cron is a time-based scheduler. It looks at a table of schedules and fires commands at the right times. That is the entire contract. It has no awareness of whether the previous run succeeded. It has no ability to restart a crashed process. It keeps no record of run outcomes beyond whatever output the command itself produced, which cron at most mails to the local user.

For simple, stateless, fire-and-forget tasks — database backups that are not time-critical, log rotation, periodic cleanup — cron is sufficient. The job runs, the job ends, cron schedules the next one.

For anything that needs to be continuously operational, cron is the wrong tool.

What launchd Actually Does

launchd is macOS's service supervisor. It manages the full lifecycle of a process: start it on schedule, monitor it while running, detect if it exits unexpectedly, and restart it automatically.

The difference is fundamental. Cron fires a command. launchd owns a service.

When the OpenClaw platform watchdog crashes at 3am due to a transient network error, which happens in any real production system, launchd detects the exit, waits a brief throttle interval (10 seconds by default), and restarts the daemon. The downtime is measured in seconds rather than lasting until the next cron run. The watchdog, whose whole job is to monitor everything else, stays up.

A cron equivalent would be unmonitored for the entire interval — potentially hours — until the next scheduled run fires.
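To put rough numbers on that gap, here is a minimal sketch in Python. The hourly cron schedule and the 10-second restart delay are illustrative assumptions, not measurements from the OpenClaw setup.

```python
def downtime_until_next_run(crash_offset_s: int, interval_s: int) -> int:
    """Seconds a crashed process stays down when only a fixed schedule revives it.

    crash_offset_s is how long after the last scheduled start the crash happened.
    """
    return interval_s - (crash_offset_s % interval_s)


CRON_INTERVAL_S = 3600   # assumption: the cron entry fires once an hour
LAUNCHD_RESTART_S = 10   # assumption: launchd's default restart throttle

# A crash one minute after the hourly run started.
cron_gap = downtime_until_next_run(crash_offset_s=60, interval_s=CRON_INTERVAL_S)
print(f"cron: down for {cron_gap} s, launchd: down for about {LAUNCHD_RESTART_S} s")
```

Under these assumptions the cron-managed process is down for 3540 seconds, against roughly 10 for the launchd-managed one.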

The Plist Pattern

Every persistent service gets a .plist file in ~/Library/LaunchAgents/. The plist is the full service definition:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.openclaw.watchdog</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/bin/python3</string>
    <string>/Users/knox/.openclaw/watchdog.py</string>
  </array>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Minute</key>
    <integer>0</integer>
  </dict>
  <key>StandardOutPath</key>
  <string>/Users/knox/.openclaw/logs/watchdog.log</string>
  <key>StandardErrorPath</key>
  <string>/Users/knox/.openclaw/logs/watchdog.err</string>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
</dict>
</plist>

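KeepAlive: true is the key. It tells launchd: if this process exits for any reason, including a clean exit, restart it. That single flag is the difference between a scheduled job and a managed service. It also means the script should be a long-running loop. With KeepAlive set unconditionally, launchd relaunches the process whenever it exits, so the StartCalendarInterval entry above adds little, and a script that finishes quickly will simply be restarted over and over.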
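If you would rather generate the plist than hand-write XML, Python's standard plistlib module can serialize a dictionary into the same format. The sketch below is one possible approach, not part of the OpenClaw tooling; the helper name build_agent_plist is hypothetical, and it deliberately leaves out StartCalendarInterval to describe a purely persistent service.

```python
import plistlib


def build_agent_plist(label: str, program_args: list[str], log_dir: str) -> dict:
    """Return a launchd agent definition as a plain dictionary (hypothetical helper)."""
    name = label.rsplit(".", 1)[-1]  # "com.openclaw.watchdog" -> "watchdog"
    return {
        "Label": label,
        "ProgramArguments": program_args,
        "RunAtLoad": True,
        "KeepAlive": True,
        "StandardOutPath": f"{log_dir}/{name}.log",
        "StandardErrorPath": f"{log_dir}/{name}.err",
    }


definition = build_agent_plist(
    "com.openclaw.watchdog",
    ["/usr/bin/python3", "/Users/knox/.openclaw/watchdog.py"],
    "/Users/knox/.openclaw/logs",
)

# plistlib writes XML by default; the result can be saved as
# ~/Library/LaunchAgents/com.openclaw.watchdog.plist
xml_bytes = plistlib.dumps(definition)
print(xml_bytes.decode())
```

Keeping the definition as data makes it easy to stamp out several services that share the same logging and restart conventions.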

◈INSIGHT

The plist file is your service contract with the operating system. It is infrastructure as code. It is versioned, reviewable, and reproducible — unlike a crontab entry that lives in one place and is known only to whoever created it.

The macOS Sandbox Rule

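There is a critical gotcha for macOS launchd users: services whose scripts live under ~/Documents/ and similar folders may be subject to macOS privacy restrictions. These come from the system's privacy controls (TCC: Transparency, Consent, and Control) rather than from the App Sandbox in the strict sense, but the effect is sandbox-like: a background agent that has not been granted access can have its reads and writes under those folders denied, typically with an "Operation not permitted" error.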

The rule is simple: put launchd service scripts in ~/.openclaw/ or similar paths outside the sandbox-restricted zone. Not ~/Documents/Dev/. Not ~/Desktop/. ~/.openclaw/ is the reliable location.

This matters because a service that launches successfully but fails silently due to sandbox restrictions is harder to debug than a service that never starts. The log will show it ran. The outcome will show it did nothing. The sandbox is the invisible wall.
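A quick pre-flight check can catch a badly placed script before it is registered. The sketch below is an illustration only: the function name is hypothetical, and the folder list is an assumption (Documents and Desktop come from the rule above, Downloads is added as a commonly protected folder).

```python
from pathlib import PurePosixPath

# Assumption: user folders that macOS commonly protects from background agents.
PROTECTED_DIRS = ("Documents", "Desktop", "Downloads")


def in_protected_folder(script: str, home: str) -> bool:
    """Return True if the script path sits under a protected folder of home."""
    script_path = PurePosixPath(script)
    return any(
        script_path.is_relative_to(PurePosixPath(home) / name)
        for name in PROTECTED_DIRS
    )


print(in_protected_folder("/Users/knox/Documents/Dev/watchdog.py", "/Users/knox"))  # True
print(in_protected_folder("/Users/knox/.openclaw/watchdog.py", "/Users/knox"))      # False
```

The check is purely lexical and requires Python 3.9 or later. It does not follow symlinks, so a symlink from ~/.openclaw/ into ~/Documents/ would slip past it.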

The `env -u CLAUDECODE` Prefix

When triggering cron or launchd jobs from within a Claude Code session, prepend env -u CLAUDECODE to the command. Without this, some tools detect the CLAUDECODE environment variable and behave differently — deferring actions, logging differently, or skipping steps that assume interactive operation.

For example, env -u CLAUDECODE python3 /path/to/script.py unsets the variable for the child process, so the script runs in its normal, non-agent-context mode. This is not optional when triggering crons from Claude Code. It is a reliability requirement.
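The same effect is available when a Python script launches the job itself. The following is a minimal sketch of that idea, with hypothetical helper names; it copies the current environment, drops CLAUDECODE, and passes the result to the child process.

```python
import os
import subprocess
import sys


def env_without(name: str) -> dict[str, str]:
    """Copy the current environment, dropping one variable if it is set."""
    env = dict(os.environ)
    env.pop(name, None)
    return env


def run_clean(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd with CLAUDECODE removed from the child's environment."""
    return subprocess.run(
        cmd, env=env_without("CLAUDECODE"), capture_output=True, text=True, check=True
    )


# Demo: simulate a Claude Code session, then ask a child process what it sees.
os.environ["CLAUDECODE"] = "1"
child = [sys.executable, "-c", "import os; print('CLAUDECODE' in os.environ)"]
result = run_clean(child)
print(result.stdout.strip())  # prints False
```

The parent's own environment is left untouched; only the child sees the cleaned copy.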

A good plan violently executed now is better than a perfect plan executed next week.
— General George S. Patton · War As I Knew It

The launchd daemon does not wait for perfect conditions. It executes, detects failure, and restores. That bias for action under adversity is exactly what persistent services require.

The Logging Pattern

Each plist specifies StandardOutPath and StandardErrorPath. Logs accumulate at known, predictable paths. This is not a suggestion — it is an operational requirement.

When something goes wrong with an automated system at 2am, the first question is always: what happened? If logs are scattered across temp files, piped to /dev/null, or simply not configured, you have no answer. If every service writes to a known path, you have a complete audit trail.

The pattern: ~/.openclaw/logs/<service-name>.log and ~/.openclaw/logs/<service-name>.err. Consistent. Findable. Structured.
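The naming convention is simple enough to encode in a few lines. The sketch below uses a hypothetical log_paths helper that derives both paths from a service name and creates the log directory up front, on the assumption that a missing directory should never be left for launchd to deal with.

```python
import tempfile
from pathlib import Path


def log_paths(service_name: str, log_dir: Path) -> tuple[Path, Path]:
    """Return (stdout_log, stderr_log) for a service, creating log_dir if needed."""
    log_dir.mkdir(parents=True, exist_ok=True)
    return log_dir / f"{service_name}.log", log_dir / f"{service_name}.err"


# Demo against a throwaway directory; in practice log_dir would be
# Path.home() / ".openclaw" / "logs".
demo_root = Path(tempfile.mkdtemp())
out_log, err_log = log_paths("watchdog", demo_root / "logs")
print(out_log.name, err_log.name)  # prints: watchdog.log watchdog.err
```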

◉SIGNAL

NEVER run blocking foreground processes in a terminal session. If the terminal closes, the process dies. Write a plist. Register it with launchd. Let the operating system own the lifecycle.

Loading and Managing Services

# Load a new service
launchctl load ~/Library/LaunchAgents/com.openclaw.watchdog.plist

# Unload (stops the service; it is loaded again at next login unless disabled with -w)
launchctl unload ~/Library/LaunchAgents/com.openclaw.watchdog.plist

# Check status
launchctl list | grep openclaw

# Run immediately (without waiting for schedule)
launchctl kickstart -k gui/$(id -u)/com.openclaw.watchdog

Once loaded, launchd owns the service. It will survive closed terminal sessions, sleep, and crashes of the process itself. If the machine restarts, agents in ~/Library/LaunchAgents/ are loaded again at login, and RunAtLoad: true starts the service right away.
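The output of launchctl list is easier to act on once parsed. The sketch below assumes the usual three tab-separated columns (PID, last exit status, label), with "-" in the PID column for a job that is not running; treat that layout as an assumption to confirm on your macOS version. The sample text and the com.openclaw.pipeline label are made up for illustration.

```python
from typing import NamedTuple, Optional


class AgentStatus(NamedTuple):
    pid: Optional[int]  # None when the job is not currently running
    last_exit: int      # last exit status reported by launchctl
    label: str


def parse_launchctl_list(output: str, needle: str) -> list[AgentStatus]:
    """Extract rows whose label contains needle from `launchctl list` output."""
    rows = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or needle not in parts[2]:
            continue
        pid_text, status_text, label = parts
        try:
            last_exit = int(status_text)
        except ValueError:
            continue  # header row or an unexpected format
        pid = int(pid_text) if pid_text.isdigit() else None
        rows.append(AgentStatus(pid, last_exit, label))
    return rows


# Hypothetical sample; real input would be the captured output of `launchctl list`.
SAMPLE = (
    "PID\tStatus\tLabel\n"
    "4321\t0\tcom.openclaw.watchdog\n"
    "-\t1\tcom.openclaw.pipeline\n"
    "-\t0\tcom.apple.example\n"
)

for agent in parse_launchctl_list(SAMPLE, "openclaw"):
    state = "running" if agent.pid is not None else "not running"
    print(f"{agent.label}: {state}, last exit {agent.last_exit}")
```

A row with no PID and a non-zero last exit status is the one worth investigating first.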

Drill

List every scheduled or automated process you currently run. For each one, answer two questions:

Does it have a launchd plist in ~/Library/LaunchAgents/?
Does the plist specify StandardOutPath and StandardErrorPath?

For every process that answers "no" to either question: write the plist today. It takes ten minutes. The alternative is a process that crashes silently and runs nothing — discovered only when something downstream breaks and you spend an hour tracing back to the root cause.
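The second drill question can be checked mechanically. Here is a minimal sketch, assuming the agents are ordinary plist files readable with Python's plistlib; the function name audit_agents is hypothetical.

```python
import plistlib
from pathlib import Path

REQUIRED_KEYS = ("StandardOutPath", "StandardErrorPath")


def audit_agents(agents_dir: Path) -> dict[str, list[str]]:
    """Map each plist filename in agents_dir to the logging keys it is missing."""
    report = {}
    for plist_path in sorted(agents_dir.glob("*.plist")):
        try:
            with plist_path.open("rb") as fh:
                definition = plistlib.load(fh)
        except Exception:
            # Malformed or unreadable file: flag it rather than abort the audit.
            report[plist_path.name] = ["<unreadable plist>"]
            continue
        missing = [key for key in REQUIRED_KEYS if key not in definition]
        if missing:
            report[plist_path.name] = missing
    return report
```

Calling audit_agents(Path.home() / "Library" / "LaunchAgents") returns an empty dictionary when every agent has both logging paths configured. It cannot answer the first question, since a process with no plist at all never shows up in the directory.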

Bottom Line: Cron schedules. launchd manages. For any process that needs to be continuously operational (watchdogs, content pipelines, monitoring daemons, trade alerts), launchd is the correct infrastructure primitive. Write a plist in ~/Library/LaunchAgents/. Set KeepAlive: true. Keep the script itself in ~/.openclaw/ to avoid sandbox issues. Configure logging paths. Load it with launchctl. Done. Your service is now a first-class citizen of the operating system, and the OS itself is responsible for keeping it alive.