Alpha Journal: Engineering a Self-Improving Trading Signal System

Most signal systems are trained once and deployed forever. Engineers spend weeks on backtesting, calibrate weights against historical data, ship the system, and then wonder why performance degrades six months later. The market changed. The signal didn't. That's not an edge — that's a slowly dying heuristic dressed up as a strategy.

Alpha Journal inverts this model entirely. The system treats every live trade as training data and every post-mortem as a calibration event. The feedback loop never terminates because it was never designed to. Signal training isn't a phase of development; it's the product.

The engineering insight that makes this work isn't novel in isolation. Closed feedback loops are standard in control theory, adaptive systems, and military doctrine. What's unusual is applying that rigor to trading signal weights with full deterministic attribution, automated degradation detection, and a PR-based human checkpoint before any parameter changes reach production.

◈INSIGHT

Signal decay is not a bug. It is an environmental mismatch between a static model and a dynamic market. The only fix is continuous recalibration — and that requires a system that learns from live outcomes, not just historical backtests.

Deterministic Attribution: Killing the LLM Crutch

The most expensive mistake in signal engineering is letting a language model explain why a trade worked. LLMs are pattern-completion engines. They'll construct a plausible narrative for any outcome. Ask an LLM why a long position succeeded and it will confidently cite momentum, macro tailwinds, and sector rotation — even if the actual driver was a single volatility signal that crossed a threshold.

Alpha Journal's attribution module is deterministic. It runs factor win-rate analysis across all active signals — no model inference, no narrative generation, no hallucination surface. Each trade outcome gets tagged to its dominant signal factor: the highest-weighted input that cleared its confidence gate at decision time. Win and loss rates accumulate per factor across a rolling window. The math doesn't lie. A language model can.
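The sketch below shows the dominant-factor tagging and per-factor win-rate tally described above. It rests on several assumptions. The Signal and Trade types and their fields are hypothetical. The rolling window is modeled as "last N trades" rather than a time span. None of it is Alpha Journal's actual attribution module.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Signal:
    name: str          # factor name, e.g. "volatility"
    weight: float      # weight assigned to this input at decision time
    confidence: float  # signal confidence at decision time
    gate: float        # minimum confidence required to clear the gate


@dataclass
class Trade:
    trade_id: str
    signals: list[Signal]
    won: bool


def dominant_factor(trade: Trade) -> str | None:
    """Highest-weighted signal that cleared its confidence gate, if any."""
    cleared = [s for s in trade.signals if s.confidence >= s.gate]
    if not cleared:
        return None
    return max(cleared, key=lambda s: s.weight).name


def factor_win_rates(trades: list[Trade], window: int = 50) -> dict[str, float]:
    """Win rate per dominant factor over the most recent `window` trades."""
    wins: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for trade in trades[-window:]:
        factor = dominant_factor(trade)
        if factor is None:
            continue  # no signal cleared its gate; nothing to attribute
        totals[factor] += 1
        wins[factor] += int(trade.won)
    return {f: wins[f] / totals[f] for f in totals}
```

The tagging is pure arithmetic over recorded inputs, so the same trade history always yields the same attribution. A higher-weighted signal that failed its gate never becomes the dominant factor.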

Why this matters for system integrity: When you tie performance feedback to a probabilistic narrative generator, you corrupt the feedback loop. You're not learning what worked — you're learning what story sounds most coherent. Deterministic attribution strips that ambiguity. The factor either won or lost. The rate either held or dropped. The system responds to signal, not to story.

This is the same discipline that makes military battle damage assessment useful. The debrief doesn't ask "how do you feel the strike went?" It measures crater diameter, confirmed target state, and secondary effects. Feeling is not data.

Compound Post-Mortems: Pattern Detection at Scale

Individual trade post-mortems are tactically useful and strategically useless at scale. If your system is running 40-60 decisions per week, reviewing each one in isolation produces noise. You're looking at leaves when the insight lives in the forest.

Alpha Journal solves this with compound post-mortems. Trades sharing the same dominant signal factor and the same outcome — win or loss — within a 72-hour window get collapsed into a single analytical unit. That cluster gets one narrative, generated by Claude Haiku, focused on the pattern rather than the individual events. What conditions were shared? What did the gate catch or miss? What does the cluster suggest about the factor's reliability in this environment?
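Here is a minimal sketch of the clustering step. It groups outcomes that share a dominant factor and a result, and it anchors each 72-hour window at the cluster's first trade. That anchoring choice is an assumption: a rolling gap between consecutive trades would be an equally plausible reading. The Outcome type and the CLUSTER_WINDOW name are hypothetical. The narrative step, which would receive each cluster as one unit, is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

CLUSTER_WINDOW = timedelta(hours=72)  # hypothetical constant; tunable, not hard-coded logic


@dataclass
class Outcome:
    trade_id: str
    factor: str         # dominant factor from attribution
    won: bool
    closed_at: datetime


def compound_clusters(outcomes: list[Outcome],
                      window: timedelta = CLUSTER_WINDOW) -> list[list[Outcome]]:
    """Group outcomes with the same factor and result that fall within `window`
    of the first outcome in their cluster."""
    ordered = sorted(outcomes, key=lambda o: (o.factor, o.won, o.closed_at))
    clusters: list[list[Outcome]] = []
    for o in ordered:
        current = clusters[-1] if clusters else None
        if (current
                and current[0].factor == o.factor
                and current[0].won == o.won
                and o.closed_at - current[0].closed_at <= window):
            current.append(o)
        else:
            clusters.append([o])  # start a new analytical unit
    return clusters
```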

War is the realm of uncertainty; three quarters of the factors on which action in war is based are wrapped in a fog of greater or lesser uncertainty.
— Carl von Clausewitz · On War

The Clausewitz problem in trading is identical. Individual trade outcomes live in that fog. But clusters of outcomes sharing the same causal factor — those burn through the uncertainty. Pattern detection requires density. Compound post-mortems create the density.

The 72-hour window is not arbitrary. It's calibrated to the decision cadence of the underlying system. Wider windows dilute signal by mixing market regimes. Narrower windows fragment the pattern. The window is a parameter, not a constant — and it's exposed in the system's constants module for tuning without touching application logic.

The output isn't stored as prose. It's indexed as structured memory — factor, cluster size, outcome rate, environmental conditions, narrative summary. The memory module makes this searchable. Future post-mortems can retrieve historical clusters with similar factor profiles and compare regime behavior. The system doesn't just learn from this week's trades. It learns from all trades with the same signature.
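The sketch below shows one way to hold these structured records and retrieve past clusters with a similar profile. It is an in-memory stand-in, not the actual memory module. The record fields mirror the list above. The exact meaning of "outcome rate" and the ranking rule (count of shared environmental tags) are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClusterMemory:
    factor: str
    cluster_size: int
    outcome_rate: float          # as recorded for the cluster; exact definition assumed
    conditions: dict[str, str]   # environmental tags, e.g. {"regime": "high_vol"}
    summary: str                 # narrative summary from the post-mortem step


@dataclass
class MemoryIndex:
    records: list[ClusterMemory] = field(default_factory=list)

    def add(self, record: ClusterMemory) -> None:
        self.records.append(record)

    def similar(self, factor: str, conditions: dict[str, str]) -> list[ClusterMemory]:
        """Past clusters for the same factor, most shared conditions first."""

        def overlap(record: ClusterMemory) -> int:
            return sum(1 for key, value in conditions.items()
                       if record.conditions.get(key) == value)

        matches = [r for r in self.records if r.factor == factor]
        return sorted(matches, key=overlap, reverse=True)
```

A persistent store or search index would replace the list in practice. The point is that retrieval keys on structured fields rather than on free text.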

Degradation Detection and the Auto-Weight PR

Here's where most teams stop: they build attribution and post-mortem capability, declare victory, and wait for a human to notice when performance drops. That's a monitoring system, not a learning system.

Alpha Journal's degradation detection module runs a continuous check: three consecutive weeks with a factor win rate below 45% triggers an alert. Not a log entry. An alert — routed through the delivery module to Discord, with the degrading factor identified, its recent win-rate trend, and a recommended weight adjustment.
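A minimal sketch of the threshold check and alert text follows. It assumes the module checks the most recent three weekly win rates for a factor. The function names are hypothetical. The recommended weight is passed in rather than computed, and posting the text to a Discord webhook is left to the delivery step and not shown.

```python
from __future__ import annotations

DEGRADATION_THRESHOLD = 0.45  # a weekly win rate below this counts as a bad week
CONSECUTIVE_WEEKS = 3         # bad weeks in a row before an alert fires


def is_degrading(weekly_win_rates: list[float],
                 threshold: float = DEGRADATION_THRESHOLD,
                 weeks: int = CONSECUTIVE_WEEKS) -> bool:
    """True if each of the most recent `weeks` weekly win rates is below `threshold`."""
    if len(weekly_win_rates) < weeks:
        return False  # not enough history to judge
    return all(rate < threshold for rate in weekly_win_rates[-weeks:])


def build_alert(factor: str, weekly_win_rates: list[float],
                recommended_weight: float) -> str | None:
    """Alert text for the delivery step, or None when the factor is healthy."""
    if not is_degrading(weekly_win_rates):
        return None
    trend = " -> ".join(f"{r:.0%}" for r in weekly_win_rates[-CONSECUTIVE_WEEKS:])
    return (f"Factor '{factor}' below {DEGRADATION_THRESHOLD:.0%} for "
            f"{CONSECUTIVE_WEEKS} consecutive weeks ({trend}). "
            f"Recommended weight: {recommended_weight:.2f}")
```

Note that a week at exactly 45% does not count toward the streak, because the condition is strictly "below".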

Degradation Threshold

<45% win rate

3 consecutive weeks triggers alert + weight review

The alert is the signal. The response is the auto-weight adjuster. This module reads the degradation state, computes a proposed adjustment to the factor weight in adaptive_weights.json, and opens a draft GitHub PR with the change. The PR includes a summary: which factor, what the current weight is, what the proposed weight is, and the performance data that drove the recommendation.
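The adjuster can be sketched as three pure functions: compute a proposal, rewrite the weights file contents, and draft the PR summary. The adjustment rule is entirely hypothetical: cut the weight in proportion to the shortfall below 45%, capped at 25% per proposal, with a floor. The flat {factor: weight} layout of adaptive_weights.json is also assumed.

```python
from __future__ import annotations

import json

THRESHOLD = 0.45
MAX_CUT = 0.25     # hypothetical: cap any single proposal at a 25% reduction
MIN_WEIGHT = 0.05  # hypothetical floor so automation never zeroes a factor


def propose_weight(current: float, recent_win_rate: float,
                   threshold: float = THRESHOLD) -> float:
    """Cut the weight in proportion to how far the win rate sits below threshold."""
    shortfall = max(0.0, threshold - recent_win_rate) / threshold
    cut = min(shortfall, MAX_CUT)
    return round(max(MIN_WEIGHT, current * (1 - cut)), 4)


def update_weights_json(raw: str, factor: str, new_weight: float) -> str:
    """Return new contents for adaptive_weights.json with one factor changed."""
    weights = json.loads(raw)
    if factor not in weights:
        raise KeyError(f"unknown factor: {factor}")
    weights[factor] = new_weight
    return json.dumps(weights, indent=2, sort_keys=True) + "\n"


def pr_body(factor: str, current: float, proposed: float,
            weekly_win_rates: list[float]) -> str:
    """Summary text for the draft PR: factor, weights, and the driving data."""
    rates = ", ".join(f"{r:.0%}" for r in weekly_win_rates)
    return "\n".join([
        f"Factor: {factor}",
        f"Current weight: {current:.2f}",
        f"Proposed weight: {proposed:.2f}",
        f"Weekly win rates: {rates}",
        "",
        "Draft proposal. Requires human review; do not auto-merge.",
    ])
```

The updated file would then be committed on a branch and opened as a draft, for example with the GitHub CLI's `gh pr create --draft --title ... --body ...`. Keeping the functions free of I/O makes each step straightforward to unit-test.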

The system never auto-merges. This is the critical architectural decision. Trading parameters are not application config. An automated merge on weight adjustments removes the human from a decision that has direct P&L implications. The PR is a proposal, not a commit. Knox reviews it, evaluates the context the system can't see — macro regime, upcoming catalysts, portfolio exposure — and merges or closes it with a comment. The system learns from the merge history. Closed PRs with comments feed back into the gap monitor.

This mirrors how competent engineering organizations handle infrastructure changes. Automation can identify the problem and propose the fix. It cannot authorize the deployment. That boundary isn't bureaucracy — it's risk management.

Opportunity Cost: Auditing the Ghost P&L

Most systems optimize for what they traded. Alpha Journal also tracks what it didn't trade — and why.

The opportunity cost module analyzes skipped decisions: trades the system flagged as candidates but filtered out through confidence gates, position limits, or manual overrides. For each skipped decision, it calculates ghost P&L: the hypothetical outcome if the position had been taken at the decision price. It attributes the skip to the specific gate that blocked it.
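A minimal sketch of ghost P&L and per-gate attribution follows. The source does not say when a hypothetical position would be closed. This sketch therefore assumes an exit price at some evaluation horizon and treats it as an input. The SkippedDecision type and the gate labels are illustrative.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SkippedDecision:
    symbol: str
    side: str             # "long" or "short"
    decision_price: float
    exit_price: float     # hypothetical: price at an assumed evaluation horizon
    quantity: float
    blocking_gate: str    # e.g. "confidence_gate", "position_limit", "manual_override"


def ghost_pnl(d: SkippedDecision) -> float:
    """Hypothetical P&L had the position been opened at the decision price."""
    if d.side not in ("long", "short"):
        raise ValueError(f"unknown side: {d.side}")
    direction = 1.0 if d.side == "long" else -1.0
    return direction * (d.exit_price - d.decision_price) * d.quantity


def ghost_pnl_by_gate(decisions: list[SkippedDecision]) -> dict[str, float]:
    """Total ghost P&L attributed to each gate that blocked a trade."""
    totals: dict[str, float] = defaultdict(float)
    for d in decisions:
        totals[d.blocking_gate] += ghost_pnl(d)
    return dict(totals)
```

A positive total for a gate means it blocked money-making trades on net. A negative total means it saved capital.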

This creates a second feedback loop that most systems ignore entirely. If a confidence gate is consistently filtering out winners — if the ghost P&L on skipped decisions is systematically positive — that gate is miscalibrated. It's not protecting you. It's costing you.

The inverse matters equally. Skipped decisions with negative ghost P&L validate the gate. The system saved capital. That's not a missed opportunity; it's the gate working. Quantifying both directions gives you an honest accounting of what your risk controls are actually doing versus what you believe they're doing.

▲ALPHA

Ghost P&L is not a regret metric. It is a gate calibration tool. If your skipped trades would have outperformed your taken trades over a 30-day window, your gates are filtering signal instead of noise.
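The 30-day comparison can be sketched as a single check. It assumes "outperformed" means a higher average P&L per trade. Other readings, such as total P&L or risk-adjusted return, are equally possible. The (timestamp, pnl) tuple format is hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from statistics import mean


def gates_filtering_signal(taken: list[tuple[datetime, float]],
                           skipped: list[tuple[datetime, float]],
                           now: datetime,
                           window: timedelta = timedelta(days=30)) -> bool:
    """True if skipped trades (ghost P&L) beat taken trades (realized P&L)
    on average over the trailing window."""
    cutoff = now - window
    recent_taken = [pnl for ts, pnl in taken if ts >= cutoff]
    recent_skipped = [pnl for ts, pnl in skipped if ts >= cutoff]
    if not recent_taken or not recent_skipped:
        return False  # not enough data in the window to compare
    return mean(recent_skipped) > mean(recent_taken)
```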

The InDecision Framework operates on this same principle: every decision has a cost, including the decision not to act. Opportunity cost accounting closes the analytical loop that most systems leave open.

What This Means for Signal Design

Alpha Journal is an 11-module system — pull_trades, postmortem, attribution, health_report, degradation_alert, auto_weight_adjuster, opportunity_cost, gap_monitor, deliver, memory, constants — with 304 tests and 96% coverage. That test density isn't academic. Every module in a closed feedback loop is a point of failure. If attribution misfires, degradation detection acts on bad data. If the weight adjuster reads a stale state, the PR proposes the wrong change. Coverage at that level is a system integrity requirement, not a vanity metric.

The architectural lesson is this: feedback loops require trust in every link of the chain. You can't tolerate probabilistic failures in deterministic systems. The attribution module must be right. The degradation threshold must fire on the correct condition. The PR must contain accurate data. Testing is what makes the loop trustworthy, and trustworthy loops are the only kind worth running.

The implications for your own signal design follow directly. First, separate attribution from narrative: use deterministic factor tagging before you introduce any language model into the post-mortem process. Second, cluster your post-mortems; individual trade reviews don't scale, but pattern analysis does. Third, build degradation detection before you need it. You won't notice gradual factor decay in real time, so the system has to watch for you. Fourth, quantify opportunity cost. Understand what your gates are saving and what they're costing, or you're managing risk you can't measure.

The market is not static. Neither is any signal that claims to model it. The only durable edge is a system that keeps learning — structured, deterministic, and honest about what the data actually says.

Signal training is not a launch event. It's the operating system.
