ML · 13 min read · April 8, 2026

How to build a soft-sizing entry gate from a logistic regression in 200 lines

Annotated walkthrough of vectra-ml's entry classifier. 10 features, 6.4k training samples, sklearn-style fit/predict, exported to JSON for the live engine. No frameworks.

Vectra's entry classifier is a logistic regression. 10 features, 6.4k training samples, ~200 lines of Rust including the export path the live engine consumes. No frameworks, no Python, no scikit-learn dependency at runtime.

This post walks through every step of building one. By the end you can train, evaluate, and ship a comparable model end-to-end in a Rust workspace.

The 10 features

The features describe the bar at which the rule chain produced an Open(side) decision. Each is normalised to roughly mean-0 std-1 across the training corpus; the comments below describe each feature's raw value before scaling.

```rust
pub struct VolMomEntryFeatures {
    pub signal_zscore: f64,        // effective signal at entry
    pub consecutive_signal: f64,   // bars in same-sign streak
    pub vol_scalar: f64,           // vol-target sizing multiplier
    pub breadth_count: f64,        // bullish symbols this bar
    pub regime_bull: f64,          // 0/1 from heuristic classifier
    pub regime_bear: f64,          // 0/1 from heuristic classifier
    pub above_sma: f64,            // 0/1: price > 100-bar SMA
    pub funding_rate: f64,         // current 8h funding rate
    pub bar_in_session_pct: f64,   // 0..1 progress through UTC day
    pub recent_winrate_5: f64,     // last 5 trades on this symbol
}
```
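The post doesn't show its normalisation code. Here is a minimal sketch, assuming a plain per-column z-score fitted on the training rows only. `ZScaler`, `fit` and `transform` are hypothetical names, not vectra-ml's API.

```rust
/// Per-column z-score scaler: fit on training rows, then applied to every row.
/// Hypothetical helper; not vectra-ml's actual normalisation code.
pub struct ZScaler {
    pub mean: Vec<f64>,
    pub scale: Vec<f64>, // population standard deviation per column (1.0 if constant)
}

impl ZScaler {
    /// Learns each column's mean and standard deviation. Panics on empty input.
    pub fn fit(rows: &[Vec<f64>]) -> ZScaler {
        assert!(!rows.is_empty(), "need at least one training row");
        let n = rows.len() as f64;
        let dim = rows[0].len();
        let mut mean = vec![0.0; dim];
        for r in rows {
            for j in 0..dim {
                mean[j] += r[j] / n;
            }
        }
        let mut var = vec![0.0; dim];
        for r in rows {
            for j in 0..dim {
                var[j] += (r[j] - mean[j]).powi(2) / n;
            }
        }
        // A constant column (e.g. a 0/1 flag that never flips) keeps scale 1.0
        // so transform() never divides by zero.
        let scale: Vec<f64> = var
            .into_iter()
            .map(|v: f64| if v > 0.0 { v.sqrt() } else { 1.0 })
            .collect();
        ZScaler { mean, scale }
    }

    /// Standardises one feature row. The bias is not part of the row: it stays a
    /// separate weight and never passes through the scaler.
    pub fn transform(&self, row: &[f64]) -> Vec<f64> {
        row.iter()
            .zip(self.mean.iter().zip(&self.scale))
            .map(|(x, (m, s))| (x - m) / s)
            .collect()
    }
}
```

Fitting the scaler on the training fold alone, then reusing its mean and scale on validation and live rows, keeps future statistics out of the past. Because the bias never enters `transform`, this layout also avoids the bias-standardisation bug covered in the hyperparameter notes.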

Training

Standard binary cross-entropy with L2 regularisation, minimised by full-batch gradient descent. The label is whether the trade closed positive (1) or negative (0), measured against the rule chain's actual execution rather than a simulated optimum.

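```rust
pub const FEATURES: usize = 10;

/// One training example: scaled features plus a 0/1 outcome label.
pub struct Sample {
    pub features: [f64; FEATURES],
    pub label: u8, // 1 = trade closed positive, 0 = negative
}

/// FEATURES coefficients followed by the bias at index FEATURES.
pub struct Weights {
    pub w: Vec<f64>,
}

fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Dot product over the feature coefficients; zip() stops before the bias slot.
fn dot(w: &[f64], x: &[f64; FEATURES]) -> f64 {
    w.iter().zip(x).map(|(a, b)| a * b).sum()
}

/// Full-batch gradient descent on L2-regularised binary cross-entropy.
pub fn train_lr(samples: &[Sample], learning_rate: f64, epochs: usize, l2: f64) -> Weights {
    let mut w = vec![0.0; FEATURES + 1]; // +1 for bias
    if samples.is_empty() {
        return Weights { w };
    }
    let n = samples.len() as f64;
    for _ in 0..epochs {
        let mut grad = vec![0.0; w.len()];
        for s in samples {
            let z = dot(&w, &s.features) + w[FEATURES];
            let err = sigmoid(z) - s.label as f64; // d(loss)/dz for cross-entropy
            for i in 0..FEATURES {
                grad[i] += err * s.features[i];
            }
            grad[FEATURES] += err;
        }
        for i in 0..w.len() {
            grad[i] /= n;
            if i < FEATURES {
                grad[i] += l2 * w[i]; // shrink coefficients only; the bias is not penalised
            }
            w[i] -= learning_rate * grad[i];
        }
    }
    Weights { w }
}
```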
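The training function returns raw weights; the matching predict step and a loss monitor aren't shown in the post. The sketch below assumes the same weight layout (coefficients first, then the bias). `predict_proba` and `log_loss` are hypothetical names.

```rust
pub const FEATURES: usize = 10;

fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Predicted win probability for one scaled feature row.
/// `w` holds FEATURES coefficients followed by the bias, as train_lr produces.
pub fn predict_proba(w: &[f64], x: &[f64; FEATURES]) -> f64 {
    let z: f64 = w[..FEATURES].iter().zip(x).map(|(a, b)| a * b).sum::<f64>() + w[FEATURES];
    sigmoid(z)
}

/// Mean binary cross-entropy: the unregularised part of the training objective.
/// Probabilities are clamped away from 0 and 1 so ln() stays finite.
pub fn log_loss(preds: &[f64], labels: &[u8]) -> f64 {
    let eps = 1e-12;
    let total: f64 = preds
        .iter()
        .zip(labels)
        .map(|(&p, &y)| {
            let p = p.clamp(eps, 1.0 - eps);
            if y == 1 { -p.ln() } else { -(1.0 - p).ln() }
        })
        .sum();
    total / preds.len() as f64
}
```

Logging `log_loss` on the training set every few dozen epochs is a cheap way to confirm gradient descent is still reducing the objective. With all-zero weights every prediction is 0.5, so the loss starts at ln 2 ≈ 0.693.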

Hyperparameters that mattered

learning_rate = 0.05, epochs = 400, l2 = 0.01. Found via 5-fold CV on the first 80% of the training data; validated on the last 20% before going to OOS test.
Don't standardise the bias. Easy bug: if the bias input rides along in the feature vector and you push the whole vector through the same scaler as the data, you inadvertently shift the intercept. We keep the bias as a separate, unscaled weight.
Class weighting. Our 6.4k sample set is 53/47 win/loss. Adding inverse-frequency class weights moved OOS AUC from 0.523 to 0.529. Small but real.
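Here is a minimal sketch of inverse-frequency class weights. It assumes the common n / (2 · n_c) convention, the same formula as scikit-learn's `class_weight="balanced"`. The post doesn't say which exact scheme vectra-ml uses, and the function names are hypothetical.

```rust
/// Inverse-frequency weights for binary labels: weight_c = n / (2 * n_c), so each
/// class contributes the same total weight (n / 2) to the loss.
/// Returns [weight for label 0, weight for label 1]; degenerate if a class is absent.
pub fn inverse_frequency_weights(labels: &[u8]) -> [f64; 2] {
    let n = labels.len() as f64;
    let pos = labels.iter().filter(|&&y| y == 1).count() as f64;
    let neg = n - pos;
    [n / (2.0 * neg), n / (2.0 * pos)]
}

/// Class-weighted error term for one sample: the plain (p - y) from
/// cross-entropy, scaled by the weight of the sample's true class.
pub fn weighted_err(p: f64, label: u8, weights: &[f64; 2]) -> f64 {
    weights[label as usize] * (p - label as f64)
}
```

Inside `train_lr`, the only change would be to scale `err` by `cw[s.label as usize]` before accumulating the gradient. Balanced weights sum to n across the whole set, so the existing division by `samples.len()` still yields a weighted mean.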

Evaluation

We evaluate via 5-fold time-series CV with no shuffling, because leakage in time-ordered data is brutal. The headline metric is OOS Brier score; AUC and accuracy are secondary because we don't make a binary decision in production (see the soft-sizing post).

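```rust
/// Mean squared error between predicted probabilities and 0/1 outcomes (lower is better).
pub fn brier(preds: &[f64], labels: &[u8]) -> f64 {
    // zip() would silently truncate mismatched inputs, so check lengths up front.
    assert_eq!(preds.len(), labels.len(), "one prediction per label");
    let n = preds.len() as f64;
    preds.iter().zip(labels)
        .map(|(p, &y)| (p - y as f64).powi(2))
        .sum::<f64>() / n
}
```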
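The post specifies 5 folds and no shuffling but not the exact fold layout. One common reading is forward chaining, similar in spirit to scikit-learn's `TimeSeriesSplit`: each fold trains on everything before a block and tests on that block. Here is a sketch, with `time_series_folds` as a hypothetical helper that returns index ranges into the time-ordered samples.

```rust
use std::ops::Range;

/// Forward-chaining splits over `n` time-ordered samples: cut them into k + 1
/// contiguous blocks; fold i trains on blocks 0..=i and tests on block i + 1.
/// Returns (train_range, test_range) pairs; nothing is shuffled.
pub fn time_series_folds(n: usize, k: usize) -> Vec<(Range<usize>, Range<usize>)> {
    let block = n / (k + 1);
    (0..k)
        .map(|i| {
            let train_end = block * (i + 1);
            // The last test block absorbs any remainder so no sample is dropped.
            let test_end = if i + 1 == k { n } else { train_end + block };
            (0..train_end, train_end..test_end)
        })
        .collect()
}
```

Every test range starts exactly where its training range ends, so each fold scores the model only on bars that come after the ones it was trained on. Unlike `TimeSeriesSplit`, which puts leftover samples in the first training block, this version gives them to the last test block.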

Our v6 model lands at Brier 0.247 on OOS — versus 0.250 for a constant 0.5 prediction. Small, defensible, ships.
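A constant 0.5 scores exactly 0.25 whatever the labels, because every squared error is 0.25. A tougher no-skill baseline is a constant equal to the win rate r, which scores r(1 − r). Assuming the OOS set keeps roughly the 53/47 split of the training corpus, that comes to 0.53 × 0.47 ≈ 0.2491. A quick check follows; `constant_brier` is a hypothetical helper, and `brier` repeats the definition used in the post.

```rust
/// Brier score: mean squared error of probabilities vs 0/1 labels.
fn brier(preds: &[f64], labels: &[u8]) -> f64 {
    let n = preds.len() as f64;
    preds.iter().zip(labels).map(|(p, &y)| (p - y as f64).powi(2)).sum::<f64>() / n
}

/// Brier score of a no-skill model that ignores the features and always predicts `p`.
/// Analytically (1 - r) * p^2 + r * (1 - p)^2 for win rate r; minimised at p = r,
/// where it equals r * (1 - r).
fn constant_brier(p: f64, labels: &[u8]) -> f64 {
    brier(&vec![p; labels.len()], labels)
}
```

Under that assumption, the margin over a base-rate predictor is closer to 0.002 than to 0.003.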

Export

The live engine reads model weights from a JSON file at startup (VECTRA_VOL_MOM_ENTRY_MODEL=...path...). Export is a single serde write:

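```rust
use std::path::Path;

use anyhow::Result; // assumed error type behind the original `Result<()>` signature
use serde::{Deserialize, Serialize};

/// On-disk model format shared by the trainer (writer) and the live engine (reader).
#[derive(Serialize, Deserialize)]
pub struct ExportedModel {
    pub weights: Vec<f64>,          // coefficients, then the bias (train_lr layout)
    pub feature_names: Vec<String>,
    pub trained_at: String,
    pub training_n: usize,
    pub oos_brier: f64,
}

impl ExportedModel {
    /// Serialises to pretty-printed JSON and writes it to `path`.
    pub fn save(&self, path: &Path) -> Result<()> {
        let json = serde_json::to_string_pretty(self)?;
        std::fs::write(path, json)?;
        Ok(())
    }
}
```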

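The runtime side uses the same struct with Deserialize. There is no version negotiation: if the struct gains a required field or a field changes type, old model files fail loudly at startup with a serde error. (serde ignores unknown JSON fields by default, so dropping a field only fails if the struct opts into `#[serde(deny_unknown_fields)]`.) We prefer that to silent miscalibration.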
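A serde error catches a missing or mistyped field, but a file that deserialises cleanly can still carry the wrong number of weights, or list the features in a different order than the engine builds them. Below is a std-only sketch of an extra startup check. `LoadedModel` and `validate` are hypothetical, with `LoadedModel` standing in for the deserialised `ExportedModel`.

```rust
/// The fields the startup check needs; stands in for the deserialised ExportedModel.
pub struct LoadedModel {
    pub weights: Vec<f64>,
    pub feature_names: Vec<String>,
}

/// Hypothetical startup check: reject a model whose feature order or weight count
/// doesn't match the vector the engine will build, instead of scoring with it.
pub fn validate(model: &LoadedModel, expected: &[&str]) -> Result<(), String> {
    if model.feature_names.len() != expected.len() {
        return Err(format!(
            "model has {} features, engine expects {}",
            model.feature_names.len(),
            expected.len()
        ));
    }
    for (i, (got, want)) in model.feature_names.iter().zip(expected).enumerate() {
        if got.as_str() != *want {
            return Err(format!("feature {i}: model has `{got}`, engine expects `{want}`"));
        }
    }
    // One coefficient per feature plus the bias, matching train_lr's layout.
    if model.weights.len() != expected.len() + 1 {
        return Err(format!(
            "expected {} weights, found {}",
            expected.len() + 1,
            model.weights.len()
        ));
    }
    if model.weights.iter().any(|w| !w.is_finite()) {
        return Err("model file contains a non-finite weight".to_string());
    }
    Ok(())
}
```

Running it right after deserialisation, with `expected` built from the same list the engine uses to assemble its feature vector, keeps any mismatch a startup failure rather than a silent one.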

Why not a deep model

We've been asked. The honest answer: 6.4k samples, 10 features, and we're probably already close to the best predictor this feature set supports. A deep model wouldn't help; better features would. Our deep-model comparison (a 32-unit MLP, same features, same training setup) moved Brier by only 0.002. Within noise. Not worth the runtime weight.

When we have a fundamentally richer feature surface — order-book microstructure, cross-asset signals, on-chain — we'll revisit the architecture decision. Until then, logistic regression is honest and fast and we know how to debug it.

Published by Floris V. · Vectra operator
