Training and Deploying Models in ML.NET: A Walkthrough

Timeline showing ML.NET model lifecycle from data audit to deployment and monitoring

Building a production-ready ML.NET model is less like a “one-click wizard” and more like an orderly campaign: align the objective, marshal the data, assemble the pipeline, and deploy with guardrails. Below is a pragmatic, end-to-end timeline you can follow—from first business conversation to monitored production API—optimized for teams living in the Microsoft/.NET ecosystem.


T-30 Days: Align on the Business Outcome

Before a single line of code:

  • Define the decision you’re automating (e.g., “predict subscription churn within 30 days”).
  • Decide the action downstream (e.g., retention email, discount, human outreach).
  • Pick the success metric you’ll optimize (e.g., F1 or AUCPR for rare events; RMSE/MAE for regression).
  • Clarify constraints: latency (ms), throughput (req/sec), budget, explainability, governance.

Stoic note: Epictetus taught the “dichotomy of control.” In ML terms: control your data, features, and deployment discipline; accept that noise and irreducible error remain. The discipline pays dividends.


T-25 Days: Audit Data and Labels

  • Inventory sources: SQL Server, Azure SQL/Databricks/Blob CSVs, application logs.
  • Define label and time windows to avoid leakage (train only on features known before the outcome).
  • Quantify balance (class imbalance), missingness, drift risk.
  • Draft a data contract: column names, types, allowed ranges, PII handling, and refresh cadence.

Deliverable: a concise data profile + label strategy.


T-20 Days: Scaffold the .NET Solution

Create a .NET 8 solution with separate projects:

  • AInDotNet.Churn.Training (console for training)
  • AInDotNet.Churn.Inference.Api (ASP.NET Core Web API)
  • AInDotNet.Churn.Shared (DTOs, common utils)

NuGet packages (typical set):

  • Microsoft.ML
  • Microsoft.ML.AutoML (optional, for search/tuning)
  • Microsoft.ML.LightGbm (fast, strong baseline)
  • Microsoft.Extensions.ML (PredictionEnginePool for ASP.NET Core)
  • Microsoft.ML.OnnxRuntime (optional, ONNX scoring)

T-18 Days: Load Data into IDataView

using Microsoft.ML;
using Microsoft.ML.Data;

var ml = new MLContext(seed: 1);

var data = ml.Data.LoadFromTextFile<CustomerEvent>(
    path: "data/train.csv",
    hasHeader: true,
    separatorChar: ',');

// Strongly-typed input schema. With top-level statements, type declarations
// must come after all statements in the file.
public class CustomerEvent
{
    [LoadColumn(0)] public bool Label { get; set; }      // e.g., Churned
    [LoadColumn(1)] public float TenureDays { get; set; }
    [LoadColumn(2)] public float MonthlySpend { get; set; }
    [LoadColumn(3)] public string PlanTier { get; set; } = "";
    [LoadColumn(4)] public string Region { get; set; } = "";
    [LoadColumn(5)] public float TicketsLast90d { get; set; }
}

If your data lives in memory (after a SQL query), use ml.Data.LoadFromEnumerable(list).
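A minimal sketch of the in-memory route, assuming the `CustomerEvent` class above (the sample values are illustrative):

```csharp
using Microsoft.ML;

var ml = new MLContext(seed: 1);

// Rows you would typically materialize from a SQL query
var rows = new List<CustomerEvent>
{
    new() { Label = false, TenureDays = 410, MonthlySpend = 29.99f, PlanTier = "Pro",   Region = "EU", TicketsLast90d = 1 },
    new() { Label = true,  TenureDays = 35,  MonthlySpend = 9.99f,  PlanTier = "Basic", Region = "US", TicketsLast90d = 4 }
};

// The schema is inferred from the public properties of CustomerEvent
var inMemoryData = ml.Data.LoadFromEnumerable(rows);
```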


T-16 Days: Craft a Featurization Pipeline

Turn raw columns into a clean Features vector:

var features = ml.Transforms.ReplaceMissingValues(
                    new[] { new InputOutputColumnPair("TenureDays"), new InputOutputColumnPair("MonthlySpend"), new InputOutputColumnPair("TicketsLast90d") })
               .Append(ml.Transforms.Categorical.OneHotEncoding(
                    new[] { new InputOutputColumnPair("PlanTier"), new InputOutputColumnPair("Region") }))
               .Append(ml.Transforms.Concatenate("Features", 
                    "TenureDays", "MonthlySpend", "TicketsLast90d", "PlanTier", "Region"))
               .Append(ml.Transforms.NormalizeMinMax("Features"))
               .AppendCacheCheckpoint(ml); // speed up repeated scans

Tip: AppendCacheCheckpoint accelerates experimentation and CV by caching transformed data.


T-14 Days: Pick Your Trainer Strategy

You’ve got two paths:

  1. Manual (more control):
    • Binary classification: LightGbm, SdcaLogisticRegression
    • Regression: LightGbm, Sdca
    • Multi-class: SdcaMaximumEntropy, LightGbm (multi-class overload)
    • Recommendation: MatrixFactorization
  2. AutoML (faster search):
    • Use Microsoft.ML.AutoML or the Model Builder (GUI/CLI) to explore algorithms and hyperparameters.
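If you take the AutoML path, the experiment API in Microsoft.ML.AutoML can sweep trainers and hyperparameters under a time budget. A hedged sketch, assuming the `data` IDataView from earlier; the exact API shape varies by package version:

```csharp
using Microsoft.ML.AutoML;

// Cap the search at 5 minutes of wall-clock time and optimize for PR-AUC,
// which suits the rare-positive churn scenario
var settings = new BinaryExperimentSettings
{
    MaxExperimentTimeInSeconds = 300,
    OptimizingMetric = BinaryClassificationMetric.AreaUnderPrecisionRecallCurve
};

var experiment = ml.Auto().CreateBinaryClassificationExperiment(settings);
var result = experiment.Execute(data, labelColumnName: "Label");

Console.WriteLine($"Best trainer: {result.BestRun.TrainerName}  " +
                  $"PR-AUC: {result.BestRun.ValidationMetrics.AreaUnderPrecisionRecallCurve:F3}");

// The winning pipeline is an ordinary ITransformer you can save and deploy
var bestModel = result.BestRun.Model;
```

Log each trial’s trainer, metrics, and seed so the search is reproducible.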

T-12 Days: Train a Baseline

var split = ml.Data.TrainTestSplit(data, testFraction: 0.2, seed: 1);

var trainer = ml.BinaryClassification.Trainers.LightGbm(
    labelColumnName: "Label",
    featureColumnName: "Features",
    numberOfLeaves: 31,
    numberOfIterations: 200);

var pipeline = features.Append(trainer);

var model = pipeline.Fit(split.TrainSet);
var predictions = model.Transform(split.TestSet);
var metrics = ml.BinaryClassification.Evaluate(predictions, labelColumnName: "Label");

Console.WriteLine($"AUC: {metrics.AreaUnderRocCurve:F3}  PR-AUC: {metrics.AreaUnderPrecisionRecallCurve:F3}  F1: {metrics.F1Score:F3}");

For rare positives, PR-AUC often reflects business reality better than ROC-AUC.


T-10 Days: Cross-Validate and Sanity-Check

var cv = ml.BinaryClassification.CrossValidate(
    data: data,
    estimator: pipeline,
    numberOfFolds: 5,
    labelColumnName: "Label");

var meanAuc = cv.Average(f => f.Metrics.AreaUnderRocCurve);
Console.WriteLine($"5-fold AUC: {meanAuc:F3}");

Checklist:

  • Confirm no obvious data leakage (e.g., post-event columns).
  • Verify label prevalence in each fold.
  • Stratify folds if needed.

T-9 Days: Explainability with PFI (Permutation Feature Importance)

Identify which features drive predictions:

var transformedTrain = model.Transform(split.TrainSet);

// PFI runs against the final prediction transformer plus data that has
// already passed through the featurization transforms
var pfi = ml.BinaryClassification.PermutationFeatureImportance(
    model.LastTransformer, transformedTrain, labelColumnName: "Label");

// Recover per-slot feature names from the Features column annotations
VBuffer<ReadOnlyMemory<char>> slotNames = default;
transformedTrain.Schema["Features"].Annotations.GetValue("SlotNames", ref slotNames);
var featureNames = slotNames.DenseValues().Select(n => n.ToString()).ToArray();

// Rank by impact on AUC
var aucImpacts = pfi.Select((m, i) => new { Feature = featureNames[i], AucDelta = m.AreaUnderRocCurve.Mean })
                    .OrderByDescending(x => Math.Abs(x.AucDelta));
foreach (var r in aucImpacts.Take(10))
    Console.WriteLine($"{r.Feature}: ΔAUC = {r.AucDelta:F4}");

Use PFI to prune weak features and defend the model with stakeholders.


T-8 Days: Tune or Run AutoML

  • Manual tuning: adjust LightGBM leaves, learning rate, min data in leaf; try different normalizers.
  • AutoML: sweep trainers + hyperparameters within a compute budget. Keep logs of trials, metrics, and seeds for reproducibility.

Deliverable: chosen pipeline + hyperparameters + rationale.


T-7 Days: Freeze the Pipeline and Save the Artifact

ML.NET models are ITransformer graphs with the schema baked in.

ml.Model.Save(model, split.TrainSet.Schema, "MLModels/churn_model.zip");

Commit both churn_model.zip and a ModelCard.md documenting:

  • Data windows & sources
  • Metrics (CV + holdout)
  • Intended use & known limits
  • Owners & retraining cadence

T-6 Days: Define Strongly-Typed Prediction Contracts

public class ChurnInput
{
    public float TenureDays { get; set; }
    public float MonthlySpend { get; set; }
    public string PlanTier { get; set; } = "";
    public string Region { get; set; } = "";
    public float TicketsLast90d { get; set; }
}

public class ChurnOutput
{
    public bool PredictedLabel { get; set; }
    public float Probability { get; set; }
    public float Score { get; set; }
}

Keep these DTOs in a shared project to avoid API/training drift.


T-5 Days: Register the Model as a Versioned Asset

  • Store churn_model.zip in artifacts/models/churn/<semver>/.
  • Tag releases in Git, produce a semantic version (e.g., 1.2.0).
  • Capture the training hash (Git commit, dataset version, seed, ML.NET version).

This enables quick rollbacks and compliance audits.
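One lightweight way to capture that training fingerprint next to the artifact is a small manifest written at the end of the training run. A sketch using only the BCL; the field names and placeholder values are illustrative:

```csharp
using System.Text.Json;

// Hypothetical training fingerprint; populate these from your build pipeline
var manifest = new
{
    ModelVersion = "1.2.0",
    GitCommit = "abc1234",          // illustrative placeholder
    DatasetVersion = "2024-05-01",
    Seed = 1,
    MLNetVersion = "3.0.1"
};

var dir = Path.Combine("artifacts", "models", "churn", manifest.ModelVersion);
Directory.CreateDirectory(dir);

// Copy the trained model into the versioned folder and write the manifest beside it
File.Copy("MLModels/churn_model.zip", Path.Combine(dir, "churn_model.zip"), overwrite: true);
File.WriteAllText(Path.Combine(dir, "manifest.json"),
    JsonSerializer.Serialize(manifest, new JsonSerializerOptions { WriteIndented = true }));
```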


T-4 Days: Deploy Path A — Real-Time API (ASP.NET Core)

Startup / Program.cs:

using Microsoft.Extensions.ML;
using AInDotNet.Churn.Shared;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddPredictionEnginePool<ChurnInput, ChurnOutput>()
    .FromFile(modelName: "ChurnModel",
              filePath: Path.Combine(builder.Environment.ContentRootPath, "MLModels", "churn_model.zip"),
              watchForChanges: true);

builder.Services.AddControllers();

var app = builder.Build();
app.MapControllers();
app.Run();

Controller:

using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.ML;

[ApiController]
[Route("api/churn")]
public class ChurnController : ControllerBase
{
    private readonly PredictionEnginePool<ChurnInput, ChurnOutput> _pool;

    public ChurnController(PredictionEnginePool<ChurnInput, ChurnOutput> pool) => _pool = pool;

    [HttpPost("score")]
    public ActionResult<ChurnOutput> Score([FromBody] ChurnInput input)
        => _pool.Predict(modelName: "ChurnModel", example: input);
}

Why PredictionEnginePool?
PredictionEngine is not thread-safe. The pool handles concurrency and hot-reload of updated models when watchForChanges: true is set.

Latency budget: LightGBM typically scores a single record in microseconds to low single-digit milliseconds on commodity x64 hardware.


T-3 Days: Deploy Path B — Batch Scoring (Console, Worker, or Function)

For millions of rows nightly:

var ml = new MLContext();

using var fs = File.OpenRead("MLModels/churn_model.zip");
var model = ml.Model.Load(fs, out DataViewSchema schema);

// Load batch from CSV or from IEnumerable<T>
// (for CSV loading, the input class needs [LoadColumn] attributes)
var batch = ml.Data.LoadFromTextFile<ChurnInput>("data/score.csv", hasHeader: true, separatorChar: ',');

// Score in a single vectorized pass
var scored = model.Transform(batch);

// Keep just the columns you need, then write CSV
var trimmed = ml.Transforms.SelectColumns("PredictedLabel", "Probability", "Score")
                .Fit(scored).Transform(scored);

using var outStream = File.Create("data/scored.csv");
ml.Data.SaveAsText(trimmed, outStream, separatorChar: ',', headerRow: true, schema: false);

Schedule via Windows Task Scheduler, Azure Functions/Container Apps, or Azure DevOps pipelines.


T-2 Days: Observability—Metrics, Drift, and Thresholds

  • Log inference metadata: model version, features (or hashed buckets), probability, decision threshold, latency.
  • Track population drift: compare live feature distributions vs. training. Simple options:
    • Kolmogorov–Smirnov tests on numeric features
    • Categorical top-k frequency shifts
  • Calibrate threshold for the actual cost curve (false positive vs. false negative). Keep a “quality dashboard” for business stakeholders.
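The two-sample Kolmogorov–Smirnov statistic mentioned above is simple enough to compute in-process. A minimal sketch with no external dependencies; the critical value uses the common large-sample approximation at alpha = 0.05, and the sample arrays are illustrative:

```csharp
// Two-sample KS statistic: maximum distance between the two empirical CDFs
static double KsStatistic(double[] a, double[] b)
{
    Array.Sort(a); Array.Sort(b);
    int i = 0, j = 0;
    double d = 0;
    while (i < a.Length && j < b.Length)
    {
        // Advance whichever sample has the smaller next value
        if (a[i] <= b[j]) i++; else j++;
        double cdfA = (double)i / a.Length;
        double cdfB = (double)j / b.Length;
        d = Math.Max(d, Math.Abs(cdfA - cdfB));
    }
    return d;
}

// Illustrative feature samples: training distribution vs. live traffic
var train = new double[] { 10, 12, 14, 15, 18, 21, 30 };
var live  = new double[] { 22, 25, 27, 30, 33, 35, 41 };

double ks = KsStatistic(train, live);

// Large-sample critical value at alpha = 0.05: 1.36 * sqrt((n + m) / (n * m))
double crit = 1.36 * Math.Sqrt((train.Length + live.Length) / (double)(train.Length * live.Length));
Console.WriteLine($"KS = {ks:F3}, critical ≈ {crit:F3}, drift suspected = {ks > crit}");
```

Run this per numeric feature on a schedule and alert when the statistic exceeds the critical value for several consecutive windows.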

T-1 Day: CI/CD, Governance, and Rollback

  • CI (training repo):
    • Run unit tests on feature transforms
    • Re-train on schedule or on data-freshness events
    • Emit artifacts: model zip + model card + metrics.json
  • CD (inference repo):
    • Publish ASP.NET Core API Docker image
    • Blue/green or canary rollout
    • Health probes + application metrics (Requests/sec, P95 latency)
  • Rollback: a single environment variable swap (model path/version) should restore N-1.
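A CI smoke test that loads the saved artifact and scores a golden sample catches schema drift before deployment. A hedged sketch using the shared DTOs; the file path and sample values are illustrative:

```csharp
using Microsoft.ML;
using AInDotNet.Churn.Shared;

var ml = new MLContext();

// Load the candidate artifact exactly as production would
var model = ml.Model.Load("artifacts/models/churn/1.2.0/churn_model.zip", out var inputSchema);

// Single-row engines are fine in a test; use PredictionEnginePool only in the API
var engine = ml.Model.CreatePredictionEngine<ChurnInput, ChurnOutput>(model);

var golden = new ChurnInput
{
    TenureDays = 400, MonthlySpend = 42f, PlanTier = "Pro", Region = "EU", TicketsLast90d = 0
};

var result = engine.Predict(golden);

// Fail the build if the model no longer produces a sane probability
if (result.Probability is < 0f or > 1f)
    throw new InvalidOperationException("Golden-sample prediction out of range — schema or model drift?");
```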

T+7 Days: Post-Deployment Review

  • Compare live outcomes vs. offline expectations.
  • Investigate segments with underperformance (e.g., Region = “New Market”).
  • Document lessons learned and update the retraining cadence (e.g., monthly or when drift > threshold).

Optional: Export to ONNX for Interop

If you need cross-platform or non-.NET consumers, consider ONNX export (supported by several ML.NET trainers and transforms) and serve with Microsoft.ML.OnnxRuntime. Keep a compatibility matrix—some transforms are not yet exportable.
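A hedged export sketch, assuming the Microsoft.ML.OnnxConverter package and the shared `ChurnInput` DTO; verify each transform in your pipeline is exportable before relying on this path:

```csharp
using Microsoft.ML;
using AInDotNet.Churn.Shared;

var ml = new MLContext();
var model = ml.Model.Load("MLModels/churn_model.zip", out var inputSchema);

// ConvertToOnnx needs a sample IDataView whose schema matches the model's input
var sample = ml.Data.LoadFromEnumerable(new[] { new ChurnInput() });

using var onnx = File.Create("MLModels/churn_model.onnx");
ml.Model.ConvertToOnnx(model, sample, onnx);
```

The resulting .onnx file can then be scored from Python, Java, or C++ via ONNX Runtime.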


Security, Privacy, and Compliance Essentials

  • PII: Hash or tokenize sensitive fields before training; never log raw PII in inference.
  • Least privilege: read-only connections from scoring service to data stores.
  • Model ownership: name a human owner and a deputy; ensure both can roll models forward/back.

Common Pitfalls (and Quick Fixes)

  1. Schema mismatch at load time
    Symptom: Column not found or type mismatches when loading the zip in production.
    Fix: Keep DTOs in a shared project; always save the model with TrainSet.Schema and validate in CI by loading and predicting on a golden sample.
  2. Thread safety issues
    Symptom: Random exceptions or corrupt predictions under load.
    Fix: Use PredictionEnginePool in ASP.NET Core; never share a single PredictionEngine across threads.
  3. Data leakage inflating offline metrics
    Symptom: Great CV metrics, poor prod performance.
    Fix: Re-verify time windows; remove proxy columns (e.g., post-churn interactions).
  4. Imbalanced labels
    Symptom: High accuracy, low recall for the minority class.
    Fix: Evaluate with PR-AUC/F1; adjust class weights (ML.NET’s LightGBM trainer exposes UnbalancedSets and WeightOfPositiveExamples, mapping to LightGBM’s is_unbalance and scale_pos_weight) and tune the decision threshold.
  5. Silent model decay
    Symptom: Gradual deterioration with market or product changes.
    Fix: Monitor drift and performance proxies; trigger retraining when thresholds trip.
  6. Latency spikes
    Symptom: Occasional outliers harm SLAs.
    Fix: Pre-warm the pool, pin CPU limits per pod/container, and keep transforms lightweight (avoid huge n-gram text vectors at request time).
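For pitfall 4, the imbalance handling lives on the LightGBM trainer’s options object. A hedged sketch; option names are per recent Microsoft.ML.LightGbm versions, and the weight value is illustrative:

```csharp
using Microsoft.ML.Trainers.LightGbm;

var trainer = ml.BinaryClassification.Trainers.LightGbm(new LightGbmBinaryTrainer.Options
{
    LabelColumnName = "Label",
    FeatureColumnName = "Features",
    UnbalancedSets = true,              // LightGBM's is_unbalance; use this OR the weight below
    // WeightOfPositiveExamples = 5.0,  // LightGBM's scale_pos_weight (illustrative value)
    NumberOfLeaves = 31,
    NumberOfIterations = 200
});
```

Whichever knob you pick, re-evaluate with PR-AUC and re-tune the decision threshold afterwards, since both options shift the probability distribution.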

Mini Timeline Recap (Quick Reference)

  • T-30: Business alignment (decision, metric, constraints)
  • T-25: Data audit & label strategy
  • T-20: .NET solution scaffolding + NuGet
  • T-18: Load data into IDataView
  • T-16: Feature pipeline (+ cache)
  • T-14: Trainer strategy (manual vs. AutoML)
  • T-12: Train baseline + hold-out metrics
  • T-10: Cross-validate
  • T-9: Explainability (PFI)
  • T-8: Tune/AutoML; lock hyperparams
  • T-7: Save model (+ model card)
  • T-6: Define prediction contracts
  • T-5: Version and register artifact
  • T-4: Deploy real-time API
  • T-3: Deploy batch scoring
  • T-2: Observability & thresholds
  • T-1: CI/CD + rollback
  • T+7: Post-deploy review, retrain cadence

Conclusion: Why This Matters to Microsoft/.NET Leaders

For executives and engineering leaders anchored in the Microsoft stack, ML.NET offers tight integration, low operational friction, and predictable costs. You keep your existing DevOps, your team’s C# skills, and your governance patterns—without shuttling data and models across disjoint ecosystems. The timeline above is intentionally operational: it protects ROI by turning model building into a repeatable software process, not a one-off experiment.

In practical terms:

  • Speed: ML.NET plus ASP.NET Core yields a production API in days, not months.
  • Control: Strong typing, reproducible training, and versioned artifacts simplify audits and rollbacks.
  • Fit: Your .NET developers don’t have to context-switch into another language; they apply familiar patterns—DI, logging, tests—to ML.

Adopt the Stoic approach: control what you can (data contracts, pipelines, deployments), monitor what you can’t (drift, noise), and act decisively with a battle-tested rollback. That’s how you move from “we ran a model once” to AI as a durable capability in your .NET organization.

Want More?