another bugfix

2026-03-12 20:29:29 +11:00
parent d1237eed44
commit 20316cee91
8 changed files with 293 additions and 23 deletions
@@ -39,6 +39,7 @@ Review in report:
 - `candidate_models[*].hyperparameter_tuning`
 - `candidate_models[*].calibration_comparison`
 - `naive_baselines_test`
+- `sliced_performance_test`
 - `walk_forward_backtest`

 ## 3) Deploy
@@ -65,10 +66,10 @@ python scripts/predict_rain_model.py \

 ## 4) Rollback

-1. Identify the last known-good model artifact in `models/`.
-2. Point deployment to that artifact (worker env `RAIN_MODEL_PATH` or manual inference path).
-3. Re-run inference command and verify writes in `predictions_rain_1h`.
-4. Keep the failed artifact/report for postmortem.
+1. The worker now keeps a backup model at `RAIN_MODEL_BACKUP_PATH` and promotes new models only after candidate training succeeds.
+2. If promotion fails or no candidate model is produced, the worker keeps the active model unchanged.
+3. If inference starts without `RAIN_MODEL_PATH` but a backup exists, the worker restores from the backup automatically.
+4. Keep failed candidate artifacts for postmortem.

 ## 5) Monitoring

@@ -134,6 +135,7 @@ The script exits non-zero on failure, so it can directly drive alerting.
 - `RAIN_CALIBRATION_METHODS`
 - `RAIN_WALK_FORWARD_FOLDS`
 - `RAIN_ALLOW_EMPTY_DATA`
+- `RAIN_MODEL_BACKUP_PATH`
 - `RAIN_MODEL_CARD_PATH`

 Recommended production defaults:
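
Note on the section 4 (Rollback) change in this diff: the new steps describe a promote-with-backup flow, which can be sketched roughly as below. This is a minimal illustration, not the worker's actual code. The function names `promote_candidate` and `restore_if_missing` are hypothetical, and the sketch assumes that "starts without `RAIN_MODEL_PATH`" means the model file at that path is missing.

```python
import shutil
from pathlib import Path
from typing import Optional


def promote_candidate(candidate: Optional[Path], active: Path, backup: Path) -> bool:
    """Promote a trained candidate (hypothetical helper).

    Returns False and leaves `active` untouched when no candidate exists.
    """
    if candidate is None or not candidate.is_file():
        return False  # no candidate produced: keep the active model unchanged
    if active.is_file():
        shutil.copy2(active, backup)  # save the current model before replacing it
    # Stage next to the active file so the final rename stays in one directory.
    staged = active.with_name(active.name + ".staged")
    shutil.copy2(candidate, staged)
    staged.replace(active)
    return True


def restore_if_missing(active: Path, backup: Path) -> bool:
    """Copy the backup into place when the active model is missing (hypothetical helper)."""
    if active.is_file() or not backup.is_file():
        return False
    shutil.copy2(backup, active)
    return True
```

In this sketch `active` would come from `RAIN_MODEL_PATH` and `backup` from `RAIN_MODEL_BACKUP_PATH`. Staging the copy before the rename means a failed copy leaves the active model as it was, which matches step 2 of the new rollback text.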