another bugfix

2026-03-12 20:29:29 +11:00
parent d1237eed44
commit 20316cee91
8 changed files with 293 additions and 23 deletions

@@ -39,6 +39,7 @@ Review in report:
- `candidate_models[*].hyperparameter_tuning`
- `candidate_models[*].calibration_comparison`
- `naive_baselines_test`
- `sliced_performance_test`
- `walk_forward_backtest`
## 3) Deploy
@@ -65,10 +66,10 @@ python scripts/predict_rain_model.py \
## 4) Rollback
1. Identify the last known-good model artifact in `models/`.
2. Point deployment to that artifact (worker env `RAIN_MODEL_PATH` or manual inference path).
3. Re-run inference command and verify writes in `predictions_rain_1h`.
4. Keep the failed artifact/report for postmortem.
1. The worker now keeps a backup model at `RAIN_MODEL_BACKUP_PATH` and promotes new models only after candidate training succeeds.
2. If promotion fails or no candidate model is produced, the worker keeps the active model unchanged.
3. If inference starts without `RAIN_MODEL_PATH` but a backup exists, the worker restores from the backup automatically.
4. Keep failed candidate artifacts for postmortem.
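
The promote-then-restore behaviour described in the steps above can be sketched roughly as follows. This is a minimal illustration, not the worker's actual code: the function names `promote_candidate` and `ensure_active_model` are hypothetical, "candidate training succeeded" is approximated by "the candidate file exists", and the `active` and `backup` arguments stand in for the paths configured through `RAIN_MODEL_PATH` and `RAIN_MODEL_BACKUP_PATH`.

```python
import shutil
from pathlib import Path


def promote_candidate(candidate: Path, active: Path, backup: Path) -> bool:
    """Hypothetical helper: promote a candidate model to the active path.

    Returns False and leaves the active model untouched when no candidate
    file was produced (assumed here to mean that training failed).
    """
    if not candidate.is_file():
        return False
    if active.is_file():
        # Save the current active model as the backup before replacing it.
        backup.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(active, backup)
    active.parent.mkdir(parents=True, exist_ok=True)
    # Copy to a temporary sibling first, then rename over the active path,
    # so a failed copy never leaves a half-written active model.
    tmp = active.with_name(active.name + ".tmp")
    shutil.copy2(candidate, tmp)
    tmp.replace(active)
    return True


def ensure_active_model(active: Path, backup: Path) -> bool:
    """Hypothetical helper: restore the active model from backup if missing.

    Returns True when an active model is available after the call.
    """
    if active.is_file():
        return True
    if backup.is_file():
        active.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(backup, active)
        return True
    return False
```

In this sketch the candidate file is left in place after promotion, which matches the advice to keep candidate artifacts around for postmortem; a real worker may handle cleanup differently.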
## 5) Monitoring
@@ -134,6 +135,7 @@ The script exits non-zero on failure, so it can directly drive alerting.
- `RAIN_CALIBRATION_METHODS`
- `RAIN_WALK_FORWARD_FOLDS`
- `RAIN_ALLOW_EMPTY_DATA`
- `RAIN_MODEL_BACKUP_PATH`
- `RAIN_MODEL_CARD_PATH`
Recommended production defaults: