Implemented the next 4h-plan phase: dual-run support + explicit cutover gate.

2026-04-06 19:09:20 +10:00
parent 1ef300d25e
commit 1e750e35d1
7 changed files with 238 additions and 20 deletions
@@ -56,7 +56,7 @@ Priority key: `P0` = critical/blocking, `P1` = important, `P2` = later optimizat
 - [x] [P1] Add one expanded feature set and rerun evaluation. (completed on runtime machine 2026-03-12 with `feature_set=extended`, `model_version=rain-auto-v1-extended-202603120932`)
 - [x] [P0] Decide v1 threshold and define deployment interface.

-## 9) Extension Plan: 4-Hour Precipitation Window (Not Started)
+## 9) Extension Plan: 4-Hour Precipitation Window (In Progress)
 - [x] [P0] Lock v2 target definition for horizon extension: `rain_next_4h_mm >= <threshold_mm>` and explicitly decide whether the threshold remains `0.2mm` or is increased for 4-hour labeling. (implemented with `0.2mm` carry-forward)
 - [x] [P0] Decide rollout strategy: additive dual-horizon support (`1h` + `4h`) vs direct replacement; prefer dual-horizon for safe cutover. (implemented as additive dual-horizon)
 - [x] [P0] Parameterize label horizon in shared ML code (`scripts/rain_model_common.py`) so target columns are generated for 4-hour windows (48 x 5-minute buckets) instead of hard-coded 1-hour columns.
@@ -72,8 +72,11 @@ Priority key: `P0` = critical/blocking, `P1` = important, `P2` = later optimizat
 - [x] [P1] Update dashboard API defaults (`cmd/ingestd/web.go`) from `rain_next_1h` to the selected 4-hour model name (or make model name configurable).
 - [x] [P1] Update web UI labels/semantics (`cmd/ingestd/web/index.html`, `cmd/ingestd/web/app.js`) from “Rain 1h %” to “Rain 4h %” and verify chart legends/tooltips match the new horizon.
 - [x] [P1] Update worker/runtime defaults (`docker-compose.yml`, `scripts/run_rain_ml_worker.py`, `scripts/run_p0_rain_workflow.sh`) to use `rain_next_4h` naming/versioning.
+- [x] [P1] Add dual-run deployment support with isolated artifacts for 4h and 1h workers (`docker-compose.yml` + `scripts/rainml_py.sh` service targeting).
 - [x] [P0] Update health-check defaults (`scripts/check_rain_pipeline_health.py`) for 4-hour evaluation latency (e.g., pending-eval age threshold > 4h).
 - [x] [P1] Update docs and runbooks (`README.md`, `docs/rain_prediction.md`, `docs/rain_model_runbook.md`) so commands, table names, and target definitions match the 4-hour system.
- [ ] [P0] Run full retraining/evaluation for the 4-hour target and compare against current 1-hour model metrics before production cutover.
- [ ] [P0] Execute staged rollout: deploy schema + views, deploy model + inference, verify dashboard/health checks, then switch default model name.
+- [x] [P1] Add explicit automated cutover gate script for baseline-vs-candidate decisioning. (`scripts/check_rain_cutover_gate.py`)
+- [x] [P0] Run full retraining/evaluation for the 4-hour target and compare against current 1-hour model metrics before production cutover. (completed on runtime machine 2026-04-06; comparison vs `rain_model_report_extended_eval.json` showed regression: precision `0.7103 -> 0.5545`, PR-AUC `0.7245 -> 0.5850`, ROC-AUC `0.9184 -> 0.7843`, Brier `0.0931 -> 0.2276`)
+- [ ] [P0] Execute staged rollout: deploy schema + views, deploy model + inference, verify dashboard/health checks, then switch default model name. (schema/views/deploy/inference/health completed on runtime machine 2026-04-06; final production cutover decision remains open due to the 4h metric regression vs the 1h baseline)
+- [ ] [P0] Improve 4-hour model quality to meet cutover gate (at minimum recover precision and calibration relative to current 1-hour production baseline) before declaring rollout complete.
 - [x] [P1] Keep rollback path documented: retain `rain_next_1h` artifacts/table access until 4-hour monitoring is stable. (documented in `docs/rain_model_runbook.md` staged rollout/rollback section)
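
For reference, the baseline-vs-candidate decision described in the checklist could look roughly like the sketch below. This is not the contents of `scripts/check_rain_cutover_gate.py`, which this diff does not show. The function name `check_cutover_gate`, the metric keys, and the `max_regression=0.02` tolerance are all assumptions chosen for illustration. Only the metric values come from the comparison recorded above.

```python
from dataclasses import dataclass, field

# Direction of each metric: True means larger is better.
# Brier score is a loss, so smaller is better.
HIGHER_IS_BETTER = {
    "precision": True,
    "pr_auc": True,
    "roc_auc": True,
    "brier": False,
}


@dataclass
class GateResult:
    passed: bool
    failures: list = field(default_factory=list)


def check_cutover_gate(baseline, candidate, max_regression=0.02):
    """Fail the gate if any metric regresses by more than max_regression.

    max_regression is a hypothetical absolute tolerance, not a value
    taken from the repository.
    """
    failures = []
    for name, higher_is_better in HIGHER_IS_BETTER.items():
        delta = candidate[name] - baseline[name]
        # Express every change as "how much worse", regardless of direction.
        regression = -delta if higher_is_better else delta
        if regression > max_regression:
            failures.append(
                f"{name}: {baseline[name]:.4f} -> {candidate[name]:.4f}"
            )
    return GateResult(passed=not failures, failures=failures)


# Metric values as recorded in the checklist entry dated 2026-04-06.
baseline_1h = {"precision": 0.7103, "pr_auc": 0.7245, "roc_auc": 0.9184, "brier": 0.0931}
candidate_4h = {"precision": 0.5545, "pr_auc": 0.5850, "roc_auc": 0.7843, "brier": 0.2276}

result = check_cutover_gate(baseline_1h, candidate_4h)
print("PASS" if result.passed else "FAIL")
for failure in result.failures:
    print("  regressed:", failure)
```

With the recorded numbers, every metric regresses by more than the assumed tolerance, so a gate of this shape would block cutover, which matches the open rollout item above.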