update for 4 hour rain forecast
+104 -5
@@ -4,6 +4,14 @@ Operational guide for training, evaluating, deploying, monitoring, and rolling b
 
 ## 1) One-time Setup
 
+Apply 4-hour prediction table migration:
+
+```sh
+docker compose exec -T timescaledb \
+  psql -U postgres -d micrometeo \
+  -f /docker-entrypoint-initdb.d/003_rain_predictions_4h.sql
+```
+
 Apply monitoring views:
 
 ```sh
@@ -21,6 +29,7 @@ python scripts/train_rain_model.py \
   --site "home" \
   --start "2026-02-01T00:00:00Z" \
   --end "2026-03-03T23:55:00Z" \
+  --horizon-hours 4 \
   --feature-set "extended" \
   --model-family "auto" \
   --forecast-model "ecmwf" \
@@ -29,7 +38,7 @@ python scripts/train_rain_model.py \
   --calibration-methods "none,sigmoid,isotonic" \
   --threshold-policy "walk_forward" \
   --walk-forward-folds 4 \
-  --model-version "rain-auto-v1-extended" \
+  --model-version "rain-auto-v2-extended-4h" \
   --out "models/rain_model.pkl" \
   --report-out "models/rain_model_report.json" \
   --model-card-out "models/model_card_{model_version}.md" \
@@ -53,7 +62,8 @@ Review in report:
 python scripts/predict_rain_model.py \
   --site home \
   --model-path "models/rain_model.pkl" \
-  --model-name "rain_next_1h" \
+  --model-name "rain_next_4h" \
+  --horizon-hours 4 \
   --dry-run
 ```
 
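Review note: the dry run in this hunk and the live run in the next one use the same arguments, so the pair can be gated by a small wrapper. The following is a minimal sketch, not part of the commit. It assumes `predict_rain_model.py` exits non-zero when a dry run fails, which the guide does not state; `run_dry_then_live` and `PREDICT_CMD` are hypothetical names.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Arguments copied from the guide; the interpreter path is taken from the
# current environment.
PREDICT_CMD = [
    sys.executable, "scripts/predict_rain_model.py",
    "--site", "home",
    "--model-path", "models/rain_model.pkl",
    "--model-name", "rain_next_4h",
    "--horizon-hours", "4",
]


def run_dry_then_live(
    base_cmd: Sequence[str],
    runner: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> int:
    """Run base_cmd with --dry-run, then without it only if the dry run exits 0.

    Assumption: a failed dry run is signalled by a non-zero exit code.
    Returns the exit code of the last command that was executed.
    """
    dry = runner([*base_cmd, "--dry-run"], check=False)
    if dry.returncode != 0:
        # Stop here so a broken model or schema never writes live predictions.
        return dry.returncode
    live = runner(list(base_cmd), check=False)
    return live.returncode
```

Calling `run_dry_then_live(PREDICT_CMD)` from the repository root would run both steps in order. The `runner` parameter exists only so the gating logic can be exercised without the real script.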
@@ -63,7 +73,8 @@ python scripts/predict_rain_model.py \
 python scripts/predict_rain_model.py \
   --site home \
   --model-path "models/rain_model.pkl" \
-  --model-name "rain_next_1h"
+  --model-name "rain_next_4h" \
+  --horizon-hours 4
 ```
 
 ## 4) Rollback
@@ -72,6 +83,7 @@ python scripts/predict_rain_model.py \
 2. If promotion fails or no candidate model is produced, the worker keeps the active model unchanged.
 3. If inference starts without `RAIN_MODEL_PATH` but backup exists, the worker restores from backup automatically.
 4. Keep failed candidate artifacts for postmortem.
+5. During 4-hour rollout stabilization, keep `predictions_rain_1h` and `rain_next_1h` model artifacts available for immediate fallback.
 
 ## 5) Monitoring
 
@@ -118,12 +130,13 @@ Use the health-check script in cron, systemd timer, or your alerting scheduler:
 ```sh
 python scripts/check_rain_pipeline_health.py \
   --site home \
-  --model-name rain_next_1h \
+  --model-name rain_next_4h \
+  --horizon-hours 4 \
   --max-ws90-age 20m \
   --max-baro-age 30m \
   --max-forecast-age 3h \
   --max-prediction-age 30m \
-  --max-pending-eval-age 3h \
+  --max-pending-eval-age 6h \
   --max-pending-eval-rows 200
 ```
 
@@ -138,6 +151,7 @@ The script exits non-zero on failure, so it can directly drive alerting.
 - `RAIN_THRESHOLD_POLICY`
 - `RAIN_WALK_FORWARD_FOLDS`
 - `RAIN_ALLOW_EMPTY_DATA`
+- `RAIN_HORIZON_HOURS`
 - `RAIN_MODEL_BACKUP_PATH`
 - `RAIN_MODEL_CARD_PATH`
 
@@ -156,3 +170,88 @@ python scripts/recommend_rain_model.py \
   --top-k 5 \
   --json-out "models/rain_model_recommendation.json"
 ```
+
+## 9) Staged 4h Rollout Checklist
+
+Run this sequence in production/staging to satisfy the 4h cutover gate:
+
+1. Apply schema migration for 4h predictions:
+
+```sh
+docker compose exec -T timescaledb \
+  psql -U postgres -d micrometeo \
+  -f /docker-entrypoint-initdb.d/003_rain_predictions_4h.sql
+```
+
+2. Re-apply monitoring views (now include 1h + 4h unions):
+
+```sh
+docker compose exec -T timescaledb \
+  psql -U postgres -d micrometeo \
+  -f /docker-entrypoint-initdb.d/002_rain_monitoring_views.sql
+```
+
+3. Run a full 4h training/evaluation cycle and save report:
+
+```sh
+python scripts/train_rain_model.py \
+  --site "home" \
+  --start "2026-02-01T00:00:00Z" \
+  --end "2026-03-03T23:55:00Z" \
+  --horizon-hours 4 \
+  --feature-set "extended" \
+  --model-family "auto" \
+  --forecast-model "ecmwf" \
+  --tune-hyperparameters \
+  --threshold-policy "walk_forward" \
+  --walk-forward-folds 4 \
+  --model-version "rain-auto-v2-extended-4h" \
+  --out "models/rain_model_4h.pkl" \
+  --report-out "models/rain_model_report_4h.json"
+```
+
+4. Compare 4h metrics against the latest 1h benchmark report before switching dashboard defaults:
+
+```sh
+python scripts/compare_rain_reports.py \
+  --baseline "models/rain_model_report_1h.json" \
+  --candidate "models/rain_model_report_4h.json"
+```
+5. Run dry-run inference, then live inference with 4h model name/horizon:
+
+```sh
+python scripts/predict_rain_model.py \
+  --site home \
+  --model-path "models/rain_model_4h.pkl" \
+  --model-name "rain_next_4h" \
+  --horizon-hours 4 \
+  --dry-run
+
+python scripts/predict_rain_model.py \
+  --site home \
+  --model-path "models/rain_model_4h.pkl" \
+  --model-name "rain_next_4h" \
+  --horizon-hours 4
+```
+
+6. Validate health checks and dashboard data path for 4h:
+
+```sh
+python scripts/check_rain_pipeline_health.py \
+  --site home \
+  --model-name rain_next_4h \
+  --horizon-hours 4 \
+  --max-pending-eval-age 6h
+```
+
+7. Keep 1h path live in parallel until 4h drift/calibration remains stable for at least 7 days.
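Review note: the guide states that the health-check script exits non-zero on failure, so step 6 can feed an alert hook while the 1h and 4h paths run in parallel. The following is a minimal sketch, not part of the commit; `health_check_ok`, `HEALTH_CMD_4H`, and the `alert` callback are hypothetical names, and the timeout value is an arbitrary choice.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Flags copied from step 6 of the checklist.
HEALTH_CMD_4H = [
    sys.executable, "scripts/check_rain_pipeline_health.py",
    "--site", "home",
    "--model-name", "rain_next_4h",
    "--horizon-hours", "4",
    "--max-pending-eval-age", "6h",
]


def health_check_ok(
    cmd: Sequence[str],
    alert: Callable[[str], None],
    timeout_s: float = 120.0,
) -> bool:
    """Run a health-check command and call alert() if it fails or hangs.

    Relies only on the documented contract: exit code 0 means healthy,
    anything else means failure.
    """
    try:
        proc = subprocess.run(
            list(cmd), capture_output=True, text=True,
            timeout=timeout_s, check=False,
        )
    except subprocess.TimeoutExpired:
        alert(f"health check timed out after {timeout_s:.0f}s")
        return False
    if proc.returncode != 0:
        # Forward whatever the script printed so the alert is actionable.
        detail = (proc.stdout + proc.stderr).strip()
        alert(f"health check failed (exit {proc.returncode}): {detail}")
        return False
    return True
```

A scheduler job could call `health_check_ok(HEALTH_CMD_4H, alert=print)` and swap `print` for a pager or chat hook. Running the same wrapper with the 1h flags keeps both paths monitored during the stabilization window.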
+
+### Fast rollback to 1h
+
+If 4h performance or pipeline health regresses:
+
+1. Set worker env back to:
+   `RAIN_HORIZON_HOURS=1`, `RAIN_MODEL_NAME=rain_next_1h`, and a known-good 1h model path/version.
+2. Restart `rainml` service.
+3. Confirm `check_rain_pipeline_health.py --horizon-hours 1 --model-name rain_next_1h` returns `ok`.
+4. Keep `predictions_rain_4h` data for postmortem; do not drop tables during rollback.
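Review note: rollback step 1 changes several worker variables together, so generating them from the horizon alone avoids a mismatched pair such as `RAIN_HORIZON_HOURS=1` with `RAIN_MODEL_NAME=rain_next_4h`. The following is a minimal sketch, not part of the commit. The `rain_next_{N}h` pattern is inferred from the two model names in this guide, `RAIN_MODEL_PATH` is assumed to be the variable that holds the model path, and the 1h path in the usage note is a placeholder.

```python
from typing import Dict


def horizon_env(horizon_hours: int, model_path: str) -> Dict[str, str]:
    """Build a consistent set of worker variables for one forecast horizon.

    Assumptions: model names follow rain_next_{N}h, and RAIN_MODEL_PATH
    is the variable that points the worker at the model artifact.
    """
    if horizon_hours < 1:
        raise ValueError("horizon_hours must be a positive integer")
    return {
        "RAIN_HORIZON_HOURS": str(horizon_hours),
        "RAIN_MODEL_NAME": f"rain_next_{horizon_hours}h",
        "RAIN_MODEL_PATH": model_path,
    }


def to_env_file(env: Dict[str, str]) -> str:
    """Render variables as KEY=VALUE lines, sorted for stable diffs."""
    return "".join(f"{key}={value}\n" for key, value in sorted(env.items()))
```

For a rollback, `to_env_file(horizon_env(1, "models/rain_model_1h.pkl"))` yields the three lines to place in the worker environment before restarting `rainml`; substitute the known-good 1h artifact for the placeholder path.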