# Rain Model Runbook

Operational guide for training, evaluating, deploying, monitoring, and rolling back the rain model.

## 1) One-time Setup

Apply the monitoring views:

```sh
docker compose exec -T timescaledb \
  psql -U postgres -d micrometeo \
  -f /docker-entrypoint-initdb.d/002_rain_monitoring_views.sql
```

## 2) Train + Evaluate

Recommended evaluation run (includes validation-only tuning, calibration comparison, naive baselines, and walk-forward folds):

```sh
python scripts/train_rain_model.py \
  --site "home" \
  --start "2026-02-01T00:00:00Z" \
  --end "2026-03-03T23:55:00Z" \
  --feature-set "extended" \
  --model-family "auto" \
  --forecast-model "ecmwf" \
  --tune-hyperparameters \
  --max-hyperparam-trials 12 \
  --calibration-methods "none,sigmoid,isotonic" \
  --threshold-policy "walk_forward" \
  --walk-forward-folds 4 \
  --model-version "rain-auto-v1-extended" \
  --out "models/rain_model.pkl" \
  --report-out "models/rain_model_report.json" \
  --model-card-out "models/model_card_{model_version}.md" \
  --dataset-out "models/datasets/rain_dataset_{model_version}_{feature_set}.csv"
```

Review these sections of the report:

- `candidate_models[*].hyperparameter_tuning`
- `candidate_models[*].calibration_comparison`
- `naive_baselines_test`
- `sliced_performance_test`
- `threshold_tuning_walk_forward`
- `walk_forward_backtest`

## 3) Deploy

1. Promote the selected artifact path to the inference worker (`RAIN_MODEL_PATH` or CLI `--model-path`).
2. Run one dry-run inference:

   ```sh
   python scripts/predict_rain_model.py \
     --site home \
     --model-path "models/rain_model.pkl" \
     --model-name "rain_next_1h" \
     --dry-run
   ```

3. Run live inference:

   ```sh
   python scripts/predict_rain_model.py \
     --site home \
     --model-path "models/rain_model.pkl" \
     --model-name "rain_next_1h"
   ```

## 4) Rollback

1. The worker keeps a backup model at `RAIN_MODEL_BACKUP_PATH` and promotes a new model only after candidate training succeeds.
2. If promotion fails or no candidate model is produced, the worker leaves the active model unchanged.
3. If inference starts without `RAIN_MODEL_PATH` but a backup exists, the worker restores from the backup automatically.
4. Keep failed candidate artifacts for postmortem.

## 5) Monitoring

### Feature drift

```sql
SELECT *
FROM rain_feature_drift_daily
WHERE site = 'home'
ORDER BY day DESC
LIMIT 30;
```

Alert heuristic: any absolute z-score > 3 for 2+ consecutive days.

### Prediction drift

```sql
SELECT *
FROM rain_prediction_drift_daily
WHERE site = 'home'
ORDER BY day DESC
LIMIT 30;
```

Alert heuristic: `predicted_positive_rate` changes by more than a factor of 2 relative to its trailing 14-day median.

### Calibration/performance drift

```sql
SELECT *
FROM rain_calibration_drift_daily
WHERE site = 'home'
ORDER BY day DESC
LIMIT 30;
```

Alert heuristic: a sustained Brier-score increase of more than 25% over the trailing 30-day average.

## 6) Pipeline Failure Alerts

Run the health-check script from cron, a systemd timer, or your alerting scheduler:

```sh
python scripts/check_rain_pipeline_health.py \
  --site home \
  --model-name rain_next_1h \
  --max-ws90-age 20m \
  --max-baro-age 30m \
  --max-forecast-age 3h \
  --max-prediction-age 30m \
  --max-pending-eval-age 3h \
  --max-pending-eval-rows 200
```

The script exits non-zero on failure, so its exit status can drive alerting directly.
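The alert heuristics from section 5 can run in the same scheduler as the health check. The Python below is a minimal sketch and is not part of the repository. The function names are hypothetical. It takes plain per-day series (oldest day first) and does not assume the monitoring views' column names. It also makes three interpretive assumptions, which you should check against your alerting policy:

- A day counts as feature-drifting if any feature's |z| exceeds 3 that day.
- A 2x prediction shift counts in either direction.
- "Sustained" means a configurable number of consecutive days, 3 by default here.

```python
"""Hypothetical helpers for the section 5 drift alert heuristics.

Each function takes a plain per-day series ordered oldest day first
(reverse the `ORDER BY day DESC` query output first). Column names in
the monitoring views are deliberately not assumed.
"""
from statistics import mean, median
from typing import Sequence


def feature_drift_alert(
    daily_z_scores: Sequence[Sequence[float]],
    threshold: float = 3.0,
    min_days: int = 2,
) -> bool:
    """True if |z| > threshold on `min_days` or more consecutive days.

    Assumption: a day counts as drifting when ANY feature exceeds the
    threshold that day (not necessarily the same feature every day).
    """
    run = 0
    for day_scores in daily_z_scores:
        drifting = any(abs(z) > threshold for z in day_scores)
        run = run + 1 if drifting else 0
        if run >= min_days:
            return True
    return False


def prediction_drift_alert(
    daily_positive_rates: Sequence[float],
    window: int = 14,
    factor: float = 2.0,
) -> bool:
    """True if the latest predicted_positive_rate moved more than `factor`x
    (up or down, an assumption) from the median of the previous `window` days."""
    if len(daily_positive_rates) < window + 1:
        return False  # not enough history for a trailing baseline
    baseline = median(daily_positive_rates[-window - 1 : -1])
    latest = daily_positive_rates[-1]
    if baseline == 0:
        # Any positive rate after a zero baseline counts as an unbounded shift.
        return latest > 0
    ratio = latest / baseline
    return ratio > factor or ratio < 1 / factor


def calibration_drift_alert(
    daily_brier: Sequence[float],
    window: int = 30,
    max_increase: float = 0.25,
    sustain_days: int = 3,
) -> bool:
    """True if each of the last `sustain_days` Brier scores exceeds the mean of
    the `window` days before them by more than `max_increase` (25%).

    Assumption: "sustained" means `sustain_days` consecutive days (3 here).
    """
    if len(daily_brier) < window + sustain_days:
        return False
    baseline = mean(daily_brier[-(window + sustain_days) : -sustain_days])
    recent = daily_brier[-sustain_days:]
    return all(score > baseline * (1 + max_increase) for score in recent)
```

A scheduled job could follow these steps:

1. Reverse each query's rows.
2. Pass the relevant column into these helpers.
3. Exit non-zero if any helper returns `True`, matching the health-check script's convention.

The calibration baseline deliberately excludes the recent window so that the drift being tested does not inflate its own baseline.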
## 7) Continuous Worker Defaults

`docker-compose.yml` provides these controls for the `rainml` service:

- `RAIN_TUNE_HYPERPARAMETERS`
- `RAIN_MAX_HYPERPARAM_TRIALS`
- `RAIN_CALIBRATION_METHODS`
- `RAIN_THRESHOLD_POLICY`
- `RAIN_WALK_FORWARD_FOLDS`
- `RAIN_ALLOW_EMPTY_DATA`
- `RAIN_MODEL_BACKUP_PATH`
- `RAIN_MODEL_CARD_PATH`

Recommended production defaults:

- Enable tuning on a daily or weekly schedule (`RAIN_TUNE_HYPERPARAMETERS=true`).
- Keep walk-forward folds at `0` (`RAIN_WALK_FORWARD_FOLDS=0`) in continuous mode, and run fold backtests in scheduled evaluation jobs instead.

## 8) Auto-Recommend Candidate

To compare saved training reports and pick a deployment candidate automatically:

```sh
python scripts/recommend_rain_model.py \
  --reports-glob "models/rain_model_report*.json" \
  --require-walk-forward \
  --top-k 5 \
  --json-out "models/rain_model_recommendation.json"
```
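Before running the recommender, it can help to see which saved reports contain the sections that section 2 lists for review. The Python below is a hypothetical pre-check and is not part of the repository. It assumes each report is a JSON object whose top-level keys match those section names, and that `candidate_models` is a list of objects, as the `candidate_models[*]` notation suggests. It does not replicate the recommender's selection logic or what `--require-walk-forward` does. It only reports which sections are absent or empty.

```python
"""Hypothetical pre-check for saved training reports (not part of the repo).

Assumes each report is a JSON object whose top-level keys match the
review sections listed under "Train + Evaluate", and that
`candidate_models` is a list of objects.
"""
import glob
import json
from typing import Any

TOP_LEVEL_KEYS = (
    "naive_baselines_test",
    "sliced_performance_test",
    "threshold_tuning_walk_forward",
    "walk_forward_backtest",
)
CANDIDATE_KEYS = ("hyperparameter_tuning", "calibration_comparison")


def missing_sections(report: dict[str, Any]) -> list[str]:
    """Return the review sections that are absent or empty in one report."""
    missing = [key for key in TOP_LEVEL_KEYS if not report.get(key)]
    candidates = report.get("candidate_models") or []
    if not candidates:
        missing.append("candidate_models")
    for i, candidate in enumerate(candidates):
        for key in CANDIDATE_KEYS:
            if not candidate.get(key):
                missing.append(f"candidate_models[{i}].{key}")
    return missing


def check_reports(pattern: str) -> dict[str, list[str]]:
    """Map each report path matching `pattern` to its missing sections."""
    results: dict[str, list[str]] = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            results[path] = missing_sections(json.load(fh))
    return results


if __name__ == "__main__":
    # Same glob as the recommender example above.
    for path, missing in check_reports("models/rain_model_report*.json").items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{path}: {status}")
```

An empty section counts as missing here. That matters most for `walk_forward_backtest`, which continuous-mode runs with walk-forward folds set to `0` presumably leave unpopulated.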