# Rain Model Runbook

Operational guide for training, evaluating, deploying, monitoring, and rolling back the rain model.

## 1) One-time Setup

Apply monitoring views:

```sh
docker compose exec -T timescaledb \
  psql -U postgres -d micrometeo \
  -f /docker-entrypoint-initdb.d/002_rain_monitoring_views.sql
```

## 2) Train + Evaluate

Recommended evaluation run (includes validation-only tuning, calibration comparison, naive baselines, and walk-forward folds):

```sh
python scripts/train_rain_model.py \
  --site "home" \
  --start "2026-02-01T00:00:00Z" \
  --end "2026-03-03T23:55:00Z" \
  --feature-set "extended" \
  --model-family "auto" \
  --forecast-model "ecmwf" \
  --tune-hyperparameters \
  --max-hyperparam-trials 12 \
  --calibration-methods "none,sigmoid,isotonic" \
  --walk-forward-folds 4 \
  --model-version "rain-auto-v1-extended" \
  --out "models/rain_model.pkl" \
  --report-out "models/rain_model_report.json" \
  --model-card-out "models/model_card_{model_version}.md" \
  --dataset-out "models/datasets/rain_dataset_{model_version}_{feature_set}.csv"
```

Review in report:

- `candidate_models[*].hyperparameter_tuning`
- `candidate_models[*].calibration_comparison`
- `naive_baselines_test`
- `walk_forward_backtest`

## 3) Deploy

1. Promote the selected artifact path to the inference worker (`RAIN_MODEL_PATH` or CLI `--model-path`).
2. Run one dry-run inference:

   ```sh
   python scripts/predict_rain_model.py \
     --site home \
     --model-path "models/rain_model.pkl" \
     --model-name "rain_next_1h" \
     --dry-run
   ```

3. Run live inference:

   ```sh
   python scripts/predict_rain_model.py \
     --site home \
     --model-path "models/rain_model.pkl" \
     --model-name "rain_next_1h"
   ```

## 4) Rollback

1. Identify the last known-good model artifact in `models/`.
2. Point deployment to that artifact (worker env `RAIN_MODEL_PATH` or manual inference path).
3. Re-run inference command and verify writes in `predictions_rain_1h`.
4. Keep the failed artifact/report for postmortem.
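To help with rollback step 1, the sketch below lists candidate artifacts in `models/` with the newest first. It is a minimal illustration, not part of the repository: the helper name `list_model_artifacts` is hypothetical, and it assumes earlier artifacts are kept in `models/` under distinct `.pkl` file names. Modification time only orders the candidates; deciding which one is known-good still depends on the reports kept alongside them.

```python
from pathlib import Path


def list_model_artifacts(models_dir: str = "models", pattern: str = "*.pkl") -> list[Path]:
    """Return files matching `pattern` in `models_dir`, newest modification time first.

    Hypothetical helper: assumes older artifacts live next to the current one.
    """
    root = Path(models_dir)
    return sorted(root.glob(pattern), key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Print rollback candidates, most recently written first.
    for artifact in list_model_artifacts():
        print(artifact)
```

The chosen path can then be passed as `RAIN_MODEL_PATH` or `--model-path`, as described in the rollback steps.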
## 5) Monitoring

### Feature drift

```sql
SELECT *
FROM rain_feature_drift_daily
WHERE site = 'home'
ORDER BY day DESC
LIMIT 30;
```

Alert heuristic: any absolute z-score > 3 for 2+ consecutive days.

### Prediction drift

```sql
SELECT *
FROM rain_prediction_drift_daily
WHERE site = 'home'
ORDER BY day DESC
LIMIT 30;
```

Alert heuristic: `predicted_positive_rate` shifts by > 2x relative to trailing 14-day median.

### Calibration/performance drift

```sql
SELECT *
FROM rain_calibration_drift_daily
WHERE site = 'home'
ORDER BY day DESC
LIMIT 30;
```

Alert heuristic: sustained Brier-score increase > 25% from trailing 30-day average.

## 6) Pipeline Failure Alerts

Use the health-check script in cron, systemd timer, or your alerting scheduler:

```sh
python scripts/check_rain_pipeline_health.py \
  --site home \
  --model-name rain_next_1h \
  --max-ws90-age 20m \
  --max-baro-age 30m \
  --max-forecast-age 3h \
  --max-prediction-age 30m \
  --max-pending-eval-age 3h \
  --max-pending-eval-rows 200
```

The script exits non-zero on failure, so it can directly drive alerting.

## 7) Continuous Worker Defaults

`docker-compose.yml` provides these controls for `rainml`:

- `RAIN_TUNE_HYPERPARAMETERS`
- `RAIN_MAX_HYPERPARAM_TRIALS`
- `RAIN_CALIBRATION_METHODS`
- `RAIN_WALK_FORWARD_FOLDS`
- `RAIN_ALLOW_EMPTY_DATA`
- `RAIN_MODEL_CARD_PATH`

Recommended production defaults:

- Enable tuning daily or weekly (`RAIN_TUNE_HYPERPARAMETERS=true`)
- Keep walk-forward folds `0` in continuous mode, run fold backtests in scheduled evaluation jobs
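The three alert heuristics in section 5 can be sketched in Python as follows. This is an illustrative reading, not the project's implementation: the function names are hypothetical, each function takes a plain chronological list of daily values (oldest first) rather than querying the views, and several choices are assumptions. In particular, "shifts by > 2x" is read as a ratio above 2 or below 1/2, and "sustained" is read as 3 consecutive days (`sustain_days`).

```python
from statistics import mean, median
from typing import Sequence


def zscore_alert(z_scores: Sequence[float], threshold: float = 3.0, min_days: int = 2) -> bool:
    """True if |z| exceeds `threshold` on at least `min_days` consecutive days."""
    run = 0
    for z in z_scores:
        run = run + 1 if abs(z) > threshold else 0
        if run >= min_days:
            return True
    return False


def positive_rate_alert(rates: Sequence[float], window: int = 14, factor: float = 2.0) -> bool:
    """True if the latest rate differs from the trailing median by more than `factor`.

    Assumption: a shift counts in either direction (ratio > factor or < 1/factor).
    Returns False when there is less than `window` days of history.
    """
    if len(rates) < window + 1:
        return False
    today = rates[-1]
    baseline = median(rates[-(window + 1):-1])
    if baseline == 0:
        return today > 0
    ratio = today / baseline
    return ratio > factor or ratio < 1 / factor


def brier_alert(scores: Sequence[float], window: int = 30,
                increase: float = 0.25, sustain_days: int = 3) -> bool:
    """True if each of the last `sustain_days` scores exceeds the trailing mean by `increase`.

    Assumption: the baseline is the mean of the `window` days before the recent run.
    """
    if len(scores) < window + sustain_days:
        return False
    recent = scores[-sustain_days:]
    baseline = mean(scores[-(window + sustain_days):-sustain_days])
    return all(s > baseline * (1 + increase) for s in recent)
```

Feeding these functions from `rain_feature_drift_daily`, `rain_prediction_drift_daily`, and `rain_calibration_drift_daily` requires reversing the `ORDER BY day DESC` results so the oldest day comes first.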