bugfixes

2026-03-12 19:55:51 +11:00
parent 76851f0816
commit d1237eed44
12 changed files with 1444 additions and 82 deletions
@@ -37,10 +37,12 @@ pip install -r scripts/requirements.txt

 ## Scripts
 - `scripts/audit_rain_data.py`: data quality + label quality + class balance audit.
- `scripts/train_rain_model.py`: strict time-based split training and metrics report.
+- `scripts/train_rain_model.py`: strict time-based split training and metrics report, with optional
+  validation-only hyperparameter tuning, calibration comparison, naive baseline comparison, and walk-forward folds.
 - `scripts/predict_rain_model.py`: inference using saved model artifact; upserts into
  `predictions_rain_1h`.
 - `scripts/run_rain_ml_worker.py`: long-running worker for periodic training + prediction.
+- `scripts/check_rain_pipeline_health.py`: freshness/failure check for alerting.
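The README describes `check_rain_pipeline_health.py` only as a freshness/failure check for alerting. Below is a minimal sketch of that idea. It assumes the check compares the newest prediction and the newest successful training run against age limits. The function name `pipeline_problems` and the 30-minute and 48-hour limits are hypothetical. They were chosen as roughly three missed 10-minute prediction cycles and two missed 24-hour retraining cycles of the `rainml` defaults. They are not the script's actual logic.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical limits: about three missed 10-minute prediction cycles and
# two missed 24-hour retraining cycles.
MAX_PREDICTION_AGE = timedelta(minutes=30)
MAX_TRAINING_AGE = timedelta(hours=48)


def pipeline_problems(
    last_prediction_at: Optional[datetime],
    last_training_at: Optional[datetime],
    now: datetime,
) -> List[str]:
    """Return human-readable problems; an empty list means the pipeline looks healthy."""
    problems: List[str] = []
    if last_prediction_at is None:
        problems.append("no prediction rows found")
    elif now - last_prediction_at > MAX_PREDICTION_AGE:
        problems.append(f"predictions stale since {last_prediction_at.isoformat()}")
    if last_training_at is None:
        problems.append("no successful training run found")
    elif now - last_training_at > MAX_TRAINING_AGE:
        problems.append(f"model stale since {last_training_at.isoformat()}")
    return problems
```

A wrapper could read the newest timestamp from `predictions_rain_1h`, print any problems, and exit non-zero when the list is not empty. Cron or an alerting system can then react to the exit code.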

 Feature-set options:
 - `baseline`: original 5 local observation features.
@@ -55,7 +57,7 @@ Model-family options (`train_rain_model.py`):

 ## Usage
 ### 1) Apply schema update (existing DBs)
-`001_schema.sql` now includes `predictions_rain_1h`.
+`001_schema.sql` includes `predictions_rain_1h`.

 ```sh
 docker compose exec -T timescaledb \
@@ -63,6 +65,14 @@ docker compose exec -T timescaledb \
  -f /docker-entrypoint-initdb.d/001_schema.sql
 ```

+Apply monitoring views:
+
+```sh
+docker compose exec -T timescaledb \
+  psql -U postgres -d micrometeo \
+  -f /docker-entrypoint-initdb.d/002_rain_monitoring_views.sql
+```
+
 ### 2) Run data audit
 ```sh
 export DATABASE_URL="postgres://postgres:postgres@localhost:5432/micrometeo?sslmode=disable"
@@ -136,6 +146,25 @@ python scripts/train_rain_model.py \
  --report-out "models/rain_model_report_auto.json"
 ```

+### 3e) Full P1 evaluation (tuning + calibration + walk-forward)
+```sh
+python scripts/train_rain_model.py \
+  --site "home" \
+  --start "2026-02-01T00:00:00Z" \
+  --end "2026-03-03T23:55:00Z" \
+  --feature-set "extended" \
+  --model-family "auto" \
+  --forecast-model "ecmwf" \
+  --tune-hyperparameters \
+  --max-hyperparam-trials 12 \
+  --calibration-methods "none,sigmoid,isotonic" \
+  --walk-forward-folds 4 \
+  --model-version "rain-auto-v1-extended-eval" \
+  --out "models/rain_model_auto.pkl" \
+  --report-out "models/rain_model_report_auto.json" \
+  --model-card-out "models/model_card_{model_version}.md"
+```
+
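The README does not document how `train_rain_model.py` builds its walk-forward folds or scores the naive baseline. Below is a sketch of the two ideas behind `--walk-forward-folds 4` and the baseline comparison. It assumes an expanding-window split over time-sorted rows and uses the Brier score as the comparison metric. The function names and the `min_train` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def walk_forward_folds(n_rows: int, n_folds: int, min_train: int) -> List[Tuple[range, range]]:
    """Expanding-window splits over time-sorted rows.

    Each fold trains on every row before its test window, so no fold sees
    data from its own future. The last test window absorbs any remainder.
    """
    if n_folds < 1 or not 0 < min_train < n_rows:
        raise ValueError("need n_folds >= 1 and 0 < min_train < n_rows")
    test_size = (n_rows - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough rows for the requested number of folds")
    folds = []
    for k in range(n_folds):
        train_end = min_train + k * test_size
        test_end = n_rows if k == n_folds - 1 else train_end + test_size
        folds.append((range(0, train_end), range(train_end, test_end)))
    return folds


def brier_score(y_true: Sequence[int], p_rain: Sequence[float]) -> float:
    """Mean squared error between the predicted rain probability and the 0/1 label."""
    if not y_true or len(y_true) != len(p_rain):
        raise ValueError("labels and probabilities must be non-empty and equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_rain)) / len(y_true)


def base_rate_baseline(y_train: Sequence[int], n_test: int) -> List[float]:
    """Naive baseline: predict the training-period rain frequency for every test row."""
    rate = sum(y_train) / len(y_train)
    return [rate] * n_test
```

On each fold, a model (with or without calibration) would be fit on the training range and scored on the test range. Its Brier score would then be compared with `base_rate_baseline` on the same rows. Consistently failing to beat the base rate across folds suggests the features add little skill.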
 ### 4) Run inference and store prediction
 ```sh
 python scripts/predict_rain_model.py \
@@ -154,6 +183,10 @@ bash scripts/run_p0_rain_workflow.sh
 The `rainml` service in `docker-compose.yml` now runs:
 - periodic retraining (default every 24 hours)
 - periodic prediction writes (default every 10 minutes)
+- configurable tuning/calibration behavior (`RAIN_TUNE_HYPERPARAMETERS`,
+  `RAIN_MAX_HYPERPARAM_TRIALS`, `RAIN_CALIBRATION_METHODS`)
+- graceful gap handling for temporary source outages (`RAIN_ALLOW_EMPTY_DATA=true`)
+- optional model-card output (`RAIN_MODEL_CARD_PATH`)

 Artifacts are persisted to `./models` on the host.
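The worker's tuning, calibration, gap-handling, and model-card behavior is driven by environment variables. Below is a minimal sketch of parsing them into one typed settings object. It assumes comma-separated calibration method names (as in the `--calibration-methods` value of step 3e) and `true`/`false`-style booleans. `RainWorkerSettings`, `load_settings`, and every default value are hypothetical, not the `rainml` service's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

TRUTHY = {"1", "true", "yes", "on"}


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in TRUTHY


@dataclass(frozen=True)
class RainWorkerSettings:
    tune_hyperparameters: bool
    max_hyperparam_trials: int
    calibration_methods: Tuple[str, ...]
    allow_empty_data: bool
    model_card_path: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> RainWorkerSettings:
    # Defaults below are hypothetical placeholders, not the worker's real defaults.
    methods = env.get("RAIN_CALIBRATION_METHODS", "none")
    return RainWorkerSettings(
        tune_hyperparameters=_env_bool(env, "RAIN_TUNE_HYPERPARAMETERS", False),
        max_hyperparam_trials=int(env.get("RAIN_MAX_HYPERPARAM_TRIALS", "12")),
        calibration_methods=tuple(m.strip() for m in methods.split(",") if m.strip()),
        allow_empty_data=_env_bool(env, "RAIN_ALLOW_EMPTY_DATA", False),
        model_card_path=env.get("RAIN_MODEL_CARD_PATH") or None,
    )
```

Accepting an explicit mapping instead of always reading `os.environ` keeps the parser easy to test. For example, `load_settings({"RAIN_ALLOW_EMPTY_DATA": "true"})` enables gap handling without touching the real environment.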

@@ -165,6 +198,7 @@ docker compose logs -f rainml
 ## Output
 - Audit report: `models/rain_data_audit.json`
 - Training report: `models/rain_model_report.json`
+- Model card: `models/model_card_<model_version>.md`
 - Model artifact: `models/rain_model.pkl`
 - Dataset snapshot: `models/datasets/rain_dataset_<model_version>_<feature_set>.csv`
 - Prediction rows: `predictions_rain_1h` (probability + threshold decision + realized
@@ -192,3 +226,5 @@ docker compose logs -f rainml
 - Data is resampled into 5-minute buckets.
 - Label is derived from incremental rain from WS90 cumulative `rain_mm`.
 - Timestamps are handled as UTC in training/inference workflow.
+- See [Data issues and mitigation rules](./rain_data_issues.md) and
+  [runbook/monitoring guidance](./rain_model_runbook.md).
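The labeling notes can be illustrated with a small sketch that turns a cumulative rain counter into per-reading increments and sums them into 5-minute UTC buckets. It assumes readings arrive as time-sorted, timezone-aware `(timestamp, rain_mm)` pairs. It also assumes that a drop in the cumulative value means the counter was reset, in which case that step counts as no rain. Both assumptions and the function names are hypothetical, not the pipeline's actual WS90 rules. The data-issues document holds the real mitigation rules.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Sequence, Tuple

Reading = Tuple[datetime, float]


def rain_increments(readings: Sequence[Reading]) -> List[Reading]:
    """Convert time-sorted cumulative rain_mm readings into per-reading increments."""
    increments: List[Reading] = []
    previous = None
    for ts, cumulative in readings:
        if previous is None:
            delta = 0.0  # no earlier reading to difference against
        else:
            delta = cumulative - previous
            if delta < 0:
                delta = 0.0  # assumption: counter reset, count no rain for this step
        increments.append((ts, delta))
        previous = cumulative
    return increments


def bucket_5min(increments: Sequence[Reading]) -> Dict[datetime, float]:
    """Sum increments into 5-minute buckets keyed by the UTC bucket start."""
    buckets: Dict[datetime, float] = {}
    for ts, mm in increments:
        utc = ts.astimezone(timezone.utc)  # expects timezone-aware timestamps
        start = utc.replace(minute=utc.minute - utc.minute % 5, second=0, microsecond=0)
        buckets[start] = buckets.get(start, 0.0) + mm
    return dict(sorted(buckets.items()))
```

Keeping the reset rule in one place makes it easy to swap in a different policy once the real mitigation rule is confirmed. One alternative is to treat the post-reset value itself as rain since the reset.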