another bugfix

2026-03-12 20:29:29 +11:00
parent d1237eed44
commit 20316cee91
8 changed files with 293 additions and 23 deletions


@@ -11,7 +11,7 @@ This document captures known data-quality issues observed in the rain-model pipe
| Sensor gaps | Missing 5-minute buckets from WS90/barometer ingestion. | Resample to 5-minute grid; barometer interpolated with short limit (`limit=3`); gap lengths tracked by audit. |
| Out-of-order arrivals | Late MQTT events can arrive with older `ts`. | Audit reports out-of-order count by sorting on `received_at` and checking `ts` monotonicity. |
| Duplicate rows | Replays/reconnects can duplicate sensor rows. | Audit reports duplicate counts by `(ts, station_id)` for WS90 and `(ts, source)` for barometer. |
-| Forecast sparsity/jitter | Hourly forecast retrieval cadence does not always align with 5-minute features. | Select latest forecast per `ts` (`DISTINCT ON` + `retrieved_at DESC`), resample to 5 minutes, short forward/backfill windows, and clip `fc_precip_prob` to `[0,1]`. |
+| Forecast sparsity/jitter | Hourly forecast retrieval cadence does not always align with 5-minute features. | Select latest forecast per `ts` (`DISTINCT ON` + `retrieved_at DESC`), resample to 5 minutes, short forward/backfill windows, and clip `fc_precip_prob` to `[0,1]`. If `precip_prob` is unavailable upstream, backfill from `precip_mm` (`>0 => 1`, else `0`). |
| Local vs UTC day boundary | Daily rainfall resets can look wrong when the local timezone is not respected. | Station timezone is configured via `site.timezone` and used by the Wunderground uploader; model training/inference stays UTC-based for split consistency. |
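A minimal plain-Python sketch of the forecast alignment described in the table: keep the latest `retrieved_at` row per `ts` (the in-memory equivalent of `DISTINCT ON`), clip `precip_prob` to `[0,1]` or derive it from `precip_mm`, then spread the hourly values over a 5-minute grid. It assumes rows are dicts keyed by the column names used above. The helper names and the 60-minute `max_gap` fill window are illustrative assumptions, since the table only says the windows are short.

```python
from datetime import timedelta


def latest_per_ts(rows):
    """Keep the most recently retrieved row for each forecast ts.

    Plain-Python analogue of:
      SELECT DISTINCT ON (ts) * FROM ... ORDER BY ts, retrieved_at DESC
    """
    best = {}
    for row in rows:
        current = best.get(row["ts"])
        if current is None or row["retrieved_at"] > current["retrieved_at"]:
            best[row["ts"]] = row
    return [best[ts] for ts in sorted(best)]


def clean_precip_prob(row):
    """Clip precip_prob to [0, 1]; if it is missing, fall back to precip_mm (>0 -> 1, else 0)."""
    prob = row.get("precip_prob")
    if prob is None:
        mm = row.get("precip_mm")
        if mm is None:
            return None
        return 1.0 if mm > 0 else 0.0
    return min(max(prob, 0.0), 1.0)


def align_to_5min(rows, start, end, max_gap=timedelta(minutes=60)):
    """Map each 5-minute bucket in [start, end] to a cleaned precip probability.

    Uses the nearest earlier forecast within max_gap (forward fill), otherwise the
    nearest later one within max_gap (backfill), otherwise None.
    """
    points = [(row["ts"], clean_precip_prob(row)) for row in latest_per_ts(rows)]
    points = [(ts, p) for ts, p in points if p is not None]  # still sorted by ts
    grid = {}
    t = start
    while t <= end:
        earlier = [p for ts, p in points if ts <= t and t - ts <= max_gap]
        later = [p for ts, p in points if ts > t and ts - t <= max_gap]
        if earlier:
            grid[t] = earlier[-1]
        elif later:
            grid[t] = later[0]
        else:
            grid[t] = None
        t += timedelta(minutes=5)
    return grid
```

The per-bucket list scans keep the logic easy to follow; at real data volumes the same rule is better expressed as a SQL query plus a resample-and-fill step.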
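The audit counts in the first three rows can be sketched the same way. This assumes events are dicts carrying `ts` and `received_at`, and it reads "out of order" as an event whose `ts` is older than the newest `ts` already received. That is one reasonable interpretation of the monotonicity check, not necessarily the audit's exact rule. The function names are hypothetical.

```python
from collections import Counter
from datetime import timedelta


def out_of_order_count(events):
    """Sort by received_at and count events whose ts is older than the newest ts seen so far."""
    count = 0
    newest = None
    for event in sorted(events, key=lambda e: e["received_at"]):
        if newest is not None and event["ts"] < newest:
            count += 1
        else:
            newest = event["ts"]
    return count


def duplicate_count(rows, key_fields):
    """Rows beyond the first for each repeated key, e.g. ("ts", "station_id") for WS90."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return sum(n - 1 for n in counts.values() if n > 1)


def gap_lengths(timestamps, step=timedelta(minutes=5)):
    """Number of missing buckets in each gap of a 5-minute series."""
    ordered = sorted(set(timestamps))
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        missing = (cur - prev) // step - 1
        if missing > 0:
            gaps.append(missing)
    return gaps
```

For the barometer, the same `duplicate_count` would be called with `("ts", "source")` as the key.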
## Audit Command


@@ -39,6 +39,7 @@ Review in report:
- `candidate_models[*].hyperparameter_tuning`
- `candidate_models[*].calibration_comparison`
- `naive_baselines_test`
+- `sliced_performance_test`
- `walk_forward_backtest`
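To check that a training report contains the sections listed above, a small helper can walk the parsed JSON. It assumes those names are top-level keys and that `candidate_models` is a list of objects, as the `[*]` notation suggests; `missing_report_sections` is a hypothetical name.

```python
TOP_LEVEL_KEYS = ("naive_baselines_test", "sliced_performance_test", "walk_forward_backtest")
PER_CANDIDATE_KEYS = ("hyperparameter_tuning", "calibration_comparison")


def missing_report_sections(report):
    """List the review sections that are absent from a parsed training report."""
    missing = [key for key in TOP_LEVEL_KEYS if key not in report]
    if "candidate_models" not in report:
        missing.append("candidate_models")
        return missing
    for i, candidate in enumerate(report["candidate_models"]):
        for key in PER_CANDIDATE_KEYS:
            if key not in candidate:
                missing.append(f"candidate_models[{i}].{key}")
    return missing
```

For example, pass it the result of `json.load` on `models/rain_model_report.json`; an empty list means every listed section is present.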
## 3) Deploy
@@ -65,10 +66,10 @@ python scripts/predict_rain_model.py \
## 4) Rollback
-1. Identify the last known-good model artifact in `models/`.
-2. Point deployment to that artifact (worker env `RAIN_MODEL_PATH` or manual inference path).
-3. Re-run inference command and verify writes in `predictions_rain_1h`.
-4. Keep the failed artifact/report for postmortem.
+1. The worker now keeps a backup model at `RAIN_MODEL_BACKUP_PATH` and promotes new models only after candidate training succeeds.
+2. If promotion fails or no candidate model is produced, the worker keeps the active model unchanged.
+3. If inference starts without a model at `RAIN_MODEL_PATH` but a backup exists, the worker restores it from the backup automatically.
+4. Keep failed candidate artifacts for postmortem.
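The backup-and-promote behaviour in the new rollback steps can be sketched as two helpers. This assumes a model is a single file on local disk and that promotion means copying the candidate over the active path. The helper names are hypothetical, and the worker may validate candidates in ways not shown here. `os.replace` swaps the new file in with a single rename from a temporary file in the same directory, rather than rewriting the active file in place.

```python
import os
import shutil
from pathlib import Path


def promote_candidate(candidate: Path, active: Path, backup: Path) -> bool:
    """Promote a freshly trained candidate; leave the active model alone on any failure."""
    if not candidate.is_file():
        return False  # no candidate produced: keep the active model unchanged
    try:
        if active.is_file():
            shutil.copy2(active, backup)  # keep the last known-good model
        tmp = active.with_name(active.name + ".tmp")
        shutil.copy2(candidate, tmp)
        os.replace(tmp, active)  # single rename on the same filesystem
    except OSError:
        return False
    return True


def ensure_active_model(active: Path, backup: Path) -> bool:
    """Before inference: restore the active model from the backup if it is missing."""
    if active.is_file():
        return True
    if backup.is_file():
        shutil.copy2(backup, active)
        return True
    return False
```

At worker startup this might be called as `ensure_active_model(Path(os.environ["RAIN_MODEL_PATH"]), Path(os.environ["RAIN_MODEL_BACKUP_PATH"]))`. Candidates are copied rather than moved, so a failed candidate stays on disk for postmortem.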
## 5) Monitoring
@@ -134,6 +135,7 @@ The script exits non-zero on failure, so it can directly drive alerting.
- `RAIN_CALIBRATION_METHODS`
- `RAIN_WALK_FORWARD_FOLDS`
- `RAIN_ALLOW_EMPTY_DATA`
+- `RAIN_MODEL_BACKUP_PATH`
- `RAIN_MODEL_CARD_PATH`
Recommended production defaults:


@@ -48,6 +48,8 @@ Feature-set options:
- `baseline`: original 5 local observation features.
- `extended`: adds wind-direction encoding, lag/rolling stats, recent rain accumulation,
and aligned forecast features from `forecast_openmeteo_hourly`.
+- `extended_calendar`: `extended` plus UTC calendar seasonality features
+(`hour_*`, `dow_*`, `month_*`, `is_weekend`).
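The `extended` additions rest on a few standard transforms. The sketch below shows sin/cos wind-direction encoding, a lag, and a trailing sum (the shape of a recent-rain accumulation) over 5-minute buckets. The helper names, and the idea that the script uses exactly these forms and window sizes, are assumptions.

```python
import math


def wind_dir_features(deg):
    """Encode a compass bearing as (sin, cos) so 359 and 1 degree land close together."""
    rad = math.radians(deg)
    return math.sin(rad), math.cos(rad)


def lag(values, k):
    """Value k buckets earlier (k=12 is one hour on a 5-minute grid); None where history is missing."""
    k = min(k, len(values))
    return [None] * k + values[: len(values) - k]


def trailing_sum(values, window):
    """Sum over the current bucket and the window-1 before it, e.g. rain over the last hour."""
    return [sum(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]
```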
Model-family options (`train_rain_model.py`):
- `logreg`: logistic regression baseline.
@@ -117,6 +119,20 @@ python scripts/train_rain_model.py \
--dataset-out "models/datasets/rain_dataset_{model_version}_{feature_set}.csv"
```
+### 3b.1) Train expanded + calendar (P2) feature-set model
+```sh
+python scripts/train_rain_model.py \
+--site "home" \
+--start "2026-02-01T00:00:00Z" \
+--end "2026-03-03T23:55:00Z" \
+--feature-set "extended_calendar" \
+--model-family "auto" \
+--forecast-model "ecmwf" \
+--model-version "rain-auto-v1-extended-calendar" \
+--out "models/rain_model_extended_calendar.pkl" \
+--report-out "models/rain_model_report_extended_calendar.json"
+```
### 3c) Train tree-based baseline (P1)
```sh
python scripts/train_rain_model.py \
@@ -186,6 +202,7 @@ The `rainml` service in `docker-compose.yml` now runs:
- configurable tuning/calibration behavior (`RAIN_TUNE_HYPERPARAMETERS`,
`RAIN_MAX_HYPERPARAM_TRIALS`, `RAIN_CALIBRATION_METHODS`)
- graceful gap handling for temporary source outages (`RAIN_ALLOW_EMPTY_DATA=true`)
+- automatic rollback path for last-known-good model (`RAIN_MODEL_BACKUP_PATH`)
- optional model-card output (`RAIN_MODEL_CARD_PATH`)
Artifacts are persisted to `./models` on the host.
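A sketch of what `RAIN_ALLOW_EMPTY_DATA=true` could mean for one worker cycle: an empty training window is skipped instead of failing the run, so the current model stays in place until data returns. The truthy strings accepted by `env_flag`, the `run_cycle` and `EmptyDataError` names, and the return values are all hypothetical.

```python
import os


class EmptyDataError(RuntimeError):
    """Raised when a cycle has no rows and empty data is not allowed."""


def env_flag(name, default=False):
    """Read a boolean environment variable (accepted spellings are an assumption)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def run_cycle(rows, train):
    """Train on rows, or skip the cycle during a temporary source outage if allowed."""
    if not rows:
        if env_flag("RAIN_ALLOW_EMPTY_DATA"):
            return "skipped"  # keep the current model; retry on the next cycle
        raise EmptyDataError("no rows in the requested window")
    train(rows)
    return "trained"
```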
@@ -198,6 +215,7 @@ docker compose logs -f rainml
## Output
- Audit report: `models/rain_data_audit.json`
- Training report: `models/rain_model_report.json`
+- Regime slices in training report: `sliced_performance_test`
- Model card: `models/model_card_<model_version>.md`
- Model artifact: `models/rain_model.pkl`
- Dataset snapshot: `models/datasets/rain_dataset_<model_version>_<feature_set>.csv`
@@ -222,6 +240,12 @@ docker compose logs -f rainml
- `fc_temp_c`, `fc_rh`, `fc_pressure_msl_hpa`, `fc_wind_m_s`, `fc_wind_gust_m_s`,
`fc_precip_mm`, `fc_precip_prob`, `fc_cloud_cover`
+## Model Features (extended_calendar extras)
+- `hour_sin`, `hour_cos`
+- `dow_sin`, `dow_cos`
+- `month_sin`, `month_cos`
+- `is_weekend`
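The calendar extras are cyclic encodings: mapping hour, day of week and month onto a circle keeps, for example, 23:00 next to 00:00. A minimal sketch, assuming periods of 24 hours, 7 days and 12 months, Monday as day 0, whole hours, and naive timestamps treated as UTC. The script's exact formulas are not given here, and `calendar_features` is a hypothetical name.

```python
import math
from datetime import datetime, timezone


def calendar_features(ts: datetime) -> dict:
    """UTC calendar seasonality features for the extended_calendar feature set."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assume naive timestamps are already UTC
    ts = ts.astimezone(timezone.utc)
    hour = ts.hour
    dow = ts.weekday()    # Monday=0 ... Sunday=6
    month = ts.month - 1  # January=0 ... December=11
    two_pi = 2 * math.pi
    return {
        "hour_sin": math.sin(two_pi * hour / 24),
        "hour_cos": math.cos(two_pi * hour / 24),
        "dow_sin": math.sin(two_pi * dow / 7),
        "dow_cos": math.cos(two_pi * dow / 7),
        "month_sin": math.sin(two_pi * month / 12),
        "month_cos": math.cos(two_pi * month / 12),
        "is_weekend": int(dow >= 5),
    }
```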
## Notes
- Data is resampled into 5-minute buckets.
- Label is derived from incremental rain computed from WS90 cumulative `rain_mm`.
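A sketch of how a label can come from the cumulative counter: difference consecutive 5-minute readings, then mark a bucket positive if the following hour accumulates rain. The one-hour horizon (12 buckets) is inferred from the `predictions_rain_1h` table name. The counter-reset rule, the `threshold_mm` default, and the function names are assumptions.

```python
def incremental_rain(cumulative_mm):
    """Per-bucket rain from a cumulative counter sampled on the 5-minute grid."""
    if not cumulative_mm:
        return []
    increments = [0.0]  # first bucket has nothing to difference against
    for prev, cur in zip(cumulative_mm, cumulative_mm[1:]):
        delta = cur - prev
        # A drop means the counter reset; treat the new reading as rain since the reset.
        increments.append(cur if delta < 0 else delta)
    return increments


def next_hour_labels(increments, horizon=12, threshold_mm=0.0):
    """1 if rain over the next `horizon` buckets exceeds threshold_mm, else 0.

    None where the look-ahead window runs past the end of the data.
    """
    labels = []
    for i in range(len(increments)):
        window = increments[i + 1: i + 1 + horizon]
        labels.append(None if len(window) < horizon else int(sum(window) > threshold_mm))
    return labels
```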