more work on model training

2026-03-05 20:49:19 +11:00
parent 76270e5650
commit 5b8cad905f
9 changed files with 380 additions and 48 deletions


@@ -42,6 +42,11 @@ pip install -r scripts/requirements.txt
`predictions_rain_1h`.
- `scripts/run_rain_ml_worker.py`: long-running worker for periodic training + prediction.
Feature-set options:
- `baseline`: original 5 local observation features.
- `extended`: adds wind-direction encoding, lag/rolling stats, recent rain accumulation,
and aligned forecast features from `forecast_openmeteo_hourly`.
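How the hourly rows in `forecast_openmeteo_hourly` are aligned to observations is not shown here. One plausible approach, sketched below as an assumption, is to floor each 5-minute bucket timestamp to the start of its hour and look up the forecast row for that hour. The helper `align_forecast` and the dict-based forecast store are hypothetical.

```python
from datetime import datetime, timezone


def align_forecast(bucket_ts, hourly_forecast):
    """Return the forecast row for the hour containing bucket_ts, or None.

    hourly_forecast is a hypothetical dict keyed by hour-start datetimes (UTC);
    the real pipeline reads from the forecast_openmeteo_hourly table instead.
    """
    hour_start = bucket_ts.replace(minute=0, second=0, microsecond=0)
    return hourly_forecast.get(hour_start)


forecast = {datetime(2026, 2, 1, 3, tzinfo=timezone.utc): {"fc_temp_c": 21.5}}
row = align_forecast(datetime(2026, 2, 1, 3, 35, tzinfo=timezone.utc), forecast)
missing = align_forecast(datetime(2026, 2, 1, 4, 5, tzinfo=timezone.utc), forecast)
```

A production version would likely also need to restrict itself to forecasts issued before the bucket time, so that training does not see information unavailable at prediction time.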
## Usage
### 1) Apply schema update (existing DBs)
`001_schema.sql` now includes `predictions_rain_1h`.
@@ -60,6 +65,7 @@ python scripts/audit_rain_data.py \
--site home \
--start "2026-02-01T00:00:00Z" \
--end "2026-03-03T23:55:00Z" \
--feature-set "baseline" \
--out "models/rain_data_audit.json"
```
@@ -72,9 +78,25 @@ python scripts/train_rain_model.py \
--train-ratio 0.7 \
--val-ratio 0.15 \
--min-precision 0.70 \
--feature-set "baseline" \
--model-version "rain-logreg-v1" \
--out "models/rain_model.pkl" \
--report-out "models/rain_model_report.json" \
--dataset-out "models/datasets/rain_dataset_{model_version}_{feature_set}.csv"
```
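The `--train-ratio`, `--val-ratio`, and `--min-precision` flags suggest a chronological train/validation/test split followed by threshold selection on the validation set. The sketch below illustrates one way that could work. It is an assumption, not the logic in `scripts/train_rain_model.py`, and the helper names `chrono_split` and `pick_threshold` are hypothetical.

```python
def chrono_split(rows, train_ratio=0.7, val_ratio=0.15):
    """Split time-ordered rows into train/val/test without shuffling."""
    n = len(rows)
    train_end = int(n * train_ratio)
    val_end = train_end + int(n * val_ratio)
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


def pick_threshold(probs, labels, min_precision=0.70):
    """Return the lowest threshold whose precision meets min_precision, or None.

    Scanning thresholds in ascending order means the first match keeps as many
    positive predictions as possible while still meeting the precision floor.
    """
    for t in sorted(set(probs)):
        predicted = [p >= t for p in probs]
        tp = sum(1 for pred, y in zip(predicted, labels) if pred and y)
        fp = sum(1 for pred, y in zip(predicted, labels) if pred and not y)
        if tp + fp > 0 and tp / (tp + fp) >= min_precision:
            return t
    return None
```

Splitting by time rather than at random keeps future observations out of the training set, which matters for a model that predicts the next hour.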
### 3b) Train extended (P1) feature-set model
```sh
python scripts/train_rain_model.py \
--site "home" \
--start "2026-02-01T00:00:00Z" \
--end "2026-03-03T23:55:00Z" \
--feature-set "extended" \
--forecast-model "ecmwf" \
--model-version "rain-logreg-v1-extended" \
--out "models/rain_model_extended.pkl" \
--report-out "models/rain_model_report_extended.json" \
--dataset-out "models/datasets/rain_dataset_{model_version}_{feature_set}.csv"
```
### 4) Run inference and store prediction
@@ -107,16 +129,28 @@ docker compose logs -f rainml
- Audit report: `models/rain_data_audit.json`
- Training report: `models/rain_model_report.json`
- Model artifact: `models/rain_model.pkl`
- Dataset snapshot: `models/datasets/rain_dataset_<model_version>_<feature_set>.csv`
- Prediction rows: `predictions_rain_1h` (probability + threshold decision + realized
outcome fields once available)
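The `{model_version}` and `{feature_set}` placeholders in `--dataset-out` are presumably filled in by the training script. A minimal sketch of that expansion, assuming plain `str.format` substitution (the helper name `resolve_dataset_path` is hypothetical):

```python
def resolve_dataset_path(template: str, model_version: str, feature_set: str) -> str:
    """Fill the placeholders in a --dataset-out style template."""
    return template.format(model_version=model_version, feature_set=feature_set)


path = resolve_dataset_path(
    "models/datasets/rain_dataset_{model_version}_{feature_set}.csv",
    model_version="rain-logreg-v1-extended",
    feature_set="extended",
)
print(path)  # models/datasets/rain_dataset_rain-logreg-v1-extended_extended.csv
```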
## Model Features (v1 baseline)
- `pressure_trend_1h`
- `humidity`
- `temperature_c`
- `wind_avg_m_s`
- `wind_max_m_s`
## Model Features (extended set)
- baseline features, plus:
- `wind_dir_sin`, `wind_dir_cos`
- `temp_lag_5m`, `temp_roll_1h_mean`, `temp_roll_1h_std`
- `humidity_lag_5m`, `humidity_roll_1h_mean`, `humidity_roll_1h_std`
- `wind_avg_lag_5m`, `wind_avg_roll_1h_mean`, `wind_gust_roll_1h_max`
- `pressure_lag_5m`, `pressure_roll_1h_mean`, `pressure_roll_1h_std`
- `rain_last_1h_mm`
- `fc_temp_c`, `fc_rh`, `fc_pressure_msl_hpa`, `fc_wind_m_s`, `fc_wind_gust_m_s`,
`fc_precip_mm`, `fc_precip_prob`, `fc_cloud_cover`
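To illustrate how a few of the extended features could be computed, here is a minimal standard-library sketch. It assumes wind direction arrives in degrees and that rows are already in 5-minute buckets, so a 5-minute lag is the previous row and a 1-hour window is 12 rows. The helper names `encode_wind_dir` and `lag_and_rolling` are hypothetical, and the real pipeline may define its windows differently.

```python
import math
from statistics import mean, pstdev


def encode_wind_dir(deg):
    """Map a compass bearing in degrees onto the unit circle as (sin, cos)."""
    rad = math.radians(deg % 360.0)
    return math.sin(rad), math.cos(rad)


def lag_and_rolling(values, window=12):
    """For 5-minute buckets, return (lag_5m, roll_1h_mean, roll_1h_std) per row.

    window=12 buckets covers one hour. The window includes the current bucket
    and uses the population standard deviation; both are assumptions.
    """
    out = []
    for i in range(len(values)):
        lag = values[i - 1] if i > 0 else None  # no previous bucket for row 0
        recent = values[max(0, i - window + 1): i + 1]
        out.append((lag, mean(recent), pstdev(recent)))
    return out
```

The sin/cos pair keeps bearings such as 359° and 1° close together in feature space, which a single raw degree value would not.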
## Notes
- Data is resampled into 5-minute buckets.
- The label is derived from incremental rainfall, i.e. the change between successive readings of the WS90's cumulative `rain_mm` counter.
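The two preprocessing steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: it keeps the last cumulative reading in each 5-minute bucket, differences consecutive buckets, and treats a negative difference as a counter reset. The helper names are hypothetical, and the rule that turns increments into a label is not specified here, so the sketch stops at the increments.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 300  # 5-minute buckets


def bucket_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to the start of its 5-minute bucket."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % BUCKET_SECONDS, tz=timezone.utc)


def incremental_rain(readings):
    """Turn (timestamp, cumulative rain_mm) pairs into per-bucket increments.

    Keeps the last cumulative reading in each bucket, then differences
    consecutive non-empty buckets (gaps are not filled). Negative differences
    are clamped to 0.0 on the assumption that they indicate a counter reset.
    """
    last_in_bucket = {}
    for ts, rain_mm in sorted(readings):
        last_in_bucket[bucket_start(ts)] = rain_mm
    buckets = sorted(last_in_bucket)
    increments = {}
    for prev, cur in zip(buckets, buckets[1:]):
        increments[cur] = max(0.0, last_in_bucket[cur] - last_in_bucket[prev])
    return increments
```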