
Rain Prediction (Next 1 Hour)

This project includes a baseline workflow for binary rain prediction:

Will we see >= 0.2 mm of rain in the next hour?

It uses local observations (WS90 plus barometer), trains a logistic regression baseline, and writes its predictions back to TimescaleDB.
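For orientation, a logistic regression baseline over the v1 features can be as small as a scaler plus a classifier. The sketch below is an assumption about the general shape, not a copy of `scripts/train_rain_model.py`: the `build_baseline` helper, the `StandardScaler` step, `max_iter=1000`, and the synthetic training data are all illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# v1 feature columns, in the order listed under "Model Features (v1)".
FEATURES = ["pressure_trend_1h", "humidity", "temperature_c", "wind_avg_m_s", "wind_max_m_s"]


def build_baseline() -> Pipeline:
    """Hypothetical helper: standardize the features, then fit a logistic regression."""
    # Scaling puts hPa/h, %, degC and m/s on a comparable footing, so the
    # default L2 penalty does not depend on each feature's units.
    return make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))


# Synthetic demo data, NOT station data: rain is made more likely when
# pressure is falling and humidity is high.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "pressure_trend_1h": rng.normal(0.0, 1.0, n),
    "humidity": rng.uniform(40.0, 100.0, n),
    "temperature_c": rng.normal(8.0, 4.0, n),
    "wind_avg_m_s": rng.gamma(2.0, 1.0, n),
    "wind_max_m_s": rng.gamma(3.0, 1.5, n),
})[FEATURES]
logit = -2.0 * X["pressure_trend_1h"].to_numpy() + 0.08 * (X["humidity"].to_numpy() - 80.0)
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

model = build_baseline().fit(X, y)
proba = model.predict_proba(X)[:, 1]  # estimated P(rain_next_1h_mm >= 0.2) per row
```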

P0 Decisions (Locked)

  • Target: rain_next_1h_mm >= 0.2.
  • Primary use case: a low-noise rain heads-up signal for the dashboard, and a candidate for alerting.
  • Frozen v1 training window (UTC): 2026-02-01T00:00:00Z to 2026-03-03T23:55:00Z.
  • Threshold policy: on the validation split, choose the threshold that maximizes recall subject to precision >= 0.70; if no threshold reaches that precision, fall back to the threshold that maximizes F1.
  • Acceptance gate (test split): report and track precision, recall, ROC-AUC, PR-AUC, Brier score, and confusion matrix.
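The threshold policy above can be sketched with scikit-learn's `precision_recall_curve`. This is a hedged illustration: the `choose_threshold` name, the returned policy labels, and the tie-break toward the higher threshold are assumptions, not documented behavior of the training script.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def choose_threshold(y_val, p_val, min_precision=0.70):
    """Hypothetical helper: pick a decision threshold on the validation split.

    Maximize recall subject to precision >= min_precision; if no threshold
    reaches that precision, fall back to the threshold with the best F1.
    Returns (threshold, policy_name).
    """
    precision, recall, thresholds = precision_recall_curve(y_val, p_val)
    # precision/recall have one more entry than thresholds (the final
    # precision=1, recall=0 point has no threshold), so drop it.
    precision, recall = precision[:-1], recall[:-1]

    ok = precision >= min_precision
    if ok.any():
        idx = np.flatnonzero(ok)
        # lexsort sorts by the last key (recall) first, then by threshold,
        # so the last element is max recall, ties broken by higher threshold.
        best = idx[np.lexsort((thresholds[idx], recall[idx]))[-1]]
        return float(thresholds[best]), "max_recall_at_precision"

    f1 = np.divide(2 * precision * recall, precision + recall,
                   out=np.zeros_like(precision), where=(precision + recall) > 0)
    best = int(np.argmax(f1))
    return float(thresholds[best]), "max_f1_fallback"


# Demo on a tiny synthetic validation set.
y_val = np.array([0, 0, 1, 0, 1, 1, 0, 1])
p_val = np.array([0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.75, 0.90])
threshold, policy = choose_threshold(y_val, p_val, min_precision=0.70)
```

Here only the thresholds 0.60 (precision 0.75, recall 0.75) and 0.90 (precision 1.0, recall 0.25) clear the precision floor, so the policy keeps 0.60 for its higher recall.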

Requirements

Python 3.10+ and:

pandas
numpy
scikit-learn
psycopg2-binary
joblib

Install with:

python3 -m venv .venv
source .venv/bin/activate
pip install -r scripts/requirements.txt

Scripts

  • scripts/audit_rain_data.py: audits data quality, label quality, and class balance.
  • scripts/train_rain_model.py: trains the model on a strict time-ordered train/validation/test split and writes a metrics report.
  • scripts/predict_rain_model.py: runs inference with the saved model artifact and upserts the result into predictions_rain_1h.

Usage

1) Apply schema update (existing DBs)

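001_schema.sql now includes the predictions_rain_1h table. Scripts in /docker-entrypoint-initdb.d only run when the database is first initialized, so apply the file manually on an existing database: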

docker compose exec -T timescaledb \
  psql -U postgres -d micrometeo \
  -f /docker-entrypoint-initdb.d/001_schema.sql

2) Run the data audit

export DATABASE_URL="postgres://postgres:postgres@localhost:5432/micrometeo?sslmode=disable"

python scripts/audit_rain_data.py \
  --site home \
  --start "2026-02-01T00:00:00Z" \
  --end "2026-03-03T23:55:00Z" \
  --out "models/rain_data_audit.json"
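The individual audit checks are not spelled out here; as a rough sketch, two of the simplest, bucket coverage and class balance, could look like the following. The `audit_buckets` helper, the `rain_next_1h` label column, and the report keys are hypothetical and will not match `models/rain_data_audit.json` exactly.

```python
import pandas as pd


def audit_buckets(df: pd.DataFrame, start: str, end: str) -> dict:
    """Hypothetical audit helper: 5-minute bucket coverage and class balance.

    Assumes `df` has a UTC DatetimeIndex on the 5-minute grid and a 0/1 label
    column named "rain_next_1h" (NaN where the next hour is incomplete).
    """
    start_ts = pd.to_datetime(start, utc=True)
    end_ts = pd.to_datetime(end, utc=True)
    expected = pd.date_range(start_ts, end_ts, freq="5min")  # includes both ends
    present = df.index.intersection(expected)
    labels = df["rain_next_1h"].dropna()
    return {
        "expected_buckets": len(expected),
        "present_buckets": len(present),
        "missing_buckets": len(expected) - len(present),
        "coverage": len(present) / len(expected),
        "labeled_rows": int(labels.size),
        "positive_rate": float(labels.mean()) if labels.size else float("nan"),
    }


# Demo: one hour of buckets, two of them missing, three positive labels.
idx = pd.date_range(pd.Timestamp("2026-02-01", tz="UTC"), periods=12, freq="5min")
df = pd.DataFrame(
    {"rain_next_1h": [0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0]}, index=idx
).drop(idx[[4, 7]])
report = audit_buckets(df, "2026-02-01T00:00:00Z", "2026-02-01T00:55:00Z")
```

Because the frozen window ends at 23:55, the last 5-minute bucket of 2026-03-03, an inclusive range like this covers whole days.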

3) Train baseline model

python scripts/train_rain_model.py \
  --site "home" \
  --start "2026-02-01T00:00:00Z" \
  --end "2026-03-03T23:55:00Z" \
  --train-ratio 0.7 \
  --val-ratio 0.15 \
  --min-precision 0.70 \
  --model-version "rain-logreg-v1" \
  --out "models/rain_model.pkl" \
  --report-out "models/rain_model_report.json"
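With `--train-ratio 0.7` and `--val-ratio 0.15`, the remaining 15% of rows goes to the test split. A strict time-based split slices the time-sorted rows without shuffling; the `time_split` helper below is an illustrative sketch, not the script's code.

```python
import pandas as pd


def time_split(df: pd.DataFrame, train_ratio: float = 0.7, val_ratio: float = 0.15):
    """Hypothetical helper: chronological split, oldest -> train, then val, newest -> test.

    No shuffling, so every validation row is later than every training row and
    every test row is later than every validation row. Whatever is left after
    train_ratio + val_ratio goes to test (0.15 with the defaults).
    """
    if train_ratio <= 0 or val_ratio <= 0 or train_ratio + val_ratio >= 1:
        raise ValueError("need train_ratio > 0, val_ratio > 0 and train_ratio + val_ratio < 1")
    df = df.sort_index()
    n_train = int(len(df) * train_ratio)
    n_val = int(len(df) * val_ratio)
    train = df.iloc[:n_train]
    val = df.iloc[n_train:n_train + n_val]
    test = df.iloc[n_train + n_val:]
    return train, val, test


# Demo on 100 synthetic 5-minute buckets.
idx = pd.date_range(pd.Timestamp("2026-02-01", tz="UTC"), periods=100, freq="5min")
frame = pd.DataFrame({"x": range(100)}, index=idx)
train, val, test = time_split(frame, train_ratio=0.7, val_ratio=0.15)
```

Because each label looks 60 minutes ahead, the last 12 five-minute buckets before a split boundary carry labels that depend on rain inside the next split. Dropping those buckets (a purge gap) is one possible refinement; whether the training script does this is not stated here.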

4) Run inference and store the prediction

python scripts/predict_rain_model.py \
  --site home \
  --model-path "models/rain_model.pkl" \
  --model-name "rain_next_1h"
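Conceptually, inference loads the saved artifact, scores the latest bucket, and applies the stored threshold. The artifact layout below (a dict with `model`, `threshold`, `features` and `model_version` keys) and the `predict_latest` helper are assumptions for illustration; the database upsert into predictions_rain_1h is omitted because its column layout is not documented here.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression


def predict_latest(model_path, latest_row: pd.DataFrame) -> dict:
    """Hypothetical helper: score the newest 5-minute bucket with a saved artifact.

    ASSUMPTION: the artifact is a dict with "model", "threshold", "features"
    and "model_version" keys; the real artifact layout may differ.
    """
    artifact = joblib.load(model_path)
    X = latest_row[artifact["features"]]
    proba = float(artifact["model"].predict_proba(X)[0, 1])
    return {
        "ts": latest_row.index[0],
        "model_version": artifact["model_version"],
        "probability": proba,
        "predicted_rain": proba >= artifact["threshold"],
    }


# Demo: build a toy artifact, save it, then load and score one bucket.
features = ["pressure_trend_1h", "humidity"]
X_train = pd.DataFrame({"pressure_trend_1h": [-2.0, -1.5, 1.0, 1.5],
                        "humidity": [95.0, 90.0, 50.0, 45.0]})
toy_artifact = {
    "model": LogisticRegression().fit(X_train, [1, 1, 0, 0]),
    "threshold": 0.5,
    "features": features,
    "model_version": "rain-logreg-v1",
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rain_model.pkl"
    joblib.dump(toy_artifact, path)
    latest = pd.DataFrame({"pressure_trend_1h": [-1.8], "humidity": [93.0]},
                          index=pd.DatetimeIndex([pd.Timestamp("2026-03-04T12:00:00Z")]))
    result = predict_latest(path, latest)
```

`joblib.load` unpickles arbitrary objects, so only load model artifacts you produced yourself.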

5) One-command P0 workflow

export DATABASE_URL="postgres://postgres:postgres@localhost:5432/micrometeo?sslmode=disable"
bash scripts/run_p0_rain_workflow.sh

Output

  • Audit report: models/rain_data_audit.json
  • Training report: models/rain_model_report.json
  • Model artifact: models/rain_model.pkl
  • Prediction rows: table predictions_rain_1h (probability, thresholded decision, and realized-outcome fields once the outcome is available)
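The test-split metrics named in the acceptance gate map directly onto scikit-learn functions. A hedged sketch of assembling them into a JSON-serializable report follows; the `evaluate_split` helper and the key names are assumptions and need not match models/rain_model_report.json, and PR-AUC is computed here as average precision, which is one common estimator of it rather than a trapezoidal area.

```python
import json

import numpy as np
from sklearn.metrics import (
    average_precision_score,
    brier_score_loss,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate_split(y_true, proba, threshold: float) -> dict:
    """Hypothetical helper: acceptance-gate metrics for one split."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=float)
    y_pred = (proba >= threshold).astype(int)
    # labels=[0, 1] keeps the matrix 2x2 even if one class is absent.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "threshold": float(threshold),
        "precision": float(precision_score(y_true, y_pred, zero_division=0)),
        "recall": float(recall_score(y_true, y_pred, zero_division=0)),
        "roc_auc": float(roc_auc_score(y_true, proba)),
        # Average precision, used here as the PR-AUC estimate.
        "pr_auc": float(average_precision_score(y_true, proba)),
        "brier": float(brier_score_loss(y_true, proba)),
        "confusion_matrix": {"tn": int(tn), "fp": int(fp), "fn": int(fn), "tp": int(tp)},
    }


report = evaluate_split(
    y_true=[0, 0, 1, 1, 0, 1],
    proba=[0.10, 0.40, 0.35, 0.80, 0.20, 0.90],
    threshold=0.5,
)
print(json.dumps(report, indent=2))
```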

Model Features (v1)

  • pressure_trend_1h
  • humidity
  • temperature_c
  • wind_avg_m_s
  • wind_max_m_s
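pressure_trend_1h is not defined in detail above. One plausible reading, assumed here, is the barometric pressure now minus the pressure one hour earlier, in hPa. The sketch uses a time-based shift so that a missing bucket yields NaN rather than a difference over the wrong interval; the `pressure_trend_1h` function is hypothetical.

```python
import numpy as np
import pandas as pd


def pressure_trend_1h(pressure_hpa: pd.Series) -> pd.Series:
    """Hypothetical definition: pressure now minus pressure one hour earlier (hPa).

    Assumes bucket timestamps sit on the 5-minute grid. The time-based shift
    means a missing bucket one hour back yields NaN instead of a difference
    over the wrong interval.
    """
    one_hour_ago = pressure_hpa.shift(freq=pd.Timedelta(hours=1))
    return pressure_hpa - one_hour_ago.reindex(pressure_hpa.index)


# Demo: pressure falling 0.1 hPa per 5 minutes, with the 00:05 bucket missing.
idx = pd.date_range(pd.Timestamp("2026-02-01", tz="UTC"), periods=15, freq="5min")
pressure = pd.Series(1013.0 - 0.1 * np.arange(15), index=idx).drop(idx[1])
trend = pressure_trend_1h(pressure)
```

A negative value means falling pressure; at 01:00 the demo gives about -1.2 hPa, while 01:05 is NaN because its one-hour-earlier bucket is missing.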

Notes

  • Data is resampled into 5-minute buckets.
  • The label is derived from per-bucket rain increments computed from the WS90's cumulative rain_mm counter.
  • Timestamps are handled as UTC throughout the training and inference workflow.
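The labeling notes above can be sketched as follows, under stated assumptions: raw rain_mm readings carry a UTC `DatetimeIndex`, the per-bucket increment is the difference between the last readings of consecutive 5-minute buckets, a drop in the counter is treated as a reset and clipped to zero, and the next-hour total sums the 12 buckets after the current one. The `rain_labels` helper and its `rain_inc_mm` and `rain_next_1h` columns are hypothetical; `rain_next_1h_mm` follows the target definition.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer


def rain_labels(rain_mm: pd.Series, threshold_mm: float = 0.2) -> pd.DataFrame:
    """Hypothetical labeler: 5-minute increments and next-hour target from cumulative rain_mm."""
    # Last cumulative reading in each 5-minute bucket (empty buckets stay NaN).
    cum = rain_mm.sort_index().resample("5min").last()
    # Per-bucket increment; a negative step is treated as a counter reset and clipped to 0.
    inc = cum.diff().clip(lower=0)
    # Sum of the 12 buckets after t (t+5min .. t+60min). min_periods=12 leaves
    # NaN whenever that hour is incomplete, so gaps never become "no rain".
    window = FixedForwardWindowIndexer(window_size=12)
    next_1h = inc.shift(-1).rolling(window, min_periods=12).sum()
    # Round away float noise from differencing (10.2 - 10.0 is not exactly 0.2).
    next_1h = next_1h.round(3)
    label = (next_1h >= threshold_mm).astype(float).where(next_1h.notna())
    return pd.DataFrame({"rain_inc_mm": inc, "rain_next_1h_mm": next_1h, "rain_next_1h": label})


# Demo: 0.2 mm arrives in the 00:15 bucket; 16 buckets of data in total.
idx = pd.date_range(pd.Timestamp("2026-02-01", tz="UTC"), periods=16, freq="5min")
raw = pd.Series([10.0] * 3 + [10.2] * 13, index=idx)
labels = rain_labels(raw)
```

Without the rounding step, the 0.2 mm step in this demo would be stored as slightly less than 0.2 and fall below the `>= 0.2` target.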