add model training

2026-02-02 17:08:43 +11:00
parent 737eef85ea
commit 8edd0dc8b0
7 changed files with 388 additions and 9 deletions

docs/rain_prediction.md Normal file

@@ -0,0 +1,102 @@
# Rain Prediction (Next 1 Hour)
This project now includes a starter training script for **binary rain prediction**:
> **Will we see >= 0.2 mm of rain in the next hour?**
It uses local observations (WS90 + barometric pressure) and trains a lightweight
logistic regression model. This is a baseline you can iterate on as you collect
more data.
## What the script does
- Pulls data from TimescaleDB.
- Resamples observations to 5-minute buckets.
- Derives **pressure trend (1h)** from barometer data.
- Computes **future 1-hour rainfall** from the cumulative `rain_mm` counter.
- Trains a model and prints evaluation metrics.
Optionally, the script saves the trained model to a file that you can load later for inference.
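The resampling, pressure-trend, and labelling steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the code in `scripts/train_rain_model.py`: the function name `build_dataset`, the column name `pressure_hpa`, and the choice to clip negative counter steps to zero (treating them as counter resets) are all assumptions.

```python
import pandas as pd

STEPS_PER_HOUR = 12  # twelve 5-minute buckets make one hour


def build_dataset(obs: pd.DataFrame, threshold_mm: float = 0.2) -> pd.DataFrame:
    """Turn raw observations into 5-minute rows with a trend feature and a label.

    `obs` needs a DatetimeIndex plus `pressure_hpa` (assumed column name)
    and `rain_mm` (the cumulative counter named in this document).
    """
    # Resample to 5-minute buckets: average the pressure, keep the last counter value.
    df = obs.resample("5min").agg({"pressure_hpa": "mean", "rain_mm": "last"})
    df["rain_mm"] = df["rain_mm"].ffill()

    # Pressure trend (1h): current bucket minus the bucket one hour earlier.
    df["pressure_trend_1h"] = df["pressure_hpa"] - df["pressure_hpa"].shift(STEPS_PER_HOUR)

    # Incremental rain per bucket. A negative step is treated as a counter
    # reset and clipped to zero (an assumption about how resets look).
    inc = df["rain_mm"].diff().clip(lower=0)

    # Future 1-hour rainfall: sum of the next 12 increments (buckets t+1..t+12).
    df["rain_next_1h_mm"] = inc.rolling(STEPS_PER_HOUR).sum().shift(-STEPS_PER_HOUR)

    # Drop rows without a full hour of history or a full hour of future data.
    df = df.dropna(subset=["pressure_trend_1h", "rain_next_1h_mm"]).copy()
    df["rain_next_1h"] = (df["rain_next_1h_mm"] >= threshold_mm).astype(int)
    return df
```

Because the label looks one hour ahead, the last hour of any training window has no label and is dropped. The first hour is dropped too, since it has no pressure trend.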
## Requirements
Python 3.10+ and:
```
pandas
numpy
scikit-learn
psycopg2-binary
joblib
```
Install with:
```sh
python3 -m venv .venv
source .venv/bin/activate
pip install -r scripts/requirements.txt
```
## Usage
```sh
python scripts/train_rain_model.py \
--db-url "postgres://postgres:postgres@localhost:5432/micrometeo?sslmode=disable" \
--site "home" \
--start "2026-01-01" \
--end "2026-02-01" \
--out "models/rain_model.pkl"
```
You can also provide the connection string via `DATABASE_URL`:
```sh
export DATABASE_URL="postgres://postgres:postgres@localhost:5432/micrometeo?sslmode=disable"
python scripts/train_rain_model.py --site home
```
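One way the `--db-url` flag and the `DATABASE_URL` variable could interact is sketched below. The actual argument handling in the script is not shown in this document, so treat this as an assumption: the flag wins when both are set, `--site` is required, and the helper name `parse_args` is hypothetical.

```python
import argparse
import os


def parse_args(argv=None, env=None):
    """Parse CLI flags, falling back to DATABASE_URL for the connection string."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Train a rain prediction model")
    # The environment variable only supplies the default; an explicit flag overrides it.
    parser.add_argument("--db-url", default=env.get("DATABASE_URL"))
    parser.add_argument("--site", required=True)
    parser.add_argument("--start")
    parser.add_argument("--end")
    parser.add_argument("--out")
    args = parser.parse_args(argv)
    if not args.db_url:
        parser.error("provide --db-url or set DATABASE_URL")
    return args
```

Passing `argv` and `env` explicitly keeps the function easy to test without touching the real process environment.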
## Output
The script prints metrics including:
- accuracy
- precision / recall
- ROC AUC
- confusion matrix
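The metrics listed above all exist in `sklearn.metrics`. A minimal sketch of computing them from true labels and predicted rain probabilities follows. The helper name `evaluate` and the 0.5 decision threshold are assumptions, not necessarily what the script uses.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(y_true, y_prob, threshold=0.5):
    """Compute the listed metrics from labels and predicted rain probabilities."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    # Hard predictions depend on the threshold; ROC AUC uses the raw probabilities.
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_true, y_prob),
        # Rows are true classes, columns are predicted classes, ordered [0, 1].
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
    }
```

Since dry hours usually outnumber rainy ones, accuracy alone can look good for a model that never predicts rain. Precision, recall, and the confusion matrix show that failure more clearly.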
If `joblib` is installed, the script also saves a model bundle:
```
models/rain_model.pkl
```
This bundle contains:
- The trained model pipeline
- The feature list used during training
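A bundle like this could be written and read back with `joblib.dump` and `joblib.load`, as sketched below. The dictionary keys `"model"` and `"features"` and the helper names are assumptions; check the training script for the actual layout.

```python
from pathlib import Path

import joblib


def save_bundle(model, features, path):
    """Save the model together with the feature order it was trained on."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Key names are illustrative, not confirmed by the training script.
    joblib.dump({"model": model, "features": list(features)}, path)


def load_bundle(path):
    """Load a bundle written by save_bundle. Only load files you trust."""
    return joblib.load(path)
```

At inference time, selecting columns with the stored list (for example `bundle["model"].predict_proba(df[bundle["features"]])[:, 1]`) keeps the column order identical to training. Joblib files are pickles, so loading one from an untrusted source can execute arbitrary code.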
## Data needs / when to run
For a reliable model, you will want:
- **At least 2-4 weeks** of observations
- A mix of rainy and non-rainy periods
Training on only a few days of data will likely produce an unstable model, because such a short window usually contains too few rainy hours to learn from.
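A quick sanity check before training might look like the sketch below. The function name `data_sufficiency` and the 14-day default (the lower end of the 2-4 week guideline) are assumptions; `labels` is a 0/1 series indexed by timestamp.

```python
import pandas as pd


def data_sufficiency(labels: pd.Series, min_days: int = 14) -> dict:
    """Summarise time coverage and class balance of a 0/1 label series."""
    span = labels.index.max() - labels.index.min()
    span_days = span.total_seconds() / 86400
    positives = int(labels.sum())
    return {
        "span_days": span_days,
        "positives": positives,
        "positive_rate": float(labels.mean()),
        "enough_days": span_days >= min_days,
        # A classifier cannot be trained if only one class is present.
        "has_both_classes": 0 < positives < len(labels),
    }
```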
## Features used
The baseline model uses:
- `pressure_trend_1h` (hPa)
- `humidity` (%)
- `temperature_c` (°C)
- `wind_avg_m_s` (m/s)
- `wind_max_m_s` (m/s)
These are easy to expand once you have more data (e.g. add forecast features).
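A baseline along these lines can be sketched as a scikit-learn pipeline over the five features. The scaling step, `class_weight="balanced"`, the target column name `rain_next_1h`, and the function name `train_baseline` are assumptions, not confirmed details of the training script.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = [
    "pressure_trend_1h",
    "humidity",
    "temperature_c",
    "wind_avg_m_s",
    "wind_max_m_s",
]


def train_baseline(df: pd.DataFrame, target: str = "rain_next_1h"):
    """Fit a scaled logistic regression on the baseline feature list."""
    model = make_pipeline(
        # Features have very different units, so scale before the linear model.
        StandardScaler(),
        # Balanced class weights are an assumption to offset rare rainy hours.
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )
    model.fit(df[FEATURES], df[target])
    return model
```

Adding a feature then means appending a column name to `FEATURES`. The pipeline picks it up without further changes. For honest evaluation on time series, splitting train and test by time rather than at random is usually the safer choice.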
## Notes / assumptions
- Rain detection is based on **incremental rain** derived from the WS90
`rain_mm` cumulative counter.
- Pressure comes from `observations_baro`.
- All timestamps are treated as UTC.
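If timestamps arrive with local offsets, they can be normalised to UTC before resampling. The sketch below assumes a timestamp column named `ts`, which is a hypothetical name, and uses `pd.to_datetime(..., utc=True)`.

```python
import pandas as pd


def to_utc_index(df: pd.DataFrame, ts_col: str = "ts") -> pd.DataFrame:
    """Return a copy indexed by a sorted, timezone-aware UTC DatetimeIndex."""
    out = df.copy()
    # utc=True converts offset-aware values to UTC and labels naive ones as UTC.
    out[ts_col] = pd.to_datetime(out[ts_col], utc=True)
    return out.set_index(ts_col).sort_index()
```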
## Next improvements
Ideas once more data is available:
- Add forecast precipitation and cloud cover as features
- Try gradient boosted trees (e.g. XGBoost / LightGBM)
- Train per-season models
- Calibrate probabilities (Platt scaling / isotonic regression)
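As a starting point for the calibration idea, scikit-learn's `CalibratedClassifierCV` supports both approaches: `method="sigmoid"` is Platt scaling and `method="isotonic"` is isotonic regression. The sketch below is not part of the current script; the function name `fit_calibrated` and the 3-fold setting are assumptions.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_calibrated(X, y, method="sigmoid"):
    """Fit a logistic regression pipeline wrapped in probability calibration."""
    base = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    # Each fold fits the base model on one part and the calibrator on the rest.
    clf = CalibratedClassifierCV(base, method=method, cv=3)
    clf.fit(X, y)
    return clf
```

Isotonic regression is more flexible but tends to need more data than Platt scaling, so `sigmoid` is likely the safer first choice while the dataset is small. The default folds ignore time order, so a time-aware splitter may be preferable for weather data.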