add model training
102
docs/rain_prediction.md
Normal file
@@ -0,0 +1,102 @@
# Rain Prediction (Next 1 Hour)

This project now includes a starter training script for **binary rain prediction**:

> **Will we see >= 0.2 mm of rain in the next hour?**

It uses local observations (WS90 + barometric pressure) and trains a lightweight
logistic regression model. This is a baseline you can iterate on as you collect
more data.

## What the script does

- Pulls data from TimescaleDB.
- Resamples observations to 5-minute buckets.
- Derives **pressure trend (1h)** from barometer data.
- Computes **future 1-hour rainfall** from the cumulative `rain_mm` counter.
- Trains a model and prints evaluation metrics.

Optionally, the script also saves a model file that you can use later for inference.
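
The resampling, pressure-trend, and future-rainfall steps above can be sketched with pandas as follows. This is a minimal illustration, not the script's actual code: the `build_dataset` helper, the `pressure_hpa` column name, and the choice of `last()` / `mean()` as bucket aggregations are assumptions. Only `rain_mm`, the 5-minute buckets, and the 0.2 mm threshold come from this document.

```python
import pandas as pd

RAIN_THRESHOLD_MM = 0.2   # threshold from the question stated above
BUCKET = "5min"           # 5-minute buckets
STEPS_PER_HOUR = 12       # 12 five-minute buckets per hour


def build_dataset(ws90: pd.DataFrame, baro: pd.DataFrame) -> pd.DataFrame:
    """Build features and labels from two UTC-indexed frames (hypothetical layout)."""
    # Cumulative counter: keep the last reading in each bucket (assumed aggregation).
    rain = ws90["rain_mm"].resample(BUCKET).last().ffill()
    # Pressure: average within each bucket, bridging short gaps only.
    pressure = baro["pressure_hpa"].resample(BUCKET).mean().interpolate(limit=3)

    df = pd.DataFrame({"rain_mm": rain, "pressure_hpa": pressure})
    # Pressure trend: change relative to one hour earlier.
    df["pressure_trend_1h"] = df["pressure_hpa"] - df["pressure_hpa"].shift(STEPS_PER_HOUR)
    # Future rainfall: counter one hour ahead minus the counter now.
    df["rain_next_1h_mm"] = df["rain_mm"].shift(-STEPS_PER_HOUR) - df["rain_mm"]

    # Drop rows without a full hour of history or a full hour of future data.
    df = df.dropna(subset=["pressure_trend_1h", "rain_next_1h_mm"]).copy()
    df["rain_next_1h"] = (df["rain_next_1h_mm"] >= RAIN_THRESHOLD_MM).astype(int)
    return df
```

Note that the first and last hour of any time range produce no training rows, because either the trend or the label cannot be computed there.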

## Requirements

Python 3.10+ and:

```
pandas
numpy
scikit-learn
psycopg2-binary
joblib
```

Install with:

```sh
python3 -m venv .venv
source .venv/bin/activate
pip install -r scripts/requirements.txt
```

## Usage

```sh
python scripts/train_rain_model.py \
  --db-url "postgres://postgres:postgres@localhost:5432/micrometeo?sslmode=disable" \
  --site "home" \
  --start "2026-01-01" \
  --end "2026-02-01" \
  --out "models/rain_model.pkl"
```

You can also provide the connection string via `DATABASE_URL`:

```sh
export DATABASE_URL="postgres://postgres:postgres@localhost:5432/micrometeo?sslmode=disable"
python scripts/train_rain_model.py --site home
```
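
One way a script could support both the `--db-url` flag and the `DATABASE_URL` variable is sketched below. The `resolve_db_url` helper is hypothetical, and the precedence (flag first, then environment) is an assumption rather than documented behavior of `train_rain_model.py`.

```python
import argparse
import os


def resolve_db_url(argv=None, env=None):
    """Return the connection string from --db-url, else from DATABASE_URL."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--db-url", default=None)
    parser.add_argument("--site", required=True)
    # Ignore other flags (--start, --end, --out) in this sketch.
    args, _unknown = parser.parse_known_args(argv)

    # Assumed precedence: an explicit flag wins over the environment.
    db_url = args.db_url or env.get("DATABASE_URL")
    if not db_url:
        parser.error("provide --db-url or set DATABASE_URL")
    return db_url
```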

## Output

The script prints metrics including:

- accuracy
- precision / recall
- ROC AUC
- confusion matrix
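
For reference, the metrics listed above can all be computed with `sklearn.metrics`. The sketch below is illustrative only: the `evaluate` helper and the 0.5 decision threshold are assumptions, not necessarily what the script uses.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(y_true, y_prob, threshold=0.5):
    """Compute the listed metrics from true labels and predicted rain probabilities."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    # Hard predictions at the (assumed) threshold; ROC AUC uses the raw probabilities.
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_true, y_prob),
        # Rows are true classes, columns are predicted classes, ordered [0, 1].
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
    }
```

Because rainy hours are usually the minority class, accuracy alone can look good for a model that never predicts rain; precision, recall, and the confusion matrix show whether rain is actually being caught.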

If `joblib` is installed, the script also saves a model bundle to the path given by `--out`, for example:

```
models/rain_model.pkl
```

This bundle contains:

- The trained model pipeline
- The feature list used during training
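
A bundle with those two parts could be written and read back with `joblib` roughly as follows. The dictionary keys `"model"` and `"features"` and the helper names are hypothetical; check the actual bundle before relying on them.

```python
from pathlib import Path

import joblib


def save_bundle(model, features, path):
    """Save the model pipeline together with its feature list (assumed key names)."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    joblib.dump({"model": model, "features": list(features)}, path)


def load_bundle(path):
    """Load a bundle and return (model, features)."""
    bundle = joblib.load(path)
    return bundle["model"], bundle["features"]
```

Storing the feature list next to the model lets inference code select columns in the same order as during training, for example `model.predict_proba(df[features])`.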

## Data needs / when to run

For a reliable model, you will want:

- **At least 2-4 weeks** of observations
- A mix of rainy and non-rainy periods

Training on only a few days of data will produce an unstable model.

## Features used

The baseline model uses:

- `pressure_trend_1h` (hPa)
- `humidity` (%)
- `temperature_c` (°C)
- `wind_avg_m_s` (m/s)
- `wind_max_m_s` (m/s)

This list is easy to extend once you have more data (e.g. by adding forecast features).
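
A baseline along these lines can be sketched with scikit-learn as follows. The feature names come from the list above; everything else (the `train_baseline` helper, the `rain_next_1h` label column, feature scaling, `class_weight="balanced"`, and the chronological train/test split) is an assumption made for illustration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = [
    "pressure_trend_1h",
    "humidity",
    "temperature_c",
    "wind_avg_m_s",
    "wind_max_m_s",
]


def train_baseline(df: pd.DataFrame, label_col: str = "rain_next_1h", test_fraction: float = 0.2):
    """Fit a scaled logistic regression; return the model and the held-out rows."""
    df = df.dropna(subset=FEATURES + [label_col]).sort_index()
    # Chronological split (assumed): train on the earlier rows, test on the latest ones.
    split = len(df) - int(len(df) * test_fraction)
    train, test = df.iloc[:split], df.iloc[split:]

    model = make_pipeline(
        StandardScaler(),
        # Balanced class weights are an assumption to offset rare rainy hours.
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )
    model.fit(train[FEATURES], train[label_col])
    return model, test
```

Adding a feature then amounts to appending a column name to `FEATURES`, provided the column exists in the training frame.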

## Notes / assumptions

- Rain detection is based on **incremental rain** derived from the WS90
  `rain_mm` cumulative counter.
- Pressure comes from `observations_baro`.
- All timestamps are treated as UTC.
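
Deriving incremental rain from a cumulative counter is essentially a first difference. The sketch below also clamps negative differences to zero, on the assumption that the counter can reset; the document does not say how the script handles resets, and `incremental_rain` is a hypothetical helper.

```python
import pandas as pd


def incremental_rain(counter: pd.Series) -> pd.Series:
    """Per-step rainfall (mm) from a cumulative rain counter."""
    inc = counter.diff()
    # The first row has no predecessor (NaN) and a reset yields a negative
    # difference; both are treated as 0.0 mm in this sketch.
    return inc.where(inc >= 0, 0.0)
```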

## Next improvements

Ideas once more data is available:

- Add forecast precipitation and cloud cover as features
- Try gradient boosted trees (e.g. XGBoost / LightGBM)
- Train per-season models
- Calibrate probabilities (Platt scaling / isotonic regression)
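
As a starting point for the calibration idea, scikit-learn's `CalibratedClassifierCV` supports both methods: `method="sigmoid"` corresponds to Platt scaling and `method="isotonic"` to isotonic regression. The wrapper below is a hypothetical sketch, and the 3-fold setting is an arbitrary choice.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression


def make_calibrated_model(method: str = "sigmoid"):
    """Wrap a logistic regression in cross-validated probability calibration."""
    base = LogisticRegression(max_iter=1000)
    # cv=3 fits the base model on two folds and calibrates on the third, three times.
    return CalibratedClassifierCV(base, method=method, cv=3)
```

Isotonic regression generally needs more data than Platt scaling to avoid overfitting, so `"sigmoid"` may be the safer first choice with only a few weeks of observations.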