# Rain Prediction (Next 1 Hour)

This project includes a starter training script for **binary rain prediction**:

> **Will we see >= 0.2 mm of rain in the next hour?**

It uses local observations (WS90 + barometric pressure) and trains a lightweight
logistic regression model. This is a baseline you can iterate on as you collect
more data.

## What the script does

- Pulls data from TimescaleDB.
- Resamples observations to 5-minute buckets.
- Derives **pressure trend (1h)** from barometer data.
- Computes **future 1-hour rainfall** from the cumulative `rain_mm` counter.
- Trains a model and prints evaluation metrics.

Optionally, the script also saves a model file that you can use later for inference.

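
The feature and label steps above can be sketched with pandas on synthetic data. This is a minimal illustration, not the script's actual code: the `pressure_hpa` column name, the 1-minute input cadence, and the handling of counter resets (negative jumps clipped to zero) are all assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute observations (assumed layout, not the real schema).
idx = pd.date_range("2026-01-01 00:00", periods=240, freq="1min", tz="UTC")
obs = pd.DataFrame(
    {
        # Slowly falling pressure.
        "pressure_hpa": np.linspace(1015.0, 1012.0, len(idx)),
        # Cumulative counter that jumps by 0.5 mm at minute 120.
        "rain_mm": np.where(np.arange(len(idx)) >= 120, 0.5, 0.0),
    },
    index=idx,
)

# 1) Resample to 5-minute buckets: mean pressure, last counter reading.
buckets = obs.resample("5min").agg({"pressure_hpa": "mean", "rain_mm": "last"})

# 2) Pressure trend over 1 h: current bucket minus the bucket 12 steps earlier.
steps_per_hour = 12
buckets["pressure_trend_1h"] = buckets["pressure_hpa"].diff(steps_per_hour)

# 3) Incremental rain per bucket from the cumulative counter.
#    Assumption: a negative jump is a counter reset and counts as 0 mm.
increment = buckets["rain_mm"].diff().clip(lower=0).fillna(0.0)

# 4) Rain in the NEXT hour: sum of the 12 increments that follow each bucket.
buckets["rain_next_1h_mm"] = (
    increment.rolling(steps_per_hour).sum().shift(-steps_per_hour)
)

# Drop rows without a full hour of history or a full hour of future data,
# then apply the 0.2 mm threshold from the prediction target.
labelled = buckets.dropna(subset=["pressure_trend_1h", "rain_next_1h_mm"]).copy()
labelled["label"] = (labelled["rain_next_1h_mm"] >= 0.2).astype(int)
```

Shifting the rolling sum backwards keeps the label strictly in the future of each row, so no information from the next hour leaks into the features.
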
## Requirements

Python 3.10+ and:

```
pandas
numpy
scikit-learn
psycopg2-binary
joblib
```

Install with:

```sh
python3 -m venv .venv
source .venv/bin/activate
pip install -r scripts/requirements.txt
```

## Usage

```sh
python scripts/train_rain_model.py \
  --db-url "postgres://postgres:postgres@localhost:5432/micrometeo?sslmode=disable" \
  --site "home" \
  --start "2026-01-01" \
  --end "2026-02-01" \
  --out "models/rain_model.pkl"
```

You can also provide the connection string via `DATABASE_URL`:

```sh
export DATABASE_URL="postgres://postgres:postgres@localhost:5432/micrometeo?sslmode=disable"
python scripts/train_rain_model.py --site home
```

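
One way the flag and the environment variable could be reconciled is sketched below. The helper name `resolve_db_url` is hypothetical, and the precedence (flag first, then `DATABASE_URL`) is an assumption rather than confirmed behavior of the script.

```python
import argparse
from collections.abc import Mapping, Sequence


def resolve_db_url(argv: Sequence[str], env: Mapping[str, str]) -> str:
    """Return the connection string from --db-url, else from DATABASE_URL."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--db-url", default=None)
    parser.add_argument("--site", required=True)
    args = parser.parse_args(argv)

    # Assumed precedence: an explicit flag wins over the environment.
    db_url = args.db_url or env.get("DATABASE_URL")
    if not db_url:
        # parser.error prints a usage message and exits with status 2.
        parser.error("provide --db-url or set DATABASE_URL")
    return db_url


# In a real entry point this would be called as:
#   resolve_db_url(sys.argv[1:], os.environ)
```

Passing `argv` and `env` in explicitly keeps the function easy to test without touching the real process environment.
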
## Output

The script prints metrics including:

- accuracy
- precision / recall
- ROC AUC
- confusion matrix

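
For reference, these metrics can all be computed with `sklearn.metrics`. The sketch below uses made-up labels and probabilities, and the 0.5 decision threshold is an assumption; the script may compute or format them differently.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Made-up ground truth (1 = rain in the next hour) and predicted probabilities.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0])
y_prob = np.array([0.10, 0.20, 0.60, 0.30, 0.80, 0.40, 0.90, 0.05])

# Assumed decision threshold of 0.5 for the hard predictions.
y_pred = (y_prob >= 0.5).astype(int)

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    # ROC AUC is computed from probabilities, not thresholded predictions.
    "roc_auc": roc_auc_score(y_true, y_prob),
}
# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(cm)
```

Because rainy hours are usually the minority class, precision and recall tend to say more about the model than accuracy alone.
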
If `joblib` is installed, the script saves a model bundle to the `--out` path, for example:

```
models/rain_model.pkl
```

This bundle contains:

- The trained model pipeline
- The feature list used during training

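
A bundle like this could be written and read back as sketched below. The dictionary keys `"model"` and `"features"` are assumptions for illustration; check the training script for the actual bundle layout before relying on them.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Shortened feature list and toy data, for illustration only.
FEATURES = ["pressure_trend_1h", "humidity"]
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 2)), columns=FEATURES)
y = (X["humidity"] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Save the model together with the feature list (assumed key names).
bundle = {"model": model, "features": FEATURES}
path = Path(tempfile.mkdtemp()) / "rain_model.pkl"
joblib.dump(bundle, path)

# Later, at inference time: load the bundle and select the columns
# in the same order that was used during training.
loaded = joblib.load(path)
latest = pd.DataFrame([{"humidity": 2.0, "pressure_trend_1h": -0.5, "extra": 1}])
proba = loaded["model"].predict_proba(latest[loaded["features"]])[:, 1]
```

Storing the feature list next to the model guards against silently feeding columns in the wrong order when the observation schema changes.
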
## Data needs / when to run

For a reliable model, you will want:

- **At least 2-4 weeks** of observations
- A mix of rainy and non-rainy periods

Training on only a few days of data will produce an unstable model.

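
A quick pre-training check along these lines might look like the sketch below. The function name is hypothetical, `min_days=14` reflects the lower end of the 2-4 week guidance, and `min_positive=30` is an arbitrary placeholder rather than a recommendation from this project.

```python
from datetime import datetime, timedelta, timezone


def training_data_report(timestamps, labels, min_days=14, min_positive=30):
    """Summarize time span and class balance of a labelled dataset."""
    span = max(timestamps) - min(timestamps)
    positives = sum(labels)
    return {
        "span_days": span / timedelta(days=1),
        "positive_rate": positives / len(labels),
        "enough_span": span >= timedelta(days=min_days),
        "enough_rain": positives >= min_positive,
    }


# Example: three days of 5-minute buckets with only 10 rainy labels.
start = datetime(2026, 1, 1, tzinfo=timezone.utc)
stamps = [start + timedelta(minutes=5 * i) for i in range(864)]
labels = [1 if i < 10 else 0 for i in range(864)]
report = training_data_report(stamps, labels)
```

Here both `enough_span` and `enough_rain` come out false, which is the situation where the resulting model is likely to be unstable.
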
## Features used

The baseline model uses:

- `pressure_trend_1h` (hPa change over 1 hour)
- `humidity` (%)
- `temperature_c` (°C)
- `wind_avg_m_s` (m/s)
- `wind_max_m_s` (m/s)

These are easy to expand once you have more data (e.g. add forecast features).

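
As a rough sketch of how these features could feed a logistic regression pipeline in scikit-learn: the data below is random, the toy label rule is invented, and the `StandardScaler` step, `class_weight="balanced"`, and the 80/20 chronological split are assumptions rather than confirmed details of the script.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = [
    "pressure_trend_1h",
    "humidity",
    "temperature_c",
    "wind_avg_m_s",
    "wind_max_m_s",
]

# Random stand-in data; rows are assumed to be in time order.
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame(
    {
        "pressure_trend_1h": rng.normal(0.0, 1.0, n),
        "humidity": rng.uniform(40, 100, n),
        "temperature_c": rng.normal(10, 5, n),
        "wind_avg_m_s": rng.uniform(0, 8, n),
    }
)
df["wind_max_m_s"] = df["wind_avg_m_s"] + rng.uniform(0, 5, n)
# Invented toy rule: falling pressure plus high humidity means rain.
df["label"] = ((df["pressure_trend_1h"] < -0.3) & (df["humidity"] > 70)).astype(int)

# Chronological split: train on the first 80 %, evaluate on the rest.
split = int(n * 0.8)
train, test = df.iloc[:split], df.iloc[split:]

pipeline = Pipeline(
    [
        ("scale", StandardScaler()),  # features have very different ranges
        ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ]
)
pipeline.fit(train[FEATURES], train["label"])
rain_probability = pipeline.predict_proba(test[FEATURES])[:, 1]
```

Adding a feature then only requires a new column and one more entry in `FEATURES`.
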
## Notes / assumptions

- Rain detection is based on **incremental rain** derived from the WS90
  `rain_mm` cumulative counter.
- Pressure comes from `observations_baro`.
- All timestamps are treated as UTC.

## Next improvements

Ideas once more data is available:

- Add forecast precipitation and cloud cover as features
- Try gradient boosted trees (e.g. XGBoost / LightGBM)
- Train per-season models
- Calibrate probabilities (Platt scaling / isotonic regression)