# Overview
vCTP is a vSphere Chargeback Tracking Platform. It was designed for a specific customer, so some design decisions may not apply to your use case.

## Snapshots and Reports
- Hourly snapshots capture inventory per vCenter (concurrency via `hourly_snapshot_concurrency`).
- Daily summaries aggregate the hourly snapshots for the day; monthly summaries aggregate daily summaries for the month (or hourly snapshots if configured).
- Snapshots are registered in `snapshot_registry` so regeneration via `/api/snapshots/aggregate` can locate the correct tables (fallback scanning is also supported).
- vCenter totals pages now provide two views:
  - Daily Aggregated (`/vcenters/totals/daily`) for fast long-range trends.
  - Hourly Detail 45d (`/vcenters/totals/hourly`) for recent granular change tracking.
- vCenter totals performance is accelerated with compact cache tables:
  - `vcenter_latest_totals` (one latest row per vCenter)
  - `vcenter_aggregate_totals` (hourly/daily/monthly per-vCenter totals by snapshot time)
- VM Trace now supports two modes on `/vm/trace`:
  - `view=hourly` (default) for full snapshot detail
  - `view=daily` for daily aggregated trend lines (using `vm_daily_rollup` when available)
- Reports (XLSX with totals/charts) are generated automatically after hourly, daily, and monthly jobs and written to a reports directory.
- Hourly totals in reports are interval-based: each row represents `[HH:00, HH+1:00)` and uses the first snapshot at or after the hour end (including cross-day snapshots) to prorate VM presence by creation/deletion overlap.
- Monthly aggregation reports include a Daily Totals sheet with full-day interval labels (`YYYY-MM-DD to YYYY-MM-DD`) and prorated totals derived from daily summaries.
- Prometheus metrics are exposed at `/metrics`:
  - Snapshots/aggregations: `vctp_hourly_snapshots_total`, `vctp_hourly_snapshots_failed_total`, `vctp_hourly_snapshot_last_unix`, `vctp_hourly_snapshot_last_rows`, `vctp_daily_aggregations_total`, `vctp_daily_aggregations_failed_total`, `vctp_daily_aggregation_duration_seconds`, `vctp_monthly_aggregations_total`, `vctp_monthly_aggregations_failed_total`, `vctp_monthly_aggregation_duration_seconds`, `vctp_reports_available`
  - vCenter health/perf: `vctp_vcenter_connect_failures_total{vcenter}`, `vctp_vcenter_snapshot_duration_seconds{vcenter}`, `vctp_vcenter_inventory_size{vcenter}`

## Prorating and Aggregation Logic
Daily aggregation runs per VM using sample counts for the day:
- `SamplesPresent`: count of snapshot samples in which the VM appears.
- `TotalSamples`: count of unique snapshot timestamps for the vCenter in the day.
- `AvgIsPresent`: `SamplesPresent / TotalSamples` (0 when `TotalSamples` is 0).
- `AvgVcpuCount`, `AvgRamGB`, `AvgProvisionedDisk` (daily): `sum(values_per_sample) / TotalSamples` to time-weight config changes and prorate partial-day VMs.
- `PoolTinPct`, `PoolBronzePct`, `PoolSilverPct`, `PoolGoldPct` (daily): `(pool_hits / SamplesPresent) * 100`, so pool percentages reflect only the time the VM existed.
- `CreationTime`: only set when vCenter provides it; otherwise it remains `0`.

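As a rough illustration of the daily formulas above, the following sketch computes a few of the daily fields for one VM. The `vmSample` and `dailySummary` types and the `aggregateDaily` function are hypothetical names invented for this example; the real aggregation code may be structured quite differently.

```go
package main

import "fmt"

// vmSample is one hourly observation of a VM (hypothetical type).
type vmSample struct {
	VcpuCount int
	RamGB     float64
	Pool      string // "tin", "bronze", "silver" or "gold"
}

// dailySummary holds a subset of the daily aggregate fields.
type dailySummary struct {
	SamplesPresent int
	TotalSamples   int
	AvgIsPresent   float64
	AvgVcpuCount   float64
	AvgRamGB       float64
	PoolPct        map[string]float64
}

// aggregateDaily applies the documented formulas: averages divide by
// TotalSamples, pool percentages divide by SamplesPresent.
func aggregateDaily(present []vmSample, totalSamples int) dailySummary {
	out := dailySummary{
		SamplesPresent: len(present),
		TotalSamples:   totalSamples,
		PoolPct:        map[string]float64{},
	}
	if totalSamples == 0 {
		return out // all averages stay 0
	}
	var vcpuSum, ramSum float64
	poolHits := map[string]int{}
	for _, s := range present {
		vcpuSum += float64(s.VcpuCount)
		ramSum += s.RamGB
		poolHits[s.Pool]++
	}
	n := float64(totalSamples)
	out.AvgIsPresent = float64(len(present)) / n
	out.AvgVcpuCount = vcpuSum / n
	out.AvgRamGB = ramSum / n
	for pool, hits := range poolHits {
		out.PoolPct[pool] = float64(hits) / float64(len(present)) * 100
	}
	return out
}

func main() {
	// VM present in 12 of 24 samples: 6 as 2 vCPU/bronze, 6 as 4 vCPU/gold.
	var present []vmSample
	for i := 0; i < 6; i++ {
		present = append(present, vmSample{VcpuCount: 2, RamGB: 8, Pool: "bronze"})
	}
	for i := 0; i < 6; i++ {
		present = append(present, vmSample{VcpuCount: 4, RamGB: 8, Pool: "gold"})
	}
	d := aggregateDaily(present, 24)
	fmt.Println(d.AvgIsPresent, d.AvgVcpuCount, d.AvgRamGB, d.PoolPct["gold"]) // 0.5 1.5 4 50
}
```

Note how the half-day VM averages 1.5 vCPUs (prorated against all 24 samples), while its gold percentage is 50 because pool percentages only consider the samples in which the VM existed.
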
Monthly aggregation builds on daily summaries (or the daily rollup cache):
- For each VM, daily averages are converted to weighted sums: `daily_avg * daily_total_samples`.
- Monthly averages are `sum(weighted_sums) / monthly_total_samples` (per vCenter).
- Pool percentages are weighted the same way: `(daily_pool_pct / 100) * daily_total_samples`, summed, then divided by `monthly_total_samples` and multiplied by 100.

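The monthly weighting above can be sketched as follows. `dailyRow` and `monthlyAverages` are hypothetical names used only for this example, and only one sizing field and one pool percentage are shown to keep it short.

```go
package main

import "fmt"

// dailyRow is a trimmed-down daily summary for one VM (hypothetical type).
type dailyRow struct {
	TotalSamples int
	AvgVcpuCount float64
	PoolGoldPct  float64
}

// monthlyAverages weights each day's values by that day's sample count and
// normalizes by the month's total samples, as described above.
func monthlyAverages(days []dailyRow) (avgVcpu, goldPct float64) {
	var total, vcpuSum, goldSum float64
	for _, d := range days {
		n := float64(d.TotalSamples)
		total += n
		vcpuSum += d.AvgVcpuCount * n       // daily_avg * daily_total_samples
		goldSum += d.PoolGoldPct / 100 * n  // (daily_pool_pct / 100) * daily_total_samples
	}
	if total == 0 {
		return 0, 0
	}
	return vcpuSum / total, goldSum / total * 100
}

func main() {
	days := []dailyRow{
		{TotalSamples: 24, AvgVcpuCount: 2, PoolGoldPct: 100},
		{TotalSamples: 8, AvgVcpuCount: 4, PoolGoldPct: 50},
	}
	avgVcpu, goldPct := monthlyAverages(days)
	fmt.Println(avgVcpu, goldPct) // 2.5 87.5
}
```

Weighting by sample count means a day with only 8 snapshots influences the monthly figure a third as much as a full 24-snapshot day, rather than counting equally.
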
### Hourly Snapshot Fields
Each hourly snapshot row tracks:
- Identity: `InventoryId`, `Name`, `Vcenter`, `VmId`, `VmUuid`, `EventKey`, `CloudId`
- Lifecycle/timing: `CreationTime`, `DeletionTime`, `SnapshotTime`
- Placement: `ResourcePool`, `Datacenter`, `Cluster`, `Folder`
- Sizing/state: `ProvisionedDisk`, `VcpuCount`, `RamGB`, `IsTemplate`, `PoweredOn`, `SrmPlaceholder`

### Daily Aggregate Fields
Daily summary rows retain identity/placement/sizing fields and add:
- Sample coverage: `SamplesPresent`, `TotalSamples`, `AvgIsPresent`
- Time-weighted sizing: `AvgVcpuCount`, `AvgRamGB`, `AvgProvisionedDisk`
- Pool distribution percentages: `PoolTinPct`, `PoolBronzePct`, `PoolSilverPct`, `PoolGoldPct`
- Chargeback totals columns: `Tin`, `Bronze`, `Silver`, `Gold`
- Lifecycle carry-forward used by reports and trace: `CreationTime`, `DeletionTime`, `SnapshotTime`

### Monthly Aggregate Fields
Monthly summary rows keep the same aggregate fields as daily summaries and recompute them over the month:
- `SamplesPresent` is summed across days.
- Monthly averages (`AvgVcpuCount`, `AvgRamGB`, `AvgProvisionedDisk`) are weighted by each day's sample volume.
- Monthly presence (`AvgIsPresent`) is normalized by monthly total samples.
- Monthly pool percentages (`PoolTinPct`, `PoolBronzePct`, `PoolSilverPct`, `PoolGoldPct`) are weighted by each day's sample volume before normalization.
- `Tin`, `Bronze`, `Silver`, `Gold` totals remain available for reporting output.

## RPM Layout (summary)
The RPM installs the binary under `/usr/bin`, config under `/etc/dtms`, and data under `/var/lib/vctp`:
- Binary: `/usr/bin/vctp-linux-amd64`
- Systemd unit: `/etc/systemd/system/vctp.service`
- Defaults/config: `/etc/dtms/vctp.yml` (override with `-settings`), `/etc/default/vctp` (optional env flags)
- TLS cert/key: `/etc/dtms/vctp.crt` and `/etc/dtms/vctp.key` (generated if absent)
- Data: SQLite DB and reports default to `/var/lib/vctp` (reports under `/var/lib/vctp/reports`)
- Scripts: preinstall/postinstall handle directory creation and permissions.

# Settings File
Configuration now lives in the YAML settings file. By default the service reads
`/etc/dtms/vctp.yml`, or you can override it with the `-settings` flag.

```shell
vctp -settings /path/to/vctp.yml
```

If you just want to run a single inventory snapshot across all configured vCenters and
exit (no scheduler/server), use:

```shell
vctp -settings /path/to/vctp.yml -run-inventory
```

If you want a one-time SQLite cleanup to drop low-value hourly snapshot indexes and exit,
use:

```shell
vctp -settings /path/to/vctp.yml -db-cleanup
```

If you want a one-time cache backfill for the vCenter totals cache tables
(`vcenter_latest_totals` and `vcenter_aggregate_totals`) and exit, use:

```shell
vctp -settings /path/to/vctp.yml -backfill-vcenter-cache
```

The backfill command:
- Ensures/migrates `snapshot_registry` when needed.
- Rebuilds hourly/latest vCenter totals caches.
- Recomputes daily/monthly rows for `vcenter_aggregate_totals` from registered summary snapshots.

If you want a one-time SQLite-to-Postgres import and exit, use:

```shell
vctp -settings /path/to/vctp.yml -import-sqlite /path/to/legacy.sqlite3
```

The import command:
- Requires `settings.database_driver: postgres`.
- Copies data from the SQLite source into matching Postgres tables.
- Auto-creates runtime tables (hourly/daily/monthly snapshot tables and cache tables) when needed.
- Replaces existing data in imported Postgres tables during the run.

## Database Configuration
By default the app uses SQLite and creates/opens `db.sqlite3`.

PostgreSQL support is currently **experimental** and not a production target. To enable it,
set `settings.enable_experimental_postgres: true` in the settings file:

- `settings.database_driver`: `sqlite` (default) or `postgres` (experimental)
- `settings.database_url`: SQLite file path/DSN or PostgreSQL DSN

Examples (use one `settings:` block, not both):
```yaml
# SQLite (default)
settings:
  database_driver: sqlite
  enable_experimental_postgres: false
  database_url: ./db.sqlite3

# PostgreSQL (experimental)
settings:
  database_driver: postgres
  enable_experimental_postgres: true
  database_url: postgres://user:pass@localhost:5432/vctp?sslmode=disable
```

### Initial PostgreSQL Setup
Create a dedicated PostgreSQL role and database (run as a PostgreSQL superuser):

```sql
CREATE ROLE vctp_user LOGIN PASSWORD 'change-this-password';
CREATE DATABASE vctp OWNER vctp_user;
```

Connect to the new database and grant privileges required for migrations and runtime table/index management:

```sql
\c vctp
ALTER DATABASE vctp OWNER TO vctp_user;
ALTER SCHEMA public OWNER TO vctp_user;
GRANT CONNECT, TEMP ON DATABASE vctp TO vctp_user;
GRANT USAGE, CREATE ON SCHEMA public TO vctp_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO vctp_user;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO vctp_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO vctp_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO vctp_user;
```

Verify effective schema privileges (useful if migrations fail creating `goose_db_version`):

```sql
-- Check each privilege separately: a comma-separated list returns true
-- if ANY of the listed privileges is held, not all of them.
SELECT has_schema_privilege('vctp_user', 'public', 'USAGE')  AS can_use,
       has_schema_privilege('vctp_user', 'public', 'CREATE') AS can_create;
```

Recommended auth/network configuration:

- Ensure PostgreSQL is listening on the expected interface/port in `postgresql.conf` (for example, `listen_addresses` and `port`).
- Allow vCTP connections in `pg_hba.conf`. Example entries:

  ```conf
  # local socket
  local   vctp   vctp_user                 scram-sha-256
  # TCP from application subnet
  host    vctp   vctp_user   10.0.0.0/24   scram-sha-256
  ```

- Reload/restart PostgreSQL after config changes (`SELECT pg_reload_conf();` or your service manager). A reload is enough for `pg_hba.conf`; changes to `listen_addresses` or `port` require a restart.
- Ensure host firewall/network ACLs allow traffic to PostgreSQL (default `5432`).

Example `vctp.yml` database settings:

```yaml
settings:
  database_driver: postgres
  enable_experimental_postgres: true
  database_url: postgres://vctp_user:change-this-password@db-hostname:5432/vctp?sslmode=disable
```

Validate connectivity before starting vCTP:

```shell
psql "postgres://vctp_user:change-this-password@db-hostname:5432/vctp?sslmode=disable"
```

PostgreSQL migrations live in `db/migrations_postgres`, while SQLite migrations remain in
`db/migrations`.

## Snapshot Retention
Hourly and daily snapshot table retention can be configured in the settings file:

- `settings.hourly_snapshot_max_age_days` (default: 60)
- `settings.daily_snapshot_max_age_months` (default: 12)

## Runtime Environment Flags
These optional flags are read from the process environment (for example via `/etc/default/vctp`):

- `DAILY_AGG_GO`: set to `1` (default in `src/vctp.default`) to use the Go daily aggregation path.
- `MONTHLY_AGG_GO`: set to `1` (default in `src/vctp.default`) to use the Go monthly aggregation path.

## Credential Encryption Lifecycle
At startup, vCTP resolves `settings.vcenter_password` using this order:

1. If the value starts with `enc:v1:`, decrypt it using the active key.
2. If there is no prefix, attempt legacy ciphertext decryption (active key, then legacy fallback keys).
3. If decryption fails and the value length is greater than 2, treat the value as plaintext.

When step 2 or 3 succeeds, vCTP rewrites the setting in-place to `enc:v1:<ciphertext>`.

Behavior notes:
- Plaintext values with length `<= 2` are rejected.
- Malformed ciphertext is rejected safely (short payloads do not panic).
- Legacy encrypted values can still be migrated forward automatically.

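The resolution order above can be sketched as a small function. Everything here other than the `enc:v1:` prefix and the length rule is an assumption: `decryptFunc`, `resolvePassword`, and `fakeDecrypt` are hypothetical names, and `fakeDecrypt` is a toy stand-in that performs no real cryptography.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const encPrefix = "enc:v1:"

// decryptFunc is a hypothetical stand-in for a real decryption routine.
type decryptFunc func(ciphertext string) (string, error)

// fakeDecrypt "decrypts" values tagged with the given key name. Toy only.
func fakeDecrypt(key string) decryptFunc {
	return func(c string) (string, error) {
		if !strings.HasPrefix(c, key+":") {
			return "", errors.New("ciphertext not valid for key " + key)
		}
		return strings.TrimPrefix(c, key+":"), nil
	}
}

// resolvePassword returns the plaintext and whether the stored setting should
// be rewritten as enc:v1:<ciphertext>.
func resolvePassword(value string, active decryptFunc, legacy []decryptFunc) (plain string, rewrite bool, err error) {
	// Step 1: prefixed values are decrypted with the active key.
	if strings.HasPrefix(value, encPrefix) {
		plain, err = active(strings.TrimPrefix(value, encPrefix))
		return plain, false, err
	}
	// Step 2: unprefixed values may be legacy ciphertext.
	for _, dec := range append([]decryptFunc{active}, legacy...) {
		if p, derr := dec(value); derr == nil {
			return p, true, nil
		}
	}
	// Step 3: otherwise treat as plaintext, if long enough.
	if len(value) > 2 {
		return value, true, nil
	}
	return "", false, errors.New("value too short to be treated as plaintext")
}

func main() {
	active, legacy := fakeDecrypt("k1"), []decryptFunc{fakeDecrypt("k0")}
	for _, v := range []string{"enc:v1:k1:secret", "k0:secret", "hunter2", "ab"} {
		plain, rewrite, err := resolvePassword(v, active, legacy)
		fmt.Printf("%q -> plain=%q rewrite=%v err=%v\n", v, plain, rewrite, err)
	}
}
```

Returning a `rewrite` flag keeps the file update separate from the resolution logic, so the caller decides when to persist the new `enc:v1:` value.
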
## Deprecated API Endpoints
These endpoints are considered legacy and are disabled by default unless `settings.enable_legacy_api: true`:

- `/api/event/vm/create`
- `/api/event/vm/modify`
- `/api/event/vm/move`
- `/api/event/vm/delete`
- `/api/cleanup/updates`
- `/api/cleanup/vcenter`

When disabled, they return HTTP `410 Gone` with a JSON error payload.

## Settings Reference
All configuration lives under the top-level `settings:` key in `vctp.yml`.

General:
- `settings.log_level`: logging verbosity (e.g., `debug`, `info`, `warn`, `error`)
- `settings.log_output`: log format, `text` or `json`

Database:
- `settings.database_driver`: `sqlite` or `postgres` (experimental)
- `settings.enable_experimental_postgres`: set `true` to allow PostgreSQL startup
- `settings.database_url`: SQLite file path/DSN or PostgreSQL DSN

HTTP/TLS:
- `settings.bind_ip`: IP address to bind the HTTP server
- `settings.bind_port`: TCP port to bind the HTTP server
  - `settings.bind_port` below `1024` (for example `443`) requires privileged bind permissions.
    The packaged systemd unit grants `CAP_NET_BIND_SERVICE` to the `vctp` user; if you run
    vCTP outside that unit, grant equivalent capability or use a non-privileged port.
- `settings.bind_disable_tls`: `true` to serve plain HTTP (no TLS)
- `settings.tls_cert_filename`: PEM certificate path (TLS mode)
- `settings.tls_key_filename`: PEM private key path (TLS mode)

vCenter:
- `settings.encryption_key`: optional explicit key source for credential encryption/decryption.
  If unset, vCTP derives a host key from hardware/host identity.
- `settings.vcenter_username`: vCenter username
- `settings.vcenter_password`: vCenter password (auto-encrypted on startup if plaintext length > 2)
- `settings.vcenter_insecure`: `true` to skip TLS verification
- `settings.enable_legacy_api`: set `true` to temporarily re-enable deprecated legacy endpoints
- `settings.vcenter_event_polling_seconds`: deprecated and ignored
- `settings.vcenter_inventory_polling_seconds`: deprecated and ignored
- `settings.vcenter_inventory_snapshot_seconds`: hourly snapshot cadence (seconds)
- `settings.vcenter_inventory_aggregate_seconds`: daily aggregation cadence (seconds)
- `settings.vcenter_addresses`: list of vCenter SDK URLs to monitor

Credential encryption:
- New encrypted values are written with `enc:v1:` prefix.

Snapshots:
- `settings.hourly_snapshot_concurrency`: max concurrent vCenter snapshots (0 = unlimited)
- `settings.hourly_snapshot_max_age_days`: retention for hourly tables
- `settings.daily_snapshot_max_age_months`: retention for daily tables
- `settings.hourly_index_max_age_days`: age gate for keeping per-hourly-table indexes (`-1` disables cleanup, `0` trims all)
- `settings.snapshot_cleanup_cron`: cron expression for cleanup job
- `settings.reports_dir`: directory to store generated XLSX reports (default: `/var/lib/vctp/reports`)
- `settings.hourly_snapshot_retry_seconds`: interval for retrying failed hourly snapshots (default: 300 seconds)
- `settings.hourly_snapshot_max_retries`: maximum retry attempts per vCenter snapshot (default: 3)

Filters/chargeback:
- `settings.tenants_to_filter`: list of tenant name patterns to exclude
- `settings.node_charge_clusters`: list of cluster name patterns for node chargeback
- `settings.srm_activeactive_vms`: list of SRM Active/Active VM name patterns

# Developer setup

## Pre-requisite tools

```shell
go install github.com/a-h/templ/cmd/templ@v0.3.977
go install github.com/sqlc-dev/sqlc/cmd/sqlc@v1.29.0
go install github.com/swaggo/swag/cmd/swag@v1.16.6
```

## Database
This project now uses [goose](https://github.com/pressly/goose) for DB migrations.

Install it with `brew install goose` on macOS, or with Go: `go install github.com/pressly/goose/v3/cmd/goose@latest`

Create a new up/down migration file with this command:
```shell
goose -dir db/migrations sqlite3 ./db.sqlite3 create init sql
```

Then regenerate the sqlc query code:

```shell
sqlc generate
```

## HTML templates
Run `templ generate -path ./components` to generate code based on template files.

## Documentation
Run `swag init --exclude "pkg.mod,pkg.build,pkg.tools" -o server/router/docs` to regenerate the Swagger docs.

## Tests
Run the test suite:

```shell
go test ./...
```

Recommended static analysis:

```shell
go vet ./...
```

## CI/CD (Drone)
- `.drone.yml` defines a Docker pipeline:
  - Restore/build caches for Go modules/tools.
  - Build step installs generators (`templ`, `sqlc`, `swag`), regenerates code/docs, runs project scripts, and produces the `vctp-linux-amd64` binary.
  - RPM step packages via `nfpm` using `vctp.yml`, emits RPMs into `./build/`.
  - Optional SFTP deploy step uploads build artifacts (e.g., `vctp*`) to a remote host.
  - Cache rebuild step preserves Go caches across runs.