# Overview

vCTP is a vSphere Chargeback Tracking Platform designed for a specific customer, so some design decisions may not apply to your use case.

## Snapshots and Reports

- Hourly snapshots capture inventory per vCenter (concurrency is controlled by `hourly_snapshot_concurrency`).
- Daily summaries aggregate the day's hourly snapshots; monthly summaries aggregate the month's daily summaries (or hourly snapshots, if configured).
- Snapshots are registered in `snapshot_registry` so regeneration via `/api/snapshots/aggregate` can locate the correct tables (fallback scanning is also supported).
- vCenter totals pages provide two views:
  - Daily Aggregated (`/vcenters/totals/daily`) for fast long-range trends.
  - Hourly Detail 45d (`/vcenters/totals/hourly`) for tracking recent changes at fine granularity.
- vCenter totals pages are accelerated by compact cache tables:
  - `vcenter_latest_totals` (one latest row per vCenter)
  - `vcenter_aggregate_totals` (hourly/daily/monthly per-vCenter totals by snapshot time)
- VM Trace supports two modes on `/vm/trace`:
  - `view=hourly` (default) for full snapshot detail
  - `view=daily` for daily aggregated trend lines (using `vm_daily_rollup` when available)
- Reports (XLSX with totals and charts) are generated automatically after hourly, daily, and monthly jobs and written to the reports directory (`settings.reports_dir`).
- Hourly totals in reports are interval-based: each row represents `[HH:00, HH+1:00)` and uses the first snapshot at or after the hour end (including cross-day snapshots, such as the one closing the 23:00 interval) to prorate VM presence by creation/deletion overlap.
- Monthly aggregation reports include a Daily Totals sheet with full-day interval labels (`YYYY-MM-DD to YYYY-MM-DD`) and prorated totals derived from daily summaries.
- Prometheus metrics are exposed at `/metrics`:
  - Snapshots/aggregations: `vctp_hourly_snapshots_total`, `vctp_hourly_snapshots_failed_total`, `vctp_hourly_snapshot_last_unix`, `vctp_hourly_snapshot_last_rows`, `vctp_daily_aggregations_total`, `vctp_daily_aggregations_failed_total`, `vctp_daily_aggregation_duration_seconds`, `vctp_monthly_aggregations_total`, `vctp_monthly_aggregations_failed_total`, `vctp_monthly_aggregation_duration_seconds`, `vctp_reports_available`
  - vCenter health/perf: `vctp_vcenter_connect_failures_total{vcenter}`, `vctp_vcenter_snapshot_duration_seconds{vcenter}`, `vctp_vcenter_inventory_size{vcenter}`
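
A minimal Go sketch of the interval-based hourly proration described above, under stated assumptions: timestamps are Unix seconds, a `CreationTime` of `0` means unknown (the VM is treated as present from the start of the hour), a `DeletionTime` of `0` means not deleted, and `presenceFraction` and `snapshotAtOrAfter` are hypothetical helper names rather than vCTP's actual code. The snapshot picked for the interval supplies the VM list; each VM then contributes the fraction of the hour it overlapped.

```go
package main

import (
	"fmt"
	"sort"
)

// presenceFraction returns the share of the hourly interval [start, end) during
// which a VM existed. Times are Unix seconds; created == 0 means "unknown"
// (treated as existing before the interval) and deleted == 0 means "not deleted".
func presenceFraction(start, end, created, deleted int64) float64 {
	if end <= start {
		return 0
	}
	from, to := start, end
	if created > 0 && created > from {
		from = created // VM appeared part-way through the hour
	}
	if deleted > 0 && deleted < to {
		to = deleted // VM disappeared part-way through the hour
	}
	if to <= from {
		return 0 // no overlap with the interval
	}
	return float64(to-from) / float64(end-start)
}

// snapshotAtOrAfter returns the first snapshot time >= hourEnd from a sorted
// slice. The match may fall on the following day (cross-day lookup).
func snapshotAtOrAfter(sorted []int64, hourEnd int64) (int64, bool) {
	i := sort.Search(len(sorted), func(i int) bool { return sorted[i] >= hourEnd })
	if i == len(sorted) {
		return 0, false
	}
	return sorted[i], true
}

func main() {
	const hour = int64(3600)
	start := int64(1776765600) // an arbitrary hour boundary (Unix seconds)
	snaps := []int64{start + 300, start + hour + 300, start + 2*hour + 300}

	if snap, ok := snapshotAtOrAfter(snaps, start+hour); ok {
		fmt.Println("interval closed by snapshot at", snap)
	}
	// VM created 15 minutes into the hour and never deleted: present 45 of 60 minutes.
	fmt.Printf("%.2f\n", presenceFraction(start, start+hour, start+900, 0)) // 0.75
}
```

Summing `presenceFraction` across the VMs in the closing snapshot yields a prorated VM count for the hour; multiplying each fraction by the VM's vCPU or RAM first would give prorated resource totals.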

## Prorating and Aggregation Logic

Daily aggregation runs per VM using sample counts for the day:

- `SamplesPresent`: count of snapshot samples in which the VM appears.
- `TotalSamples`: count of unique snapshot timestamps for the vCenter in the day.
- `AvgIsPresent`: `SamplesPresent / TotalSamples` (0 when `TotalSamples` is 0).
- `AvgVcpuCount`, `AvgRamGB`, `AvgProvisionedDisk` (daily): `sum(values_per_sample) / TotalSamples`, which time-weights configuration changes and prorates VMs that existed for only part of the day.
- `PoolTinPct`, `PoolBronzePct`, `PoolSilverPct`, `PoolGoldPct` (daily): `(pool_hits / SamplesPresent) * 100`, so pool percentages reflect only the time the VM existed.
- `CreationTime`: set only when vCenter provides it; otherwise it remains `0`.
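
The daily rules can be checked with a small Go sketch for one VM. The `sample` and `dailyAgg` types are hypothetical, trimmed stand-ins for the hourly row and daily summary shapes, and the pool names are assumed to be the four tiers listed above; this is an illustration of the formulas, not vCTP's aggregation code.

```go
package main

import "fmt"

// sample is one hourly snapshot row for a single VM (hypothetical, trimmed shape).
type sample struct {
	VcpuCount       float64
	RamGB           float64
	ProvisionedDisk float64
	Pool            string // "Tin", "Bronze", "Silver" or "Gold"
}

// dailyAgg mirrors the daily summary fields described above.
type dailyAgg struct {
	SamplesPresent     int
	TotalSamples       int
	AvgIsPresent       float64
	AvgVcpuCount       float64
	AvgRamGB           float64
	AvgProvisionedDisk float64
	PoolPct            map[string]float64 // PoolTinPct, PoolBronzePct, ...
}

// aggregateDay applies the daily rules to one VM's samples. totalSamples is the
// number of unique snapshot timestamps for the VM's vCenter that day.
func aggregateDay(vmSamples []sample, totalSamples int) dailyAgg {
	agg := dailyAgg{
		SamplesPresent: len(vmSamples),
		TotalSamples:   totalSamples,
		PoolPct:        map[string]float64{},
	}
	if totalSamples == 0 {
		return agg // AvgIsPresent and all averages stay 0
	}
	var vcpu, ram, disk float64
	poolHits := map[string]int{}
	for _, s := range vmSamples {
		vcpu += s.VcpuCount
		ram += s.RamGB
		disk += s.ProvisionedDisk
		poolHits[s.Pool]++
	}
	n := float64(totalSamples)
	agg.AvgIsPresent = float64(agg.SamplesPresent) / n
	// Dividing by TotalSamples (not SamplesPresent) time-weights config changes
	// and prorates VMs that existed for only part of the day.
	agg.AvgVcpuCount = vcpu / n
	agg.AvgRamGB = ram / n
	agg.AvgProvisionedDisk = disk / n
	// Pool percentages divide by SamplesPresent, covering only the VM's lifetime.
	for pool, hits := range poolHits {
		agg.PoolPct[pool] = float64(hits) / float64(agg.SamplesPresent) * 100
	}
	return agg
}

func main() {
	// A VM created at noon: present in 12 of 24 hourly samples with 4 vCPU,
	// 6 samples in Gold and 6 in Silver.
	var vm []sample
	for i := 0; i < 12; i++ {
		pool := "Gold"
		if i >= 6 {
			pool = "Silver"
		}
		vm = append(vm, sample{VcpuCount: 4, RamGB: 16, ProvisionedDisk: 100, Pool: pool})
	}
	a := aggregateDay(vm, 24)
	fmt.Println(a.AvgIsPresent, a.AvgVcpuCount, a.PoolPct["Gold"], a.PoolPct["Silver"]) // 0.5 2 50 50
}
```

Note the two denominators: sizing averages divide by `TotalSamples`, so a VM present for half the day reports half its vCPU count, while pool percentages divide by `SamplesPresent` and sum to 100 whenever every sample falls into one of the four pools.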

Monthly aggregation builds on daily summaries (or the daily rollup cache):

- For each VM, daily averages are converted to weighted sums: `daily_avg * daily_total_samples`.
- Monthly averages are `sum(weighted_sums) / monthly_total_samples` (per vCenter).
- Pool percentages are weighted the same way: `(daily_pool_pct / 100) * daily_total_samples`, summed, then divided by `monthly_total_samples` and multiplied by 100.
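
A matching Go sketch of the monthly re-weighting, assuming a hypothetical trimmed `daySummary` row and showing only `PoolGoldPct` (the other pools follow the same pattern). `monthlyTotalSamples` is passed in explicitly because it is assumed to be a per-vCenter figure that also counts samples from days on which the VM did not appear.

```go
package main

import "fmt"

// daySummary is a trimmed, hypothetical view of one VM's daily summary row.
type daySummary struct {
	TotalSamples   int // unique snapshot timestamps for the vCenter that day
	SamplesPresent int
	AvgVcpuCount   float64
	PoolGoldPct    float64 // other pools are handled identically
}

type monthAgg struct {
	SamplesPresent int
	AvgIsPresent   float64
	AvgVcpuCount   float64
	PoolGoldPct    float64
}

// aggregateMonth re-weights daily values by each day's sample volume and
// normalizes by the vCenter's total samples for the month.
func aggregateMonth(days []daySummary, monthlyTotalSamples int) monthAgg {
	var m monthAgg
	if monthlyTotalSamples == 0 {
		return m
	}
	var vcpuSum, goldSum float64
	for _, d := range days {
		w := float64(d.TotalSamples)
		m.SamplesPresent += d.SamplesPresent // summed across days
		vcpuSum += d.AvgVcpuCount * w        // daily_avg * daily_total_samples
		goldSum += (d.PoolGoldPct / 100) * w // (daily_pool_pct / 100) * daily_total_samples
	}
	n := float64(monthlyTotalSamples)
	m.AvgIsPresent = float64(m.SamplesPresent) / n
	m.AvgVcpuCount = vcpuSum / n
	m.PoolGoldPct = goldSum / n * 100
	return m
}

func main() {
	days := []daySummary{
		{TotalSamples: 24, SamplesPresent: 12, AvgVcpuCount: 2, PoolGoldPct: 100},
		{TotalSamples: 24, SamplesPresent: 24, AvgVcpuCount: 4, PoolGoldPct: 50},
	}
	m := aggregateMonth(days, 48)
	fmt.Println(m.SamplesPresent, m.AvgIsPresent, m.AvgVcpuCount, m.PoolGoldPct) // 36 0.75 3 75
}
```

With two 24-sample days, a VM averaging 2 vCPU on day one and 4 vCPU on day two reports 3 vCPU for the month, and its Gold share (100% then 50%) averages to 75%.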

### Hourly Snapshot Fields

Each hourly snapshot row tracks:

- Identity: `InventoryId`, `Name`, `Vcenter`, `VmId`, `VmUuid`, `EventKey`, `CloudId`
- Lifecycle/timing: `CreationTime`, `DeletionTime`, `SnapshotTime`
- Placement: `ResourcePool`, `Datacenter`, `Cluster`, `Folder`
- Sizing/state: `ProvisionedDisk`, `VcpuCount`, `RamGB`, `IsTemplate`, `PoweredOn`, `SrmPlaceholder`

### Daily Aggregate Fields

Daily summary rows retain the identity/placement/sizing fields and add:

- Sample coverage: `SamplesPresent`, `TotalSamples`, `AvgIsPresent`
- Time-weighted sizing: `AvgVcpuCount`, `AvgRamGB`, `AvgProvisionedDisk`
- Pool distribution percentages: `PoolTinPct`, `PoolBronzePct`, `PoolSilverPct`, `PoolGoldPct`
- Chargeback totals columns: `Tin`, `Bronze`, `Silver`, `Gold`
- Lifecycle carry-forward used by reports and trace: `CreationTime`, `DeletionTime`, `SnapshotTime`

### Monthly Aggregate Fields

Monthly summary rows keep the same aggregate fields as daily summaries and recompute them over the month:

- `SamplesPresent` is summed across days.
- Monthly averages (`AvgVcpuCount`, `AvgRamGB`, `AvgProvisionedDisk`) are weighted by each day's sample volume.
- Monthly presence (`AvgIsPresent`) is normalized by monthly total samples.
- Monthly pool percentages (`PoolTinPct`, `PoolBronzePct`, `PoolSilverPct`, `PoolGoldPct`) are weighted by each day's sample volume before normalization.
- `Tin`, `Bronze`, `Silver`, `Gold` totals remain available for reporting output.

## RPM Layout (summary)

The RPM installs the binary under `/usr/bin`, configuration under `/etc/dtms` and `/etc/default`, and data under `/var/lib/vctp`:

- Binary: `/usr/bin/vctp-linux-amd64`
- Systemd unit: `/etc/systemd/system/vctp.service`
- Defaults/config: `/etc/dtms/vctp.yml` (override with `-settings`), `/etc/default/vctp` (optional environment flags)
- TLS cert/key: `/etc/dtms/vctp.crt` and `/etc/dtms/vctp.key` (generated if absent)
- Data: the SQLite DB and reports default to `/var/lib/vctp` (reports under `/var/lib/vctp/reports`)
- Scripts: preinstall/postinstall scripts handle directory creation and permissions.

# Settings File

Configuration lives in the YAML settings file. By default the service reads `/etc/dtms/vctp.yml`; override the path with the `-settings` flag.

```shell
vctp -settings /path/to/vctp.yml
```

To run a single inventory snapshot across all configured vCenters and exit (no scheduler or server), use:

```shell
vctp -settings /path/to/vctp.yml -run-inventory
```

To run a one-time SQLite cleanup that drops low-value hourly snapshot indexes and exit, use:

```shell
vctp -settings /path/to/vctp.yml -db-cleanup
```

To run a one-time backfill of the vCenter totals cache tables (`vcenter_latest_totals` and `vcenter_aggregate_totals`) and exit, use:

```shell
vctp -settings /path/to/vctp.yml -backfill-vcenter-cache
```

The backfill command:

- Ensures/migrates `snapshot_registry` when needed.
- Rebuilds the hourly/latest vCenter totals caches.
- Recomputes daily/monthly rows for `vcenter_aggregate_totals` from registered summary snapshots.

To run a one-time SQLite-to-Postgres import and exit, use:

```shell
vctp -settings /path/to/vctp.yml -import-sqlite /path/to/legacy.sqlite3
```

The import command:

- Requires `settings.database_driver: postgres`.
- Copies data from the SQLite source into matching Postgres tables.
- Auto-creates runtime tables (hourly/daily/monthly snapshot tables and cache tables) when needed.
- Replaces existing data in the imported Postgres tables during the run.

To run a one-time canonical aggregation benchmark (Go vs SQL cores) and exit, use:

```shell
vctp -settings /path/to/vctp.yml -benchmark-aggregations -benchmark-runs 3
```

The benchmark command:

- Uses canonical cache sources (`vm_hourly_stats` for daily, `vm_daily_rollup` for monthly).
- Runs the Go and SQL aggregation cores for the latest available daily/monthly windows.
- Writes results to the startup logs and exits without changing scheduled defaults.

### Benchmark method and decision record

- Run the benchmark on the target environment and database profile before deciding defaults:
  - `vctp -settings /path/to/vctp.yml -benchmark-aggregations -benchmark-runs 3`
- The current local comparison snapshot (2026-04-20) is recorded in `phase-metrics-2026-04-20.md`.
- The latest tuned Postgres snapshot (2026-04-21, `runs=3`) showed:
  - Daily window (`2026-04-21` to `2026-04-22` UTC): Go avg `2.261369712s` vs SQL avg `1m31.738727387s` (Go ~`40.57x` faster).
  - Monthly window (`2026-04-01` to `2026-05-01` UTC): Go avg `3.705308832s` vs SQL avg `3.065612298s` (SQL ~`1.21x` faster).
- The default-path decision remains `settings.scheduled_aggregation_engine: go`.
- Promote SQL only when representative production-scale **Postgres** runs show clear, repeatable wins.

## Database Configuration

By default the app uses SQLite and creates/opens `db.sqlite3`.

PostgreSQL support is currently **experimental** and not a production target. To enable it, set `settings.enable_experimental_postgres: true` in the settings file and configure the driver and connection URL:

- `settings.database_driver`: `sqlite` (default) or `postgres` (experimental)
- `settings.database_url`: SQLite file path/DSN or PostgreSQL DSN

Examples:

```yaml
# SQLite (default)
settings:
  database_driver: sqlite
  enable_experimental_postgres: false
  database_url: ./db.sqlite3
---
# PostgreSQL (experimental)
settings:
  database_driver: postgres
  enable_experimental_postgres: true
  database_url: postgres://user:pass@localhost:5432/vctp?sslmode=disable
```

### Initial PostgreSQL Setup

Create a dedicated PostgreSQL role and database (run as a PostgreSQL superuser):

```sql
CREATE ROLE vctp_user LOGIN PASSWORD 'change-this-password';
CREATE DATABASE vctp OWNER vctp_user;
```

Connect to the new database and grant the privileges required for migrations and runtime table/index management:

```sql
\c vctp
ALTER DATABASE vctp OWNER TO vctp_user;
ALTER SCHEMA public OWNER TO vctp_user;
GRANT CONNECT, TEMP ON DATABASE vctp TO vctp_user;
GRANT USAGE, CREATE ON SCHEMA public TO vctp_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO vctp_user;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO vctp_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO vctp_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO vctp_user;
```

Verify effective schema privileges (useful if migrations fail while creating `goose_db_version`). Both columns should be `t`:

```sql
-- Check each privilege separately: with a comma-separated list,
-- has_schema_privilege returns true if ANY of the listed privileges is held.
SELECT has_schema_privilege('vctp_user', 'public', 'USAGE')  AS has_usage,
       has_schema_privilege('vctp_user', 'public', 'CREATE') AS has_create;
```

Recommended auth/network configuration:

- Ensure PostgreSQL is listening on the expected interface/port in `postgresql.conf` (for example, `listen_addresses` and `port`).
- Allow vCTP connections in `pg_hba.conf`. Example entries:

  ```conf
  # local socket
  local   vctp   vctp_user                  scram-sha-256
  # TCP from application subnet
  host    vctp   vctp_user   10.0.0.0/24    scram-sha-256
  ```

- Reload PostgreSQL after `pg_hba.conf` changes (`SELECT pg_reload_conf();` or your service manager); changes to `listen_addresses` or `port` require a full restart.
- Ensure host firewall/network ACLs allow traffic to PostgreSQL (default port `5432`).

Example `vctp.yml` database settings:

```yaml
settings:
  database_driver: postgres
  enable_experimental_postgres: true
  database_url: postgres://vctp_user:change-this-password@db-hostname:5432/vctp?sslmode=disable
```

Validate connectivity before starting vCTP:

```shell
psql "postgres://vctp_user:change-this-password@db-hostname:5432/vctp?sslmode=disable"
```

### PostgreSQL tuning baseline (20 vCPU / 64 GB host)

If your PostgreSQL instance is still running near-default settings, use this as a practical starting profile for vCTP workloads (hourly ingest plus daily/monthly aggregation).

Choose one profile:

- Dedicated DB host (PostgreSQL is the primary service on this machine): use the `dedicated` values.
- Shared host (vCTP app and PostgreSQL on the same machine): use the `shared` values.

Recommended `postgresql.conf` starting points:

```conf
# Memory
shared_buffers = 16GB               # dedicated
# shared_buffers = 12GB             # shared
effective_cache_size = 48GB         # dedicated
# effective_cache_size = 36GB       # shared
work_mem = 32MB                     # dedicated
# work_mem = 16MB                   # shared
maintenance_work_mem = 2GB          # dedicated
# maintenance_work_mem = 1GB        # shared

# WAL / checkpoints
wal_compression = on
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
max_wal_size = 16GB
min_wal_size = 2GB

# Parallelism and connections
max_connections = 120
max_worker_processes = 20
max_parallel_workers = 20
max_parallel_workers_per_gather = 4
max_parallel_maintenance_workers = 4

# Planner / IO (SSD/NVMe)
random_page_cost = 1.1
effective_io_concurrency = 200
default_statistics_target = 200

# Autovacuum for high-write canonical tables
autovacuum_max_workers = 6
autovacuum_naptime = 30s
autovacuum_vacuum_scale_factor = 0.02
autovacuum_analyze_scale_factor = 0.01
autovacuum_vacuum_cost_limit = 2000

# Useful diagnostics
track_io_timing = on
log_temp_files = 32MB
```

Apply and validate:

- Reload the configuration with `SELECT pg_reload_conf();`. Some of these settings (for example `shared_buffers`, `max_connections`, and `max_worker_processes`) only take effect after a full PostgreSQL restart; `SELECT name FROM pg_settings WHERE pending_restart;` lists any that are still waiting for one.
- Confirm active values with:

  ```sql
  SHOW shared_buffers;
  SHOW effective_cache_size;
  SHOW work_mem;
  SHOW maintenance_work_mem;
  SHOW max_wal_size;
  SHOW autovacuum_vacuum_scale_factor;
  ```

After tuning, rerun the canonical benchmark and compare against your pre-tuning snapshot:

```shell
vctp -settings /path/to/vctp.yml -benchmark-aggregations -benchmark-runs 3
```

Notes:

- `work_mem` applies per sort/hash operation, not per session, so a single complex query can use several multiples of it; avoid setting it too high globally.
- Keep `settings.scheduled_aggregation_engine: go` as the default unless repeated production-scale benchmarks show SQL is consistently faster on your canonical Postgres data.

PostgreSQL migrations live in `db/migrations_postgres`, while SQLite migrations remain in `db/migrations`.

## Snapshot Retention

Hourly and daily snapshot table retention can be configured in the settings file:

- `settings.hourly_snapshot_max_age_days` (default: 60)
- `settings.daily_snapshot_max_age_months` (default: 12)

## Runtime Environment Flags

These optional flags are read from the process environment (for example via `/etc/default/vctp`):

- `DAILY_AGG_GO`: set to `1` (the default in `src/vctp.default`) to force the Go engine for manual daily runs.
- `DAILY_AGG_SQL`: set to `1` to force the legacy SQL fallback for manual daily runs.
- `MONTHLY_AGG_GO`: set to `1` (the default in `src/vctp.default`) to force the Go engine for manual monthly runs.
- `MONTHLY_AGG_SQL`: set to `1` to force the legacy SQL fallback for manual monthly runs.

Scheduled aggregation engine selection is controlled by YAML (`settings.scheduled_aggregation_engine`), not by these environment variables.

## Authentication and Authorization

Authentication uses LDAP bind plus JWT bearer tokens.

Login flow:

1. Call `POST /api/auth/login` with a JSON body:

   ```json
   { "username": "your-user", "password": "your-password" }
   ```

2. On success, send the returned `access_token` on subsequent requests as:

   ```http
   Authorization: Bearer <access_token>
   ```

3. Optional whoami/debug check: call `GET /api/auth/me` with the bearer token to view the current JWT identity/role claims.

Auth audit logging:

- vCTP emits structured `auth_audit` log events for login decisions, token validation denials, and role authorization denials.
- Logs include request metadata and the decision reason, but do not include credentials or raw bearer tokens.

Auth modes:

- `settings.auth_mode: disabled`: the auth middleware is bypassed.
- `settings.auth_mode: optional`: protected endpoints accept requests without a token, but validate any token that is provided.
- `settings.auth_mode: required`: protected endpoints require a valid bearer token.

Role policy:

- `viewer`: read/report APIs (for example `/api/report/*`, `/api/diagnostics/daily-creation`).
- `admin`: mutating/admin APIs (for example mutating `/api/snapshots/*` endpoints, `/api/event/*`, `/api/import/vm`, `/api/encrypt`, `/api/vcenters/cache/rebuild`).
- `admin` implies `viewer` access.
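
The auth modes and role policy can be summarized as a single decision function. This Go sketch is illustrative only: the `Role` ordering, the `tokenState` type, and the choice of 401 for missing/invalid tokens and 403 for insufficient roles are assumptions, as is applying the role check whenever a valid token is supplied in `optional` mode.

```go
package main

import (
	"fmt"
	"net/http"
)

// Role ordering encodes "admin implies viewer": a higher role satisfies a lower requirement.
type Role int

const (
	RoleNone Role = iota
	RoleViewer
	RoleAdmin
)

// tokenState describes the bearer token on a request after validation.
type tokenState struct {
	Present bool
	Valid   bool
	Role    Role // meaningful only when Valid
}

// authorize reports whether a protected endpoint may be served and, if not,
// which HTTP status to return.
func authorize(mode string, tok tokenState, required Role) (bool, int) {
	switch mode {
	case "disabled":
		return true, http.StatusOK // middleware bypassed
	case "optional":
		if !tok.Present {
			return true, http.StatusOK // a missing token is accepted
		}
	case "required":
		if !tok.Present {
			return false, http.StatusUnauthorized
		}
	default:
		return false, http.StatusInternalServerError // unknown mode: fail closed
	}
	if !tok.Valid {
		return false, http.StatusUnauthorized // any provided token must validate
	}
	if tok.Role < required {
		return false, http.StatusForbidden
	}
	return true, http.StatusOK
}

func main() {
	viewer := tokenState{Present: true, Valid: true, Role: RoleViewer}
	fmt.Println(authorize("required", viewer, RoleViewer))       // true 200
	fmt.Println(authorize("required", viewer, RoleAdmin))        // false 403
	fmt.Println(authorize("optional", tokenState{}, RoleViewer)) // true 200
}
```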

### LDAP group configuration (`auth_group_role_mappings` and `ldap_groups`)

Use full LDAP group DNs for both settings (for example `CN=vctp-admins,OU=Groups,DC=example,DC=com`).

- `settings.auth_group_role_mappings` is required when `settings.auth_enabled: true`.
- Mapping values must be `viewer` or `admin`.
- A user must resolve to at least one mapped role to log in.
- `settings.ldap_groups` is optional and acts as an additional allowlist gate.
- If `settings.ldap_groups` is empty or omitted, allowlist checking is skipped, but mapped-role resolution is still required.
- DN comparisons are normalized (trimmed and case-insensitive), but using exact directory DNs is still recommended.
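
A Go sketch of these login rules: normalize DNs (trim and lowercase), apply the `ldap_groups` allowlist only when it is non-empty, then require at least one mapped role. Treating the allowlist as "member of any listed group" and letting `admin` win when a user is in both mapped groups are assumptions; `resolveRole` is a hypothetical name, not vCTP's API.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// normalizeDN mirrors the documented comparison rule: trimmed and case-insensitive.
func normalizeDN(dn string) string {
	return strings.ToLower(strings.TrimSpace(dn))
}

// resolveRole applies the login rules to a user's LDAP group DNs.
// mappings is auth_group_role_mappings; allowlist is ldap_groups (may be empty).
func resolveRole(userGroups []string, mappings map[string]string, allowlist []string) (string, error) {
	if len(mappings) == 0 {
		return "", errors.New("auth_group_role_mappings must be non-empty")
	}
	member := make(map[string]bool, len(userGroups))
	for _, g := range userGroups {
		member[normalizeDN(g)] = true
	}
	// Allowlist gate (assumed any-of); skipped entirely when ldap_groups is empty.
	if len(allowlist) > 0 {
		allowed := false
		for _, g := range allowlist {
			if member[normalizeDN(g)] {
				allowed = true
				break
			}
		}
		if !allowed {
			return "", errors.New("user is not in any ldap_groups entry")
		}
	}
	// Role resolution: admin wins over viewer because admin implies viewer.
	role := ""
	for dn, r := range mappings {
		if !member[normalizeDN(dn)] {
			continue
		}
		if r == "admin" {
			role = "admin"
		} else if r == "viewer" && role == "" {
			role = "viewer"
		}
	}
	if role == "" {
		return "", errors.New("user has no mapped role")
	}
	return role, nil
}

func main() {
	mappings := map[string]string{
		"CN=vctp-viewers,OU=Groups,DC=example,DC=com": "viewer",
		"CN=vctp-admins,OU=Groups,DC=example,DC=com":  "admin",
	}
	role, err := resolveRole([]string{" cn=VCTP-Admins,ou=Groups,dc=example,dc=com "}, mappings, nil)
	fmt.Println(role, err) // admin <nil>
}
```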

Example (common setup where the viewer/admin groups are both mapped and allowlisted):

```yaml
settings:
  auth_enabled: true
  auth_mode: required
  ldap_bind_address: ldaps://ad01.example.com:636
  ldap_base_dn: DC=example,DC=com
  # Optional user lookup scope; defaults to ldap_base_dn when omitted.
  ldap_user_base_dn: OU=Users,DC=example,DC=com
  auth_group_role_mappings:
    "CN=vctp-viewers,OU=Groups,DC=example,DC=com": viewer
    "CN=vctp-admins,OU=Groups,DC=example,DC=com": admin
  ldap_groups:
    - "CN=vctp-viewers,OU=Groups,DC=example,DC=com"
    - "CN=vctp-admins,OU=Groups,DC=example,DC=com"
```

Example (`ldap_groups` omitted, only role mapping enforced):

```yaml
settings:
  auth_enabled: true
  auth_mode: required
  auth_group_role_mappings:
    "CN=vctp-viewers,OU=Groups,DC=example,DC=com": viewer
    "CN=vctp-admins,OU=Groups,DC=example,DC=com": admin
```

Example (`ldap_groups` can be broader, but users still need at least one mapped role):

```yaml
settings:
  auth_enabled: true
  auth_mode: required
  auth_group_role_mappings:
    "CN=vctp-viewers,OU=Groups,DC=example,DC=com": viewer
    "CN=vctp-admins,OU=Groups,DC=example,DC=com": admin
  ldap_groups:
    - "CN=vctp-viewers,OU=Groups,DC=example,DC=com"
    - "CN=vctp-admins,OU=Groups,DC=example,DC=com"
    - "CN=platform-operators,OU=Groups,DC=example,DC=com"
```

Tip: after a successful login, call `GET /api/auth/me` and inspect the returned `groups` claim to copy exact group DN values from your directory.

Public endpoints:

- UI pages (`/`, `/vcenters`, `/snapshots/*`, `/vm/trace`)
- Swagger UI/docs (`/swagger`, `/swagger/`, `/swagger.json`)
- Metrics (`/metrics`)
- Login (`/api/auth/login`)

Debug endpoints:

- `/debug/pprof/*` handlers are only registered when `settings.enable_pprof: true`.
- When enabled, they require an authenticated `admin` token.

## Airgapped Static Assets

vCTP does not depend on internet or CDN access for UI/docs assets, so it can run in airgapped environments:

- CSS, JS, and favicon assets are bundled into the binary via Go `embed` and served from local routes (`/assets/*`, `/favicon*`).
- Swagger UI is vendored under `server/router/swagger-ui-dist` and served locally from `/swagger/*`.
- The Swagger spec is served locally from `/swagger.json` (`validatorUrl` is disabled in the initializer).
- Static responses include cache headers. In release builds, versioned assets are served with long-lived, immutable cache headers.

This means runtime access to external asset hosts is not required.

## Credential Encryption Lifecycle

At startup, vCTP resolves `settings.vcenter_password` in this order:

1. If the value starts with `enc:v1:`, decrypt it using the active key.
2. If there is no prefix, attempt legacy ciphertext decryption (the active key first, then the legacy fallback keys).
3. If decryption fails and the value's length is greater than 2, treat the value as plaintext.

When step 2 or 3 succeeds, vCTP rewrites the setting in place as `enc:v1:<ciphertext>`.

Behavior notes:

- Plaintext values with length `<= 2` are rejected.
- Malformed ciphertext is rejected safely (short payloads do not cause a panic).
- Legacy encrypted values can still be migrated forward automatically.

## Deprecated API Endpoints

These endpoints are considered legacy and are disabled by default unless `settings.enable_legacy_api: true` is set:

- `/api/event/vm/create`
- `/api/event/vm/modify`
- `/api/event/vm/move`
- `/api/event/vm/delete`
- `/api/cleanup/updates`
- `/api/cleanup/vcenter`

When disabled, they return HTTP `410 Gone` with a JSON error payload.

## Compatibility mode lifecycle (`snapshot_table_compat_mode`)

- Default is `true` during migration phases.
- `true`: scheduled hourly capture continues writing legacy `inventory_hourly_*` outputs in addition to canonical tables.
- `false`: scheduled hourly capture writes only the canonical hourly cache and lifecycle/totals caches.
- Disable criteria:
  - parity/integration/compatibility test gates are passing
  - baseline-vs-post-change metrics comparison is recorded and accepted
  - repair/backfill workflows are validated in the target environment
- Rollback to legacy hourly output is immediate: set `snapshot_table_compat_mode: true` and restart the service.
- Compatibility repair/backfill workflows remain available through:
  - `POST /api/snapshots/aggregate`
  - `POST /api/snapshots/repair`
  - `POST /api/snapshots/repair/all`
  - `POST /api/snapshots/regenerate-hourly-reports`
  - `POST /api/vcenters/cache/rebuild`
  - `vctp -settings /path/to/vctp.yml -backfill-vcenter-cache`

## Migration runbook (staged rollout, rollback, repair)

1. Baseline: capture current metrics/state (a `phase0-baseline.md`-style snapshot) and verify auth/report contracts.
2. Enable canonical runtime settings (already the defaults): `capture_write_batch_size: 1000`, `snapshot_table_compat_mode: true`, `async_report_generation: true`, `scheduled_aggregation_engine: go`.
3. Deploy and monitor: review `/metrics`, `snapshot_runs`, `cron_status`, and generated reports for at least one full hourly/daily cycle.
4. Validate canonicity gates: run the parity/integration/compatibility suites and compare baseline vs post-change metrics.
5. Optional compatibility reduction: set `snapshot_table_compat_mode: false` only after step 4 passes and repair workflows are validated.
6. SQL default switch gate: evaluate only after production-scale Postgres benchmark evidence; otherwise keep `scheduled_aggregation_engine: go`.

Rollback triggers:

- sustained increase in `vctp_*_failed_total` metrics
- missing/stale summary tables or report outputs
- material mismatch between totals endpoints and expected aggregates
- repeated job timeout or cron failure indicators

Rollback actions:

1. Set `scheduled_aggregation_engine: go` (if changed) and restart.
2. Set `snapshot_table_compat_mode: true` and restart.
3. Run `POST /api/snapshots/repair/all`.
4. Run `POST /api/snapshots/regenerate-hourly-reports` and/or `-backfill-vcenter-cache` as needed.
5. Re-check `/metrics`, `snapshot_runs`, and endpoint/report correctness before closing the incident.

## Settings Reference

All configuration lives under the top-level `settings:` key in `vctp.yml`.

General:

- `settings.log_level`: logging verbosity (e.g., `debug`, `info`, `warn`, `error`)
- `settings.log_output`: log format, `text` or `json`

Database:

- `settings.database_driver`: `sqlite` or `postgres` (experimental)
- `settings.enable_experimental_postgres`: set `true` to allow PostgreSQL startup
- `settings.database_url`: SQLite file path/DSN or PostgreSQL DSN

HTTP/TLS:

- `settings.bind_ip`: IP address to bind the HTTP server
- `settings.bind_port`: TCP port to bind the HTTP server
  - A port below `1024` (for example `443`) requires privileged bind permissions. The packaged systemd unit grants `CAP_NET_BIND_SERVICE` to the `vctp` user; if you run vCTP outside that unit, grant an equivalent capability or use a non-privileged port.
- `settings.bind_disable_tls`: `true` to serve plain HTTP (no TLS)
- `settings.tls_cert_filename`: PEM certificate path (TLS mode)
- `settings.tls_key_filename`: PEM private key path (TLS mode)

Authentication:

- `settings.auth_enabled`: enables LDAP/JWT auth components.
- `settings.auth_mode`: `disabled`, `optional`, or `required`.
- `settings.auth_jwt_signing_key`: base64 signing key for JWTs.
  - The RPM postinstall script auto-generates this key and writes it to `/etc/dtms/vctp.yml` if it is missing or empty.
- `settings.auth_token_lifespan_minutes`: JWT access token lifetime.
- `settings.auth_jwt_issuer`: expected JWT issuer.
- `settings.auth_jwt_audience`: expected JWT audience.
- `settings.auth_clock_skew_seconds`: allowed clock skew for token validation.
- `settings.auth_group_role_mappings`: map of LDAP group DN -> role (`viewer` or `admin`).
  - Must be non-empty when `settings.auth_enabled: true`.
  - A user must belong to at least one mapped group to receive any role and log in.
- `settings.ldap_groups`: optional allowlist of LDAP group DNs required for login.
  - Empty or omitted means no allowlist filter, but the mapped-role requirement still applies.
- `settings.ldap_bind_address`: LDAP/LDAPS URL used for authentication.
- `settings.ldap_base_dn`: LDAP base DN, used for user lookup when `settings.ldap_user_base_dn` is not set.
- `settings.ldap_user_base_dn`: optional user lookup base DN; defaults to `settings.ldap_base_dn`.
- `settings.ldap_trust_cert_file`: optional CA cert file for LDAP TLS.
- `settings.ldap_disable_validation`: disables LDAP TLS certificate validation.
- `settings.ldap_insecure`: insecure LDAP TLS mode.
- `settings.enable_pprof`: enables `/debug/pprof/*` routes (still admin-gated).

vCenter:

- `settings.encryption_key`: optional explicit key source for credential encryption/decryption. If unset, vCTP derives a host key from hardware/host identity.
- `settings.vcenter_username`: vCenter username
- `settings.vcenter_password`: vCenter password (auto-encrypted on startup if the plaintext length is greater than 2)
- `settings.vcenter_insecure`: `true` to skip TLS verification
- `settings.enable_legacy_api`: set `true` to temporarily re-enable deprecated legacy endpoints
- `settings.vcenter_event_polling_seconds`: deprecated and ignored
- `settings.vcenter_inventory_polling_seconds`: deprecated and ignored
- `settings.vcenter_inventory_snapshot_seconds`: hourly snapshot cadence (seconds)
- `settings.vcenter_inventory_aggregate_seconds`: daily aggregation cadence (seconds)
- `settings.vcenter_addresses`: list of vCenter SDK URLs to monitor

Credential encryption:

- New encrypted values are written with the `enc:v1:` prefix.

Snapshots:

- `settings.hourly_snapshot_concurrency`: max concurrent vCenter snapshots (`0` = unlimited)
- `settings.hourly_snapshot_max_age_days`: retention for hourly tables
- `settings.daily_snapshot_max_age_months`: retention for daily tables
- `settings.hourly_index_max_age_days`: age gate for keeping per-hourly-table indexes (`-1` disables cleanup, `0` trims all)
- `settings.snapshot_cleanup_cron`: cron expression for the cleanup job
- `settings.reports_dir`: directory to store generated XLSX reports (default: `/var/lib/vctp/reports`)
- `settings.capture_write_batch_size`: hourly canonical write batch size (default: `1000`)
- `settings.snapshot_table_compat_mode`: keep writing legacy hourly snapshot tables during migration (default: `true`)
- `settings.async_report_generation`: defer report generation from the hourly capture hot path (default: `true`)
- `settings.report_summary_pivots`: optional list to override Summary worksheet pivot titles/names/ranges in daily/monthly XLSX reports. Each entry supports:
  - `metric`: one of `avg_vcpu`, `avg_ram`, `prorated_vm_count`, `vm_name_count`
  - `title`: pivot title text shown on the Summary sheet
  - `pivot_name`: internal pivot table name in the XLSX workbook
  - `pivot_range`: target range (for example `Summary!A3:H40` or `A3:H40`)
  - `title_cell` (optional): explicit title cell; if omitted, derived from `pivot_range`
- `settings.hourly_snapshot_retry_seconds`: interval for retrying failed hourly snapshots (default: 300 seconds)
- `settings.hourly_snapshot_max_retries`: maximum retry attempts per vCenter snapshot (default: 3)
- `settings.postgres_vm_hourly_partitioning_enabled`: Postgres-only toggle to migrate/manage `vm_hourly_stats` as monthly range partitions (default: `false`)
- `settings.scheduled_aggregation_engine`: scheduled daily/monthly engine (`go` default, `sql` for canonical SQL rollout)

Filters/chargeback:

- `settings.tenants_to_filter`: list of tenant name patterns to exclude
- `settings.node_charge_clusters`: list of cluster name patterns for node chargeback
- `settings.srm_activeactive_vms`: list of SRM Active/Active VM name patterns

# Developer setup

## Pre-requisite tools

```shell
go install github.com/a-h/templ/cmd/templ@v0.3.977
go install github.com/sqlc-dev/sqlc/cmd/sqlc@v1.29.0
go install github.com/swaggo/swag/cmd/swag@v1.16.6
```

## Database

This project uses [goose](https://github.com/pressly/goose) for DB migrations.

Install it with `brew install goose` on macOS, or with Go: `go install github.com/pressly/goose/v3/cmd/goose@latest`.

Create a new up/down SQL migration file (here named `init`) with this command:

```shell
goose -dir db/migrations sqlite3 ./db.sqlite3 create init sql
```

Regenerate the Go query code with sqlc:

```shell
sqlc generate
```

## HTML templates

Run `templ generate -path ./components` to generate Go code from the template files.

## Documentation

Run `swag init --exclude "pkg.mod,pkg.build,pkg.tools" -o server/router/docs` to regenerate the Swagger docs.

## Tests

Run the test suite:

```shell
go test ./...
```

Recommended static analysis:

```shell
go vet ./...
```

## CI/CD (Drone)

- `.drone.yml` defines a Docker pipeline:
  - Restore/build caches for Go modules and tools.
  - The build step installs the generators (`templ`, `sqlc`, `swag`), regenerates code/docs, runs project scripts, and produces the `vctp-linux-amd64` binary.
  - The RPM step packages via `nfpm` using `vctp.yml` and emits RPMs into `./build/`.
  - An optional SFTP deploy step uploads build artifacts (e.g., `vctp*`) to a remote host.
  - The cache rebuild step preserves Go caches across runs.